<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://yi-shiuan-tung.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://yi-shiuan-tung.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-04-08T16:24:42+00:00</updated><id>https://yi-shiuan-tung.github.io/feed.xml</id><title type="html">blank</title><subtitle>Yi-Shiuan Tung&apos;s academic website. PhD student at CU Boulder researching human-robot interaction, environment design, and machine learning. </subtitle><entry><title type="html">Variational Inference for Latent Variable Models</title><link href="https://yi-shiuan-tung.github.io/blog/2025/vae/" rel="alternate" type="text/html" title="Variational Inference for Latent Variable Models"/><published>2025-09-15T00:00:00+00:00</published><updated>2025-09-15T00:00:00+00:00</updated><id>https://yi-shiuan-tung.github.io/blog/2025/vae</id><content type="html" xml:base="https://yi-shiuan-tung.github.io/blog/2025/vae/"><![CDATA[<p>This post goes through the derivation of Evidence Lower Bound (ELBO) and an intuitive explanation of how variational inference works for latent variable models. Much of the intuition presented here is inspired by <a href="https://youtu.be/UTMpM4orS30?si=BzBurdtiXm5orxEo">Sergey Levine’s lectures on variational inference</a> and insights from discussions with <a href="https://www.colorado.edu/cs/christoffer-heckman">Chris Heckman</a> during my area exam.</p> <h3 id="latent-variable-models">Latent Variable Models</h3> <p>Suppose we want to model a complex data distribution \(p(x)\) given a dataset \(D = \{x_1, x_2, \ldots, x_N\}\) where \(x\) might represent images, robot trajectories, or other high-dimensional data. 
Directly modeling \(p(x)\) is often intractable due to its complexity.</p> <p>Latent variable models address this challenge by introducing an auxiliary random variable \(z\) drawn from a simple distribution \(p(z)\), such as a Gaussian. Rather than modeling \(p(x)\) directly, we instead model how data is generated <em>conditioned</em> on the latent variable via \(p(x\vert z)\).</p> <p>The conditional distribution \(p(x \vert z)\) is chosen to be easy to sample from. A common choice is also Gaussian:</p> <p>\(\begin{equation} p(x \vert z) = \mathcal{N}(\mu(z), \sigma(z)) \end{equation}\),</p> <p>where the mean \(\mu(z)\) and standard deviation \(\sigma(z)\) are functions of \(z\) learned from data. The marginal distribution over observations is then obtained by integrating out the latent variable:</p> <p>\(\begin{equation} p(x) = \int p(x \vert z)p(z)dz \end{equation}\).</p> <center> <img src="/blog/assets/img/vae/intro.png" alt="Latent variable model illustration" width="280"/> </center> <p>Sampling a latent variable \(z\) selects a Gaussian distribution over the data space via \(p(x \vert z)\). By drawing different values of \(z\) and sampling from the corresponding conditionals, the model can represent complex data distributions.</p> <h3 id="how-do-we-train-the-model-p_thetax">How do we train the model \(p_{\theta}(x)\)?</h3> <p>We can use maximum likelihood to train the model \(p_{\theta}(x)\) where \(\theta\) are the parameters of the model. However, the integration over \(z\) is intractable.</p> <p>\(\begin{align} \mathcal{L}(\theta) &amp;= \sum_{i=1}^{N} \log p_{\theta}(x_i) \\ &amp;= \sum_{i=1}^N \log \int p(x_i\vert z)p(z)dz\\ \end{align}\).</p> <p><strong>Why is the integration over \(z\) intractable?</strong> The integration over \(z\) is intractable because \(p(x \vert z)\) depends nonlinearly on \(z\). 
In deep latent variable models, \(p(x \vert z)\) is typically parameterized as a Gaussian whose mean and variance are outputs of a neural network.</p> \[\begin{equation} p(x \vert z) = \mathcal{N}(\mu_{nn}(z), \sigma_{nn}(z)) \end{equation}\] <p>Because neural networks introduce nonlinear dependencies on \(z\), the integrand is no longer a Gaussian in \(z\), and the resulting integral has no closed-form solution.</p> <p>Closed-form marginalization is only possible in restricted settings. For example, if \(p(z) = \mathcal{N}(0, I)\) and \(p(x \vert z) = \mathcal{N}(Az+b, \Sigma)\), then the model is linear-Gaussian, and the marginal distribution is \(p(x) = \mathcal{N}(b, AA^T + \Sigma)\).</p> <p><strong>What if \(z\) is discrete?</strong> If \(z\) were discrete, the integral would become a sum, but computing gradients of the log-likelihood would still require evaluating or summing over all latent states, which quickly becomes infeasible in large or structured latent spaces.</p> <p><strong>Why can’t we sample \(z\) to approximate the integral and gradients?</strong> We could approximate the marginal likelihood using Monte Carlo sampling: \(\begin{equation} p(x) \approx \frac{1}{M}\sum_{i=1}^M p(x \vert z_i), \quad z_i \sim p(z) \end{equation}\)</p> <p>However, maximum likelihood requires gradients of \(\text{log}p(x)\) which depend on the posterior \(p(z \vert x)\). Sampling from the prior \(p(z)\) does not provide samples from the posterior. As a result, naive Monte Carlo sampling results in high-variance estimates, leading to unstable and impractical learning.
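To see the variance problem concretely, here is a small numpy sketch of the naive estimator (a toy one-dimensional model; the linear decoder \(2z + 1\) and all constants are made up, and chosen linear only so the true marginal is known in closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def naive_marginal_estimate(x, M):
    # p(x) ~= (1/M) * sum_i p(x | z_i), with z_i drawn from the prior p(z) = N(0, 1).
    z = rng.standard_normal(M)
    # Toy "decoder": p(x | z) = N(2z + 1, 0.1^2).
    return gaussian_pdf(x, 2.0 * z + 1.0, 0.1).mean()

# For this linear-Gaussian model the exact marginal is N(1, 2^2 + 0.1^2).
x = 1.0
true_px = gaussian_pdf(x, 1.0, np.sqrt(4.0 + 0.01))
estimates = np.array([naive_marginal_estimate(x, M=100) for _ in range(200)])
print(true_px, estimates.mean(), estimates.std())  # estimates scatter widely around the truth
```

Only the few prior samples that happen to land near the narrow high-likelihood region contribute to each estimate, which is why the spread across runs is large even though the estimator is unbiased.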
Below is the derivation for why \(\nabla_{\theta} \text{log}p_{\theta}(x)\) depends on \(p(z \vert x)\).</p> \[\nabla_{\theta} \text{log}p_{\theta}(x) = \frac{1}{p_{\theta}(x)} \nabla_{\theta} p_{\theta}(x) = \frac{1}{p_{\theta}(x)} \int \nabla_{\theta} p_{\theta}(x \vert z)p(z)dz\] <p>Now, if we take the derivative of \(\text{log}p_{\theta}(x \vert z)\) with respect to \(\theta\), we get \(\nabla_{\theta} \text{log}p_{\theta}(x \vert z) = \frac{1}{p_{\theta}(x \vert z)} \nabla_{\theta} p_{\theta}(x \vert z)\). Rearranging gives \(\nabla_{\theta}p_{\theta}(x \vert z) = p_{\theta}(x \vert z) \nabla_{\theta} \text{log}p_{\theta}(x \vert z)\). Substituting this into the gradient of \(\text{log}p_{\theta}(x)\), we get</p> \[\begin{equation} \nabla_{\theta} \text{log}p_{\theta}(x) = \frac{1}{p_{\theta}(x)} \int p_{\theta}(x \vert z) p(z) \nabla_{\theta} \text{log}p_{\theta}(x \vert z)dz \end{equation}\] <p>The term \(\frac{p_{\theta}(x \vert z)p(z)}{p_{\theta}(x)}\) is exactly the posterior \(p_{\theta}(z \vert x)\). The gradient becomes</p> \[\begin{align} \nabla_{\theta} \text{log}p_{\theta}(x) &amp;= \int p_{\theta}(z \vert x) \nabla_{\theta} \text{log}p_{\theta}(x \vert z)dz\\ &amp;= \mathbb{E}_{p_{\theta}(z \vert x)}\left[\nabla_{\theta} \text{log}p_{\theta}(x \vert z)\right] \end{align}\] <h3 id="variational-approximation">Variational Approximation</h3> <p>This motivates the use of variational inference, which replaces the intractable posterior \(p(z \vert x)\) with a tractable approximation \(q_{\phi}(z \vert x)\) and yields a differentiable lower bound on the log-likelihood. Here we go through the derivation of the ELBO (Evidence Lower Bound).
Starting with the log-likelihood, which we want to maximize with respect to \(\theta\), we have</p> \[\begin{equation} \text{log}p_{\theta}(x) = \text{log} \int p_{\theta}(x \vert z)p(z)dz \end{equation}\] <p>We introduce a variational distribution \(q_{\phi}(z \vert x)\), which is parameterized by \(\phi\) and approximates the posterior \(p_{\theta}(z \vert x)\). The key idea is to rewrite the marginal likelihood in a way that allows us to take expectations with respect to \(q_{\phi}(z \vert x)\), which we can sample from. We multiply the integrand by \(\frac{q_{\phi}(z \vert x)}{q_{\phi}(z \vert x)} = 1\), which leaves the integral unchanged, to get</p> \[\begin{align} \text{log}p_{\theta}(x) &amp;= \text{log} \int p_{\theta}(x \vert z)p(z)dz\\ &amp;= \text{log} \int p_{\theta}(x \vert z)q_{\phi}(z \vert x)\frac{p(z)}{q_{\phi}(z \vert x)}dz\\ &amp;= \text{log} E_{z \sim q_{\phi}(z \vert x)}\left[\frac{p_{\theta}(x \vert z)p(z)}{q_{\phi}(z \vert x)}\right] \end{align}\] <p>The logarithm is a concave function, so we can apply Jensen’s inequality to get</p> \[\begin{align} \text{log}p_{\theta}(x) &amp;= \text{log} E_{z \sim q_{\phi}(z \vert x)}\left[\frac{p_{\theta}(x \vert z)p(z)}{q_{\phi}(z \vert x)}\right]\\ &amp;\geq E_{z \sim q_{\phi}(z \vert x)}\left[\text{log}\frac{p_{\theta}(x \vert z)p(z)}{q_{\phi}(z \vert x)}\right]\\ &amp;= \mathbb{E}_{z \sim q_{\phi}(z \vert x)}\left[\text{log}p_{\theta}(x \vert z) + \text{log}p(z) - \text{log}q_{\phi}(z \vert x)\right]\\ \end{align}\] <p>The right hand side is the ELBO, which is a lower bound on the log-likelihood.
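This bound can be sanity-checked numerically on a tiny discrete model (the probability tables below are made up for illustration):

```python
import numpy as np

# Toy model: binary latent z and binary observation x (tables are made up).
p_z = np.array([0.7, 0.3])                    # p(z)
p_x_given_z = np.array([[0.9, 0.1],           # p(x | z=0)
                        [0.2, 0.8]])          # p(x | z=1)

x = 1                                         # the observed value
log_px = np.log(np.sum(p_x_given_z[:, x] * p_z))   # exact log-marginal

def elbo(q):
    # E_q[log p(x|z) + log p(z) - log q(z|x)] for a variational q over z.
    return np.sum(q * (np.log(p_x_given_z[:, x]) + np.log(p_z) - np.log(q)))

q_arbitrary = np.array([0.5, 0.5])
posterior = p_x_given_z[:, x] * p_z
posterior = posterior / posterior.sum()       # exact p(z | x)

print(log_px, elbo(q_arbitrary), elbo(posterior))
```

For any valid \(q\) the ELBO stays below \(\text{log}p_{\theta}(x)\), and the bound is tight exactly when \(q\) equals the true posterior \(p(z \vert x)\), which is another way to see what a good variational distribution should look like.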
It is a differentiable function of \(\theta\) and \(\phi\), and can be used to train the model.</p> <p><strong>What makes a good \(q_{\phi}(z \vert x)\)?</strong></p> <p>The intuition is that \(q_{\phi}(z \vert x)\) should be close to \(p(z)\), and we can use KL-divergence to measure the difference between the two distributions.</p> \[\begin{align} D_{KL}(q_{\phi}(z \vert x) \vert \vert p(z)) &amp;= \mathbb{E}_{z \sim q_{\phi}(z \vert x)}\left[\text{log}\frac{q_{\phi}(z \vert x)}{p(z)}\right]\\ &amp;= \mathbb{E}_{z \sim q_{\phi}(z \vert x)}\left[\text{log}q_{\phi}(z \vert x) - \text{log}p(z)\right] \end{align}\] <p>So we can rewrite the ELBO as</p> \[\begin{align} \text{ELBO}(\theta, \phi) &amp;= \mathbb{E}_{z \sim q_{\phi}(z \vert x)}\left[\text{log}p_{\theta}(x \vert z) + \text{log}p(z) - \text{log}q_{\phi}(z \vert x)\right]\\ &amp;= \mathbb{E}_{z \sim q_{\phi}(z \vert x)}\left[\text{log}p_{\theta}(x \vert z)\right] - D_{KL}(q_{\phi}(z \vert x) \vert \vert p(z)) \end{align}\] <p>To maximize \(p_{\theta}(x)\), we can maximize the ELBO with respect to \(\theta\) and \(\phi\). Intuitively, the ELBO balances two objectives: the reconstruction term \(\text{log}p_{\theta}(x \vert z)\) and the KL-divergence \(D_{KL}(q_{\phi}(z \vert x) \vert \vert p(z))\). The reconstruction term encourages \(q_{\phi}(z \vert x)\) to place mass on latent variables that reconstruct \(x\) well, while the KL-divergence encourages \(q_{\phi}(z \vert x)\) to be close to \(p(z)\).</p> ]]></content><author><name></name></author><summary type="html"><![CDATA[This post goes through the derivation of the Evidence Lower Bound (ELBO) and an intuitive explanation of how variational inference works for latent variable models.
Much of the intuition presented here is inspired by Sergey Levine’s lectures on variational inference and insights from discussions with Chris Heckman during my area exam.]]></summary></entry><entry><title type="html">Counterfactual Reasoning and Environment Design for Active Preference Learning</title><link href="https://yi-shiuan-tung.github.io/blog/2025/cred/" rel="alternate" type="text/html" title="Counterfactual Reasoning and Environment Design for Active Preference Learning"/><published>2025-06-26T00:00:00+00:00</published><updated>2025-06-26T00:00:00+00:00</updated><id>https://yi-shiuan-tung.github.io/blog/2025/cred</id><content type="html" xml:base="https://yi-shiuan-tung.github.io/blog/2025/cred/"><![CDATA[<p><strong>Yi-Shiuan Tung</strong>, <strong>Bradley Hayes</strong>, and <strong>Alessandro Roncone</strong><br/> University of Colorado Boulder<br/> <a href="https://hitl-robot-learning.github.io/pdfs/cred.pdf">Workshop Paper PDF</a></p> <hr/> <h2 id="overview">Overview</h2> <center> <img src="/blog/assets/img/pref_learning/husky_question.png" alt="Delivery robot has multiple options for routes to take" width="310"/> </center> <p><br/></p> <p>Robots deployed in the real world must align their behaviors with human preferences—whether balancing speed and safety in delivery tasks or adapting routes based on distance, time, and terrain. But those preferences are hard to predefine and differ across users.</p> <p><strong>Active Preference Learning (APL)</strong> helps robots learn these preferences by asking users to compare and rank trajectories. To improve sample efficiency, we present the human with trajectory pairs that maximize information gain [1]. 
The objective maximizes the expected reduction in the entropy of the reward distribution \(H(\mathbf{w})\) after receiving human input \(I\).</p> <p>\(\begin{equation} \max_{\xi_A, \xi_B} f(\xi_A, \xi_B) = \max_{\xi_A, \xi_B} H(\mathbf{w}) - \mathbb{E}_{I}[H(\mathbf{w} | I)] \end{equation}\).</p> <p>To make the optimization tractable, prior work uses a pre-generated set of trajectories from random rollouts [1] or rollouts from a replay buffer [2] to find the trajectory pair that optimizes information gain. However, this is <strong>sample inefficient</strong> for long-horizon tasks because the number of possible trajectories grows exponentially with the horizon. For robot routing, we also have to query the human for preferences in different environments or scenarios to enable <strong>generalization</strong>. Therefore, we include the environment parameters as optimization variables.</p> <p>We propose <strong>CRED</strong>, a method that improves preference learning by:</p> <ul> <li>Using <strong>Counterfactual Reasoning</strong> to generate queries with trajectories that represent different preferences.</li> <li>Performing <strong>Environment Design</strong> to create “imagined” environments that better elicit informative preferences.</li> </ul> <p>CRED significantly improves sample efficiency and generalization across both simulated GridWorld and OpenStreetMap navigation.</p> <hr/> <h2 id="method">Method</h2> <h3 id="1-counterfactual-reasoning">1. Counterfactual Reasoning</h3> <p>When asking humans for preferences, we hypothesize that the trajectories should represent different preferences. To do that, CRED samples potential human reward functions from the current Bayesian belief over weights \(\mathbf{w}\), and generates trajectories that would be optimal if those weights were true. It then evaluates pairs of these counterfactual trajectories to find the most informative preference queries—those that maximize <a href="#overview">Eq.
1</a>.</p> <center> <img src="/blog/assets/img/pref_learning/cr.png" alt="Counterfactual reasoning samples rewards from current belief to generate trajectories that resemble different human preferences." width="600"/> </center> <p><br/></p> <h3 id="2-environment-design">2. Environment Design</h3> <p>The environment affects which preferences can be expressed. For example, distinguishing between preferences for “paved vs. gravel” requires an environment with both terrains.</p> <p>CRED uses <strong>Bayesian Optimization</strong> to find environment configurations that maximize the informativeness of queries. In practice, this means modifying terrain layouts or edge attributes (e.g., road slope or elevation) to elicit more useful feedback. Bayesian optimization uses a Gaussian process to guide its search, reducing the number of evaluations of <a href="#overview">Eq. 1</a>. Here, \(F\) is Eq. 1 but includes environment parameters \(\theta_E\) as optimization variables.</p> <center> <img src="/blog/assets/img/pref_learning/env_design.png" alt="Bayesian optimization finds an environment to query the human." width="800"/> </center> <p><br/></p> <hr/> <h2 id="experiments">Experiments</h2> <p>We evaluate CRED in two domains:</p> <h3 id="gridworld-navigation">GridWorld Navigation</h3> <p>A 15×15 terrain-based environment with brick (red), gravel (grey), sand (yellow), grass (green), and paved (white). The goal is to move from the top left corner to the bottom right corner. 
For environment design, we first compress the 15×15 grid into a 5-dimensional vector using a variational autoencoder.</p> <center> <img src="/blog/assets/img/pref_learning/gridworld_1.png" alt="GridWorld 1" width="150"/> <img src="/blog/assets/img/pref_learning/gridworld_2.png" alt="GridWorld 2" width="150"/> <img src="/blog/assets/img/pref_learning/gridworld_3.png" alt="GridWorld 3" width="150"/> <img src="/blog/assets/img/pref_learning/gridworld_4.png" alt="GridWorld 4" width="150"/> <img src="/blog/assets/img/pref_learning/gridworld_5.png" alt="GridWorld 5" width="150"/> </center> <h3 id="openstreetmap-routing">OpenStreetMap Routing</h3> <p>We use OpenStreetMap to extract nodes representing intersections and edges representing streets. We evaluate the algorithm’s ability to learn preferences for distance, time, and elevation (+/-). For environment design, we modify edge attributes such as traversal time (i.e. traffic) and elevation and evaluate generalization to new street networks.</p> <center> <img src="/blog/assets/img/pref_learning/StreetNav_boulder.png" alt="Boulder" width="150"/> <img src="/blog/assets/img/pref_learning/StreetNav_east_boulder.png" alt="East Boulder" width="150"/> <img src="/blog/assets/img/pref_learning/StreetNav_south_boulder.png" alt="South Boulder" width="150"/> </center> <hr/> <h2 id="baselines">Baselines</h2> <p>We compare CRED to:</p> <ul> <li><strong>RR (Random Rollouts):</strong> Pre-generated set of trajectories from random rollouts [1].</li> <li><strong>MBP (Mean Belief Policy):</strong> Uses the mean of the belief over rewards as the best guess and performs rollouts with a policy trained on that reward [3].</li> <li><strong>CR (Counterfactual Reasoning only):</strong> Ablation of CRED without environment design.</li> <li><strong>MBP + ED:</strong> Mean Belief Policy combined with environment design.</li> </ul> <hr/> <h2 id="key-results">Key Results</h2> <h3 id="-qualitative-results">👀 Qualitative Results</h3> <center> <video
width="250" controls=""> <source src="/blog/assets/img/pref_learning/GridWorld-v0_3_query_5.mp4" type="video/mp4"/> Your browser does not support the video tag. <figcaption>Mean Belief Policy [3]</figcaption> </video> <video width="250" controls=""> <source src="/blog/assets/img/pref_learning/GridWorld-v0_3_query_4.mp4" type="video/mp4"/> Your browser does not support the video tag. <figcaption>Counterfactual Reasoning</figcaption> </video> <video width="250" controls=""> <source src="/blog/assets/img/pref_learning/GridWorld-v0_3_query_0.mp4" type="video/mp4"/> Your browser does not support the video tag. <figcaption>Our Approach: CRED</figcaption> </video> </center> <p><br/></p> <p>While Mean Belief Policy [3] (left) can generate trajectories with different features, the trajectories are very similar. Counterfactual reasoning (middle) generates trajectories that better resemble different preferences. With environment design (right), we can query the human for feedback in different environments.</p> <h3 id="-higher-information-gain">🔍 Higher Information Gain</h3> <center> <img src="/blog/assets/img/pref_learning/GridWorld-v0_objective_values.png" alt="GridWorld Objective Values" width="230"/> <img src="/blog/assets/img/pref_learning/GridWorld-v0_entropy.png" alt="GridWorld Entropy" width="230"/> <img src="/blog/assets/img/pref_learning/SimpleStreetNav-v0_objective_values.png" alt="StreetNav Objective Values" width="230"/> <img src="/blog/assets/img/pref_learning/SimpleStreetNav-v0_entropy.png" alt="StreetNav Entropy" width="230"/> </center> <center> <img src="/blog/assets/img/pref_learning/legend.png" width="400"/> </center> <p>Left to right: GridWorld information gain, entropy of belief over reward weights, OpenStreetMap information gain, and entropy of belief over reward weights across different iterations of querying the human for feedback. 
CRED generates more informative preference queries early on, resulting in lower entropy of the belief over rewards.</p> <h3 id="-higher-generalization">✅ Higher Generalization</h3> <p>CRED-trained policies perform better in unseen environments, demonstrating faster convergence, higher rewards, and higher policy accuracy.</p> <center> <img src="/blog/assets/img/pref_learning/pref_results.png" alt="Results" width="1000"/> </center> <hr/> <h2 id="conclusion">Conclusion</h2> <p>We introduce CRED, an active preference learning method that improves the sample efficiency and generalization of learned reward functions. Counterfactual reasoning generates queries with trajectories that better resemble different reward functions. Environment design lets us jointly optimize the environment and the query, making it possible to query the human in different environments.</p> <hr/> <h2 id="acknowledgments">Acknowledgments</h2> <p>Thanks to Dusty Woods for help with visualizations and figure editing.</p> <hr/> <h2 id="references">References</h2> <p>[1] Biyik, E., Palan, M., Landolfi, N. C., Losey, D. P., &amp; Sadigh, D. (2020). <em>Asking easy questions: A user-friendly approach to active reward learning</em>. CoRL.</p> <p>[2] Lee, K., Smith, L. M., &amp; Abbeel, P. (2021). <em>PEBBLE: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training</em>. ICML.</p> <p>[3] Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., &amp; Amodei, D. (2017). <em>Deep reinforcement learning from human preferences</em>.
NeurIPS.</p> <hr/> <h2 id="contact">Contact</h2> <p>Questions or collaboration ideas?<br/> 📧 yi-shiuan.tung@colorado.edu</p>]]></content><author><name></name></author><category term="publications"/><summary type="html"><![CDATA[Blog post for RSS'25 Human-in-the-Loop Robot Learning Workshop]]></summary></entry><entry><title type="html">Workspace Optimization Techniques to Improve Human Motion Prediction</title><link href="https://yi-shiuan-tung.github.io/blog/2024/workspace-optimization/" rel="alternate" type="text/html" title="Workspace Optimization Techniques to Improve Human Motion Prediction"/><published>2024-01-12T00:00:00+00:00</published><updated>2024-01-12T00:00:00+00:00</updated><id>https://yi-shiuan-tung.github.io/blog/2024/workspace-optimization</id><content type="html" xml:base="https://yi-shiuan-tung.github.io/blog/2024/workspace-optimization/"><![CDATA[<p align="center"> <img width="500" src="/assets/img/hri2024/intro.png"/> <p align="center"> Figure 1. </p> </p> <p align="justified"> Suppose that you are picking up the blue square cube shown in Figure 1 (left). The natural path (solid) makes it hard for the robot to predict whether you are picking up the blue square cube or the red triangle cube while the legible path (dotted) requires you to take a circuitous route. To improve a robot's prediction of a human teammate's goals during a collaborative task shown in Figure 1 (right), the robot can configure the workspace by rearranging objects and projecting "virtual obstacles" in augmented reality (cyan and red barriers), in order to induce naturally legible paths from the human. </p> <p align="justified"> In human-robot collaboration, the robot needs to predict human motion in order to coordinate its actions with those of the human. Current algorithms rely on the human motion model to achieve safe interactions, but human motion is inherently highly variable as humans can always move unexpectedly. 
Our work takes a different approach and addresses a fundamental challenge faced by all human motion prediction models; we reduce the uncertainty inherent in modeling the intentions of human collaborators by pushing them towards legible behavior via environment design. Our work improves human motion model predictions by increasing environmental structure to reduce uncertainties, facilitating more fluent human-robot interactions. </p> <h2> Methods </h2> <p align="center"> <img width="1000" src="/assets/img/hri2024/system_diagram.png"/> <p align="center"> Figure 2. </p> </p> <h3> Quality Diversity Search </h3> <p>We use a quality diversity (QD) algorithm called MAP-Elites to search through the space of environment configurations (i.e. object positions and virtual obstacle placements). MAP-Elites keeps track of a behavior performance map (also known as the solution map) that stores the best performing solution found for each combination of features chosen by the designer. For example, in the Overcooked game, we use two features 1) Number of Obstacles and 2) Ordering of Ingredient Placements. The environment shown in the middle of Figure 2 has 3 obstacles and ingredient ordering onions-fish-dish-tomatoes-cabbage. The solution map is a matrix where the first dimension is the number of obstacles and the second dimension includes the possible ingredient orderings. Note that the solution map can have more than 2 dimensions.</p> <p>MAP-Elites consists of two phases: 1) initialization phase where environments are randomly generated and placed into the solution map according to their features and 2) improvement phase where environments are randomly sampled from the map and mutated. While MAP-Elites generates diverse solutions by altering existing ones through random mutations, the process may require substantial computational time to yield high quality solutions. 
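A minimal sketch of these two MAP-Elites phases, with toy stand-ins for the designer-supplied environment generator, mutation operator, objective, and feature functions (all names and numbers here are hypothetical placeholders, not the paper's implementation):

```python
import random

random.seed(0)

# Toy stand-ins for the designer-supplied pieces: an "environment" is just a
# list of four numbers in [0, 1].
def random_env():
    return [random.uniform(0, 1) for _ in range(4)]

def mutate(env):
    # Random mutation: jitter each value with Gaussian noise, then clamp.
    return [min(1.0, max(0.0, v + random.gauss(0, 0.1))) for v in env]

def objective(env):
    # Toy quality measure (higher is better).
    return -sum((v - 0.5) ** 2 for v in env)

def features(env):
    # Discretized behavior descriptors that index a cell of the solution map.
    return (min(int(env[0] * 3), 2), min(int(env[1] * 3), 2))

solution_map = {}  # feature cell -> (score, environment)

def maybe_insert(env):
    cell, score = features(env), objective(env)
    if cell not in solution_map or score > solution_map[cell][0]:
        solution_map[cell] = (score, env)

# Phase 1: initialization with randomly generated environments.
for _ in range(100):
    maybe_insert(random_env())

# Phase 2: improvement by mutating elites sampled from the map.
for _ in range(1000):
    _, parent = random.choice(list(solution_map.values()))
    maybe_insert(mutate(parent))
```

Each cell of the map keeps only the best environment found so far with that feature combination, so the loop simultaneously improves quality within cells and fills out diversity across cells.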
Differentiable QD is a method that performs gradient descent on the objective function and the features to speed up MAP-Elites but requires both the objective and feature functions to be differentiable. We empirically approximate the gradient of the objective function through stochastic sampling, which may lead to “suboptimal” solutions that can, however, contribute to increased diversity. In our implementation, we continue this stochastic hill climbing until a local maximum is found. In Overcooked, we sample new locations for ingredients and additions/removals of virtual obstacles as possible mutations. The solution map is updated if the new environment generated from the mutation step is better than the existing solution in the solution map with the same features.</p> <p align="center"> <img width="300" src="/assets/img/hri2024/end_config.png"/> <img width="300" src="/assets/img/hri2024/improvement.gif"/> <p align="center"> Figure 3. </p> </p> <p>The tabletop task in Figure 1 requires the human and the robot to collaboratively place cubes into a desired configuration shown in Figure 3 (left). To mutate an existing environment, we sample new locations for the cubes based on a Gaussian with variance = 7cm. We also sample the locations and orientations of fixed-size virtual obstacles. An example of the stochastic hill climbing in the improvement phase of MAP-Elites is shown in Figure 3 (right).</p> <h3> Legibility Objective Function </h3> <p>The objective function considers all the possible goals the human might be reaching for at a given stage of task execution and maximizes the probability of correctly predicting the human’s chosen goal.
The probability of the human’s goal is given by the equation below:</p> \[\Pr(G | \mathcal{\xi}_{S \rightarrow Q}) \propto \frac{exp(-C(\mathcal{\xi}_{S \rightarrow Q}) - C(\mathcal{\xi}^*_{Q \rightarrow G}))}{exp(-C(\mathcal{\xi}^*_{S \rightarrow G}))}\] <p>The optimal human trajectory from point \(X\) to point \(Y\) with respect to cost function \(C\) is denoted by \(\mathcal{\xi}^*_{X \rightarrow Y}\). This equation evaluates how cost efficient (with respect to \(C\)) going to goal \(G\) is from start state \(S\) given the observed partial trajectory \(\mathcal{\xi}_{S \rightarrow Q}\) relative to the most efficient trajectory \(\mathcal{\xi}^*_{S \rightarrow G}\). For a given ground truth goal \(G_{true}\), if the predicted goal is not \(G_{true}\), we penalize by a constant \(c\) multiplied by the length of the observed trajectory \(\vert \mathcal{\xi}_{S \rightarrow Q} \vert\). If the predicted goal is correct, we encourage more confident predictions by maximizing the difference between the probability of the correct goal and the second highest goal probability. This is summarized in the equation below.</p> \[\text{EnvLegibility}(G_{true}) = \begin{cases} -c |\mathcal{\xi}_{S \rightarrow Q}|, \text{ if } \underset{G \in \mathcal{G}}{\arg\max} \Pr(G | \mathcal{\xi}_{S \rightarrow Q}) \neq G_{true} \\ margin(\mathcal{G}|\mathcal{\xi}_{S \rightarrow Q}) = \Pr(G_{(n)} | \mathcal{\xi}_{S \rightarrow Q}) - \Pr(G_{(n-1)} | \mathcal{\xi}_{S \rightarrow Q}), \text{ otherwise} \end{cases}\] <p>We compute EnvLegibility for each possible ground truth goal at each stage of the task execution. The function \(permutations(T)\) returns all the different ways a task \(T\) can be performed.
\(\mathcal{G}\) is the set of valid goals the human can reach for when performing subtask \(t\) with task ordering \(T'\).</p> \[\text{objective function} = \sum_{T' \in \text{permutations}(T)} \mathbb{1}\{\text{valid}(T')\} \times \sum_{t \in T'} \sum_{G \in \mathcal{G}} \text{EnvLegibility}(G)\] <h2> Experiments and Results </h2> <div align="center"> <iframe width="840" height="472.5" src="https://www.youtube.com/embed/CEl5-aQ29pk?si=n41TGVbaGbowm2lT" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe> </div> <p><br/></p> <p align="center"> <img width="200" src="/assets/img/hri2024/gaussian_A.png"/> <img width="200" src="/assets/img/hri2024/gaussian_B.png"/> <img width="200" src="/assets/img/hri2024/gaussian_D.png"/> <img width="200" src="/assets/img/hri2024/gaussian_C.png"/> <p align="justified"> Figure 4. Top down view of the workspace plotting the mean and covariance of the time series multivariate Gaussian for each condition. From left to right, the conditions are Baseline, Placement Optimized, Virtual Obstacle Optimized, and our approach Both Optimized. The model in the Both Optimized condition has less covariance compared to the models trained in the other environment configurations. </p> </p> <h2> Discussion </h2> <p>In this work, we introduce an algorithmic approach for autonomous workspace optimization to improve robot predictions of a human collaborator’s goals. We envision that our framework can improve human robot teaming, by improving goal prediction and situational awareness, for domains such as shared autonomy for assistive manipulation, warehouse stocking, cooking assistance, among others. 
Our approach is applicable for domains where the following conditions hold: 1) Multiple agents share the same physical space and the agents do not have access to other agents’ controllers or decision making processes (otherwise a centralized controller can be used), 2) the environment allows physical or virtual configurations, and 3) environment configuration can be performed prior to the interaction. Through dual experiments in 2D navigation (see paper) and tabletop manipulation, we show that our approach results in more accurate model predictions across two distinct goal inference methods, requiring less data to achieve these correct predictions. Importantly, we demonstrate that environmental adaptations can be discovered and leveraged to compensate for shortfalls of prediction models in otherwise unstructured settings.</p>]]></content><author><name></name></author><category term="publications"/><summary type="html"><![CDATA[Blog post for HRI'24 paper]]></summary></entry><entry><title type="html">Minimizing Entropy for Classification Problems</title><link href="https://yi-shiuan-tung.github.io/blog/2023/min-entropy/" rel="alternate" type="text/html" title="Minimizing Entropy for Classification Problems"/><published>2023-12-15T00:00:00+00:00</published><updated>2023-12-15T00:00:00+00:00</updated><id>https://yi-shiuan-tung.github.io/blog/2023/min-entropy</id><content type="html" xml:base="https://yi-shiuan-tung.github.io/blog/2023/min-entropy/"><![CDATA[<p>When we want to be more confident about our predictions for a classification problem, we often use an objective that minimizes the entropy. But why is this not enough? This post will discuss entropy and cross entropy losses.</p> <h2 id="what-is-entropy">What is Entropy?</h2> <p>Entropy (in information theory) is a measure of uncertainty; the higher the entropy, the more uncertain you are. 
Entropy is defined as</p> \[H(X) = - \sum_{x \in \mathcal{X}} p(x)\text{log} p(x)\] <p>where \(X\) is the discrete random variable that takes values in the alphabet \(\mathcal{X}\) and is distributed according to \(p: \mathcal{X} \rightarrow [0, 1]\). \(-\text{log}p(x)\) is the information of an event \(x\). So entropy \(H\) is the sum of the information for each possible event \(x \in \mathcal{X}\) weighted by the probability of the event \(p(x)\). Rare events (low probability) give more information and have higher values. Another way to think of it is that the entropy of a probability distribution is the optimal number of bits (when using log base 2) required to encode the distribution. When \(p(x)\) is high, we use fewer bits to represent the event \(x\) because we see it more often and it is cheaper to use fewer bits. When \(p(x)\) is low, we use more bits. This is given by the information of the event \(-\text{log}p(x)\).</p> <h2 id="classification-problems">Classification problems</h2> <p>In the context of human goal prediction, we want to train a model that outputs the correct human goal \(x\) given that the model observed some initial human trajectory \(\xi_{S \rightarrow Q}\) that started at point \(S\) and ended at point \(Q\). A human goal can be an object they are reaching towards or some task that they are performing. Suppose the human can reach towards the apple, banana, or grapes (\(\mathcal{X} = \{\text{apple}, \text{banana}, \text{grapes}\}\)), and we have a model \(f\) that outputs a distribution over the likelihood of goals (via neural network with softmax output or a Bayesian classifier). 
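To make this setup concrete, here is a quick numerical sketch of the model's output and its entropy (the goal probabilities are illustrative; entropy is computed in nats):

```python
import numpy as np

goals = ["apple", "banana", "grapes"]

def entropy(p):
    # H(X) = -sum_x p(x) log p(x), computed in nats.
    p = np.asarray(p)
    return float(-np.sum(p * np.log(p)))

# Two illustrative output distributions over the goals.
confident = [0.55, 0.25, 0.20]
ambiguous = [0.45, 0.44, 0.11]

for p in (confident, ambiguous):
    predicted = goals[int(np.argmax(p))]
    print(predicted, round(entropy(p), 3))
# Both distributions predict "apple", yet the less confident one has lower entropy.
```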
We can get the predicted goal by taking the argmax of the distribution \(\hat{x} = \text{argmax}_{x} f(\xi)\).</p> <center> <img src="/blog/assets/img/reaching_example.png" alt="Reaching Example" width="310"/> </center> <p><a href="https://www.flaticon.com/free-icons/grape" title="grape icons">Grape icons created by Dreamcreateicons - Flaticon</a></p> <p>To train our model to be more certain about its predictions, we can minimize the entropy of the output distribution during training. We can use the following loss function: given a predicted label \(\hat{x}\) and the true label \(x\), \(\mathcal{L}(x, \hat{x}) = \mathbb{1}\{x = \hat{x}\}H(f(\xi))+\mathbb{1}\{x \neq \hat{x}\}c\) for some constant \(c\). This equation penalizes the prediction by \(c\) if the prediction is incorrect and by the entropy if it is correct. \(\mathbb{1}\{q\}\) is the indicator function and evaluates to 1 if \(q\) is true and 0 otherwise. However, for predictions that are correct, minimizing the entropy may not give you more confident correct predictions. Suppose that the model has the following two predictions for the figure above where the human is reaching for the apple: 1) [0.55, 0.25, 0.2] and 2) [0.45, 0.44, 0.11]. The arrays correspond to the goal distribution over apple, banana, and grapes respectively. Intuitively, we prefer the first array because the model is more confident (\(55 \%\)) about the prediction. However, the entropy (using the natural log) for 1) is 0.998 and for 2) is 0.963. Minimizing the entropy will move the model outputs closer to 2) [0.45, 0.44, 0.11].</p> <center> <img src="/blog/assets/img/dist1.png" alt="Distribution 1" width="300"/> <img src="/blog/assets/img/dist2.png" alt="Distribution 2" width="300"/> <figcaption style="text-align:justify"> Two possible goal probability distributions. The model is more confident about its prediction on the left, but the entropy is smaller for the distribution on the right.
If our objective is to minimize the entropy for correct predictions, we could be pushing the model's output closer to the right distribution.</figcaption> </center> <p><br/></p> <h2 id="connection-to-cross-entropy">Connection to Cross Entropy</h2> <p>A common loss function for classification problems is the cross entropy loss. The cross entropy of distribution \(q\) relative to another distribution \(p\) is defined as</p> \[H(p, q) = - \sum_{x \in \mathcal{X}} p(x)\text{log}q(x)\] <p>Intuitively, it measures the average number of bits needed to encode samples from the true distribution \(p\) when using a code optimized for the distribution \(q\). We can rewrite \(H(p, q)\) as</p> \[\begin{align} H(p, q) &amp;= - \sum_{x \in \mathcal{X}} p(x)\text{log}q(x)\\ &amp;= -\sum_{x \in \mathcal{X}} p(x) \text{log}\left(\frac{q(x)}{p(x)} p(x)\right)\\ &amp;= -\sum_{x \in \mathcal{X}} p(x) \text{log}\frac{q(x)}{p(x)} - \sum_{x \in \mathcal{X}} p(x) \text{log}p(x)\\ &amp;= D_{KL}(p||q) + H(p) \end{align}\] <p>The first term is the Kullback-Leibler (KL) divergence which measures how different the distributions \(p\) and \(q\) are, and the second term is the entropy of \(p\). Unlike pure entropy minimization, the cross entropy loss minimizes the KL divergence between the true and predicted distributions, which resolves the issue above.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[When we want to be more confident about our predictions for a classification problem, we often use an objective that minimizes the entropy. But why is this not enough?
This post will discuss entropy and cross entropy losses.]]></summary></entry><entry><title type="html">Bilevel Optimization for Just-in-Time Robotic Kitting</title><link href="https://yi-shiuan-tung.github.io/blog/2022/robotic-kitting/" rel="alternate" type="text/html" title="Bilevel Optimization for Just-in-Time Robotic Kitting"/><published>2022-08-01T00:00:00+00:00</published><updated>2022-08-01T00:00:00+00:00</updated><id>https://yi-shiuan-tung.github.io/blog/2022/robotic-kitting</id><content type="html" xml:base="https://yi-shiuan-tung.github.io/blog/2022/robotic-kitting/"><![CDATA[<h2 id="overview">Overview</h2> <p><strong>Problem:</strong> Traditional kitting systems often use a few pre-defined kits that do not adapt in real time to variability such as part shortages or unexpected delays. This inflexibility can lead to inefficiencies, higher cognitive load for workers, and longer production times.</p> <p><strong>Solution:</strong> We propose a dynamic robotic kitting planner that segments and schedules assembly tasks to minimize idle time and reduce makespan. Using a bilevel optimization framework, the upper-level optimization determines the task segmentation to minimize idle time and the lower-level optimization designs the physical layout of parts on the kitting tray to ensure usability and logical grouping.</p> <h2 id="what-is-kitting">What is Kitting?</h2> <p>Kitting is the process of preparing and grouping the required components for assembly of a given product. Kitting is advantageous for assemblies involving numerous small components and products that support a wide range of customizations. For example, it is used in <a href="https://tulip.co/blog/the-kitting-process-for-manufacturers/">electronics and automobile manufacturing</a>.</p> <h2 id="problem-setup">Problem Setup</h2> <p>We have a set of tasks to finish represented by a Directed Acyclic Graph (DAG).
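As an illustration, such a precedence DAG can be stored as an adjacency list and ordered with a topological sort; this minimal Python sketch, including the task names, is hypothetical and not from the paper:

```python
from collections import defaultdict, deque

# Hypothetical precedence edges: a -> b means task a must finish before b.
edges = [("attach_leg", "attach_top"), ("attach_top", "secure_screws")]

def topological_order(edges):
    """Return one valid task ordering (Kahn's algorithm); raise on cycles."""
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for a, b in edges:
        succ[a].append(b)
        indegree[b] += 1
        nodes.update((a, b))
    ready = deque(n for n in sorted(nodes) if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("precedence graph contains a cycle; not a DAG")
    return order

print(topological_order(edges))  # ['attach_leg', 'attach_top', 'secure_screws']
```

Any valid kit sequence must respect such an ordering, which is what makes the segmentation problem below well defined.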
An edge between two tasks <strong>a</strong> -&gt; <strong>b</strong> indicates that task <strong>a</strong> has to be finished before task <strong>b</strong> can begin. The tasks that the human can do depend on the assembly parts that the robot delivers on the kitting tray. The robot has estimates of how long it takes the human to perform each task and of the availability of parts. <strong>How does the robot determine what parts to place on the kitting tray at any given time in order to maximize team throughput?</strong></p> <h2 id="approach">Approach</h2> <h3 id="bilevel-optimization">Bilevel Optimization</h3> <p>The upper level problem optimizes an objective function that depends on the outcome of the lower level problem. In the upper level problem, the robot segments the task such that human idle time is minimized. The first segment is the kit that the robot delivers in the next time step. The “goodness” of the segmentation is influenced by the lower level problem of how well the assembly parts fit in the kit. For more details, please refer to the <a href="https://hiro-group.ronc.one/papers/2022_Tung_ROMAN_kitting.pdf">paper</a>.</p> <p align="center"> <img width="500" src="/assets/img/roman2022/bilevel-opt.png"/> <p align="center"> </p> </p> <h2 id="experiments">Experiments</h2> <h3 id="user-study">User Study</h3> <p>In the user study, participants assembled a miniature table that required connecting four legs, connectors, and a flat surface plank by snapping the pieces together and securing them with screws and nuts.
The robot has to deliver four legs, eight connectors, and boxes of small and large screws and nuts on the kitting tray.</p> <div align="center"> <img width="200" src="/assets/img/roman2022/arranging_cropped.png"/> <img width="200" src="/assets/img/roman2022/delivery_cropped.png"/> <img width="200" src="/assets/img/roman2022/building_cropped.png"/> <img width="200" src="/assets/img/roman2022/finished_cropped.png"/> </div> <p><br/></p> <p>Our optimization produced the following kitting strategy based on an initial data set of human task times.</p> <div align="center"> <img width="250" src="/assets/img/roman2022/segment1.jpg"/> <img width="250" src="/assets/img/roman2022/segment2.jpg"/> <img width="250" src="/assets/img/roman2022/segment3.jpg"/> </div> <p><br/></p> <p>The first kit allows the human to connect a leg to a top connector. The third kit is repeated until the task is done. Here we do not consider part shortages. In the <a href="#discrete-event-simulation">simulation experiment</a>, we model various assembly part arrival time distributions and part-feeding machine breakdown conditions (i.e., a part is not available to the robot until the machine is repaired).</p> <p>The baselines that we compare to are <strong>Single Task</strong>, where the robot delivers parts for a single task at a time, and <strong>Whole Assembly</strong>, where all the parts are delivered at once. Our approach, <strong>Optimized</strong>, achieves a shorter total task time and less idle time than <strong>Whole Assembly</strong> and is also rated more useful and efficient.</p> <div align="center"> <img width="300" src="/assets/img/roman2022/userstudy_tasktimes.png"/> <img width="300" src="/assets/img/roman2022/userstudy_postexp.png"/> </div> <h3 id="discrete-event-simulation">Discrete Event Simulation</h3> <p>We model the arrival of assembly parts as a Poisson process with rate \(1/MAT\) where \(MAT\) is the mean arrival time.
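Since the interarrival times of a Poisson process are exponentially distributed with mean \(MAT\), such an arrival stream can be sampled directly; here is a minimal Python sketch (illustrative, not the paper's simulator):

```python
import random

def simulate_arrivals(mat, horizon, seed=0):
    """Sample part arrival times from a Poisson process with rate 1/mat.

    Accumulate exponential interarrival gaps with mean `mat` until the
    simulation horizon is exceeded.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mat)  # rate = 1/MAT, mean gap = MAT
        if t > horizon:
            return arrivals
        arrivals.append(t)

# With MAT = 5, we expect about horizon / MAT = 20 arrivals on average.
print(len(simulate_arrivals(mat=5.0, horizon=100.0)))
```

Machine breakdowns could be layered on top by sampling failure times the same way with mean \(MTTF\).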
The machine breakdown is modeled similarly, with the mean time to failure denoted by \(MTTF\). The figures below show the percent improvement in total task time of <strong>Optimized</strong> over <strong>Whole Assembly</strong> (top-left) and over <strong>Single Task</strong> (top-right), and the percent improvement in human idle time of <strong>Optimized</strong> over <strong>Whole Assembly</strong> (bottom-left) and over <strong>Single Task</strong> (bottom-right). The total task time and human idle time are significantly shorter for <strong>Optimized</strong> than for the baselines. <strong>Optimized</strong> is most advantageous over <strong>Whole Assembly</strong> when parts are often in short supply (\(MAT\) is high). <strong>Optimized</strong> is most advantageous over <strong>Single Task</strong> when there are many machine failures (\(MTTF\) is low).</p> <div align="center" style="display: flex; flex-wrap: wrap; justify-content: center;"> <div style="display: flex; width: 100%; justify-content: center; gap: 10px; margin-bottom: 10px;"> <img width="300" src="/assets/img/roman2022/tt_optimized_over_whole.png"/> <img width="300" src="/assets/img/roman2022/tt_optimized_over_single.png"/> </div> <div style="display: flex; width: 100%; justify-content: center; gap: 10px;"> <img width="300" src="/assets/img/roman2022/hit_optimized_over_whole.png"/> <img width="300" src="/assets/img/roman2022/hit_optimized_over_single.png"/> </div> </div>]]></content><author><name></name></author><category term="publications"/><summary type="html"><![CDATA[Blog post for RO-MAN 2022 paper]]></summary></entry></feed>