Large Behavior Models Are Helping Atlas Get to Work

Boston Dynamics can be forgiven for the relative lack of acrobatic prowess displayed by the new version of Atlas in (most of) its latest videos. In fact, if you look at this Atlas video from late last year and compare it to Atlas's most recent video, it's doing what looks to be more or less the same logistics-y stuff—all of which is far less visually exciting than backflips. But I would argue that the relatively dull tasks Atlas is working on now, moving car parts and totes and whatnot, are just as impressive. Making a humanoid that can consistently and economically and safely do useful things over the long term could very well be the hardest problem in robotics right now, and Boston Dynamics is taking it seriously.

Building AI Generalist Robots

While the context of this work is "building AI generalist robots," I'm not sure that anyone really knows what a "generalist robot" would actually look like, or how we'll even know when someone has achieved it. Humans are generalists, sort of—we can potentially do a lot of things, and we're fairly adaptable and flexible in many situations, but we still require training for most tasks. I bring this up just to try and contextualize expectations, because I think a successful humanoid robot doesn't have to actually be a generalist, but instead just has to be capable of doing several different kinds of tasks, and to be adaptable and flexible in the context of those tasks. And that's already difficult enough.

The Approach of Leveraging Large Behavior Models

The approach that Boston Dynamics and its partner, Toyota Research Institute, are taking is to leverage large behavior models (LBMs), which combine more general world knowledge with specific task knowledge to help Atlas with that adaptability and flexibility thing. As Boston Dynamics points out in a recent blog post, "the field is steadily accumulating evidence that policies trained on a large corpus of diverse task data can generalize and recover better than specialist policies that are trained to solve one or a small number of tasks." Essentially, the goal is to develop a foundational policy that covers things like movement and manipulation, and then add more specific training (provided by humans) on top of that for specific tasks.

Imitation Learning and Large Behavior Models

Boston Dynamics is using imitation learning: an operator wearing a motion-tracking system teleoperates Atlas through motion and manipulation tasks. There's a one-to-one mapping between the operator and the robot, making it fairly intuitive, although as anyone who has tried to teleoperate a robot with a surfeit of degrees of freedom can attest, it takes some practice to do it well. The recorded teleoperation sessions then provide high-quality task training data for Atlas.
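To make the idea concrete, here is a minimal sketch of imitation learning in its simplest form, behavior cloning, where a policy is fit by supervised learning on recorded (observation, action) pairs. It is an illustration only, not Boston Dynamics' method: the linear policy, the synthetic "operator," and every name and dimension below are assumptions made for the example.

```python
import numpy as np


def collect_demonstrations(num_steps, obs_dim, act_dim, rng):
    """Stand-in for teleoperation logs: observations paired with operator actions.

    The "operator" is a hypothetical linear mapping plus a little noise.
    """
    expert_gain = rng.normal(size=(obs_dim, act_dim))
    observations = rng.normal(size=(num_steps, obs_dim))
    noise = 0.01 * rng.normal(size=(num_steps, act_dim))
    actions = observations @ expert_gain + noise
    return observations, actions, expert_gain


def fit_policy(observations, actions):
    """Behavior cloning as least squares: minimize ||observations @ W - actions||^2."""
    weights, *_ = np.linalg.lstsq(observations, actions, rcond=None)
    return weights


def policy(weights, observation):
    """Predict an action for a single observation."""
    return observation @ weights


rng = np.random.default_rng(0)
obs, act, expert_gain = collect_demonstrations(500, 6, 3, rng)
weights = fit_policy(obs, act)

# Compare the cloned policy with the synthetic operator on an unseen observation.
new_obs = rng.normal(size=6)
error = np.max(np.abs(policy(weights, new_obs) - new_obs @ expert_gain))
print(f"max action error on an unseen observation: {error:.4f}")
```

A real system would swap the linear map for a large neural network and the synthetic data for camera images and joint states, but the training signal is the same: reproduce what the operator did.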

The Role of Large Behavior Models in Imitation Learning

The difference between imitation learning and a large behavior model is primarily a question of scale. A large behavior model is essentially imitation learning at scale, similar to a large language model. The hypothesis with large behavior models is that as they scale, generalization capabilities improve, allowing them to handle more real-world corner cases and require less training data for new tasks. Currently, the generalization of these models is limited, but we're addressing that by gathering more data not only through teleoperating robots but also by exploring other scaling bets like non-teleop human demonstrations and sim/synthetic data.
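One common way to train on several data sources at once is to draw each training example from a weighted mixture of the sources. The sketch below shows that idea for the three kinds of data mentioned above. The source names, dataset sizes, and mixing weights are all hypothetical; the article does not say how the data is actually combined.

```python
import random
from collections import Counter

# Hypothetical sources and mixing weights; none of these numbers come from the article.
SOURCES = {
    "robot_teleop": {"size": 1_000, "weight": 0.5},
    "human_demos": {"size": 10_000, "weight": 0.3},
    "sim_synthetic": {"size": 100_000, "weight": 0.2},
}


def sample_batch(sources, batch_size, rng):
    """Pick a source for each example by weight, then a uniform index within it."""
    names = list(sources)
    weights = [sources[name]["weight"] for name in names]
    chosen = rng.choices(names, weights=weights, k=batch_size)
    return [(name, rng.randrange(sources[name]["size"])) for name in chosen]


rng = random.Random(0)
batch = sample_batch(SOURCES, 1000, rng)
counts = Counter(name for name, _ in batch)
print(counts)
```

With weights like these, the smallest dataset is sampled far more often than its size alone would suggest, which is one way a small amount of high-quality robot data can still shape a model trained mostly on cheaper data.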

The Future of Atlas

We're really focused on maximizing the performance of manipulation behaviors. I think one of the things that we're uniquely positioned to do well is reaching the full behavioral envelope of humanoids, including mobile bimanual manipulation, repetitive tasks, and strength, and getting the robot to move smoothly and dynamically using these models. We're also developing repeatable processes to climb the robustness curve for these policies, and we think reinforcement learning may play a key role in achieving this.

The Importance of High-Quality Data

Ideally, you want the top of the pyramid to be as big as possible, right?

Ideally, yes. But you won't get to the scale you need by just doing that. You need the whole pyramid, but having as much high-quality data at the top as possible only helps.

But it's not like you can just have a super-large bottom to the pyramid and not need the top?

I don't think so. I believe there needs to be enough high-quality data for these models to effectively translate into the specific embodiment that they are executing on.

Conclusion

Boston Dynamics is taking a unique approach to building a humanoid robot that can do useful things over the long term. By leveraging large behavior models and imitation learning, they're making progress towards creating a robot that can adapt and be flexible in the context of different tasks. While there's still much work to be done, the potential implications of this research are significant, and it's an area worth watching in the coming years.

Source: https://spectrum.ieee.org/boston-dynamics-atlas-scott-kuindersma