Boston Dynamics and Google DeepMind Teach Spot to Reason
Recent advances in robotics and artificial intelligence (AI) have brought us closer to robots that can understand and interact with the physical world in a more human-like way. Boston Dynamics, a leading robotics company, has partnered with Google DeepMind to equip its quadruped robot, Spot, with a high-level embodied reasoning model called Gemini Robotics-ER 1.6. The model lets Spot perform complex tasks such as inspection autonomously, and it marks a significant step toward robots that can better understand and operate in the physical world.
Understanding Robot Understanding
The terms "reasoning" and "understanding" are increasingly applied to AI and robotics, but what do they actually mean for a robot in practice? According to Carolina Parada, head of robotics at Google DeepMind, the benchmark for understanding is that the system answers the way a human would. Aligning how robots understand the world with how humans do is critical if they are to perform tasks reliably and safely.
The Disconnect Between Instructions and Task Execution
Boston Dynamics' video showcasing Spot's ability to recycle cans in a living room highlights the disconnect between how humans and robots understand tasks. Spot has no problem completing the task, but it grips the can sideways, which is not going to end well for cans with leftover liquid. Humans would avoid this because they can draw on a lifetime of experience to know how cans should be held, but robots don't (yet) have that kind of world knowledge.
Reasoning About Safety
Parada explains that Gemini Robotics-ER 1.6 approaches situations like this from a safety perspective. Ask the robot to bring you a cup of water, for example, and it will reason that the cup should not go on the edge of a table, where it could fall. The current version of Spot doesn't use these semantic safety models for manipulation, but the plan is for future versions to reason about how to hold objects safely.
Success Detection: Combining Multiple Camera Angles
One of the new features in 1.6 is success detection, which combines multiple camera angles to tell more reliably when Spot has successfully grasped an object. That is useful if you rely entirely on vision for object interaction, but robots have plenty of other well-established ways to detect a successful grasp, including touch sensors and force sensors, and 1.6 does not use any of them.
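The article does not say how 1.6 combines its camera views. As a minimal sketch, assuming each camera independently produces a probability that the grasp succeeded, one standard way to combine such evidence is to add the views' log-odds. The function names `fuse_views` and `grasp_succeeded` and the 0.9 decision threshold are hypothetical, chosen only for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability, clamped away from 0 and 1 to stay finite."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def fuse_views(view_probs: list[float], prior: float = 0.5) -> float:
    """Fuse per-camera grasp-success probabilities (hypothetical sketch).

    Assumes the views are conditionally independent: each view contributes
    its log-odds minus the prior's log-odds, and the summed evidence is
    added to the prior before mapping back to a probability.
    """
    prior_logit = logit(prior)
    total = prior_logit + sum(logit(p) - prior_logit for p in view_probs)
    return 1.0 / (1.0 + math.exp(-total))


def grasp_succeeded(view_probs: list[float], threshold: float = 0.9) -> bool:
    """Report success only if the fused confidence clears the (assumed) threshold."""
    return fuse_views(view_probs) >= threshold


if __name__ == "__main__":
    # Two cameras that are each moderately confident reinforce each other.
    print(round(fuse_views([0.8, 0.8]), 3))
    # Two cameras that disagree by the same margin cancel out.
    print(round(fuse_views([0.8, 0.2]), 3))
    print(grasp_succeeded([0.8, 0.8]), grasp_succeeded([0.8]))
```

Under these assumptions, two views at 0.8 fuse to about 0.94 and clear the threshold, while a single view at 0.8 does not. The independence assumption is the weak point: cameras that share the same occlusion or lighting problem are correlated, so simply summing their log-odds would overstate the confidence.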
The Fundamental Problem of Training Models with Physical Data
The reason points to a fundamental problem the robotics field is still trying to solve: how to train models when you need physical data. For now, these models are strictly vision-only. The web is full of visual information about how to pick up a pen, but there is very little touch-sensing data on the internet.
Real-World Robots That Are Useful
Having customers makes Boston Dynamics something of an anomaly among companies deploying AI-driven legged robots commercially. And those customers need to be able to trust the robot, which is always a challenge when AI is involved. "We take this very seriously," da Silva said in an interview. "We roll out new DeepMind capabilities through beta programs to a smaller set of customers to understand what to anticipate, and we only actively advertise features we are confident will work."
The Threshold of Usefulness
There's a threshold of usefulness that robots like Spot need to reach, and fortunately, the real world doesn't demand perfection. "Most critical infrastructure in a facility will be instrumented to tell you whether something is wrong," da Silva says. "But there is a lot of stuff that is not instrumented that can still cause a problem if you aren't paying attention to it. We've found that somewhere north of 80 percent is the threshold where it's not annoying. Below that, basically the robot is crying wolf, and the operators will start ignoring it."
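Da Silva's threshold can be read as a simple ratio. The sketch below assumes that "north of 80 percent" refers to the share of the robot's alerts that operators confirm as real problems; the article does not define the metric precisely, and the names `alert_precision` and `crying_wolf` are hypothetical.

```python
def alert_precision(confirmed: list[bool]) -> float:
    """Fraction of raised alerts that operators confirmed as real problems."""
    if not confirmed:
        raise ValueError("no alerts to evaluate")
    return sum(confirmed) / len(confirmed)


def crying_wolf(confirmed: list[bool], threshold: float = 0.8) -> bool:
    """True if alert quality is at or below the assumed 80 percent threshold.

    Following the quote, alerts need to be right somewhere north of
    80 percent of the time; at or below that, operators start ignoring them.
    """
    return alert_precision(confirmed) <= threshold


if __name__ == "__main__":
    good_week = [True] * 9 + [False]      # 9 of 10 alerts were real
    bad_week = [True] * 7 + [False] * 3   # 7 of 10 alerts were real
    for name, week in (("good", good_week), ("bad", bad_week)):
        print(name, alert_precision(week), crying_wolf(week))
```

With these made-up weeks, 9 real alerts out of 10 stays above the bar, while 7 out of 10 falls into crying-wolf territory.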
Forward-Looking Thoughts
Both da Silva and Parada agree that there's still plenty of room for improvement in robotic inspection. As Parada points out, Spot's rarefied status as a scalable commercial platform provides a valuable opportunity to learn where models like Gemini Robotics-ER 1.6 are most useful, and then apply that knowledge to other embodied AI platforms, including Boston Dynamics' Atlas. Does that mean that Atlas is going to be the next industrial inspection robot? Probably not. But if this real-world experience can get us closer to safe and reliable robots that can pick up laundry, take a dog for a walk, and clear away soda cans without making a mess, that's something we can all get excited about.
Source: https://spectrum.ieee.org/boston-dynamics-spot-google-deepmind




