Physical Intelligence raises $600M to advance robot foundation models

Physical Intelligence, a San Francisco-based company, has raised $600 million in Series B funding. The investment is intended to fund further development of its foundation models, which aim to let robots understand and interact with the physical world.

Faster, More Reliable Robots

Physical Intelligence's foundation models are designed to make it easier for robots to learn from a variety of inputs and generalize behaviors more quickly with smaller amounts of data than previous reinforcement learning (RL) approaches. This has significant implications for robot performance in unstructured environments, such as retail stores and households.

The company's approach involves streaming RGB-D camera images from any robot to its runtime, which tokenizes the visual stream along with the robot's movement history and feeds it to a 3 billion to 5 billion-parameter transformer model. Users can provide plain-language goals, such as "make a flat white" or "pack chocolates into this box." The model takes about 100 ms to predict the next 50 steps, and a hardware abstraction layer converts the tokens into robot-specific joint commands, within force and speed limits for safety.
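The last step of that pipeline, turning predicted actions into commands that stay within speed limits, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Physical Intelligence's hardware abstraction layer: the function name clamp_chunk, the 50 Hz control rate, and the limit values are all hypothetical.

```python
from typing import List, Sequence


def clamp_chunk(
    current: Sequence[float],
    targets: Sequence[Sequence[float]],
    max_speed: float,
    dt: float,
    lower: float,
    upper: float,
) -> List[List[float]]:
    """Limit a chunk of joint-position targets to a speed and position range.

    All names and limits are hypothetical. Each step may move a joint by at
    most max_speed * dt radians and must stay within [lower, upper].
    """
    max_delta = max_speed * dt
    safe: List[List[float]] = []
    prev = list(current)
    for step in targets:
        row = []
        for p, t in zip(prev, step):
            # Limit the per-step change, then clip to the joint range.
            delta = max(-max_delta, min(max_delta, t - p))
            row.append(max(lower, min(upper, p + delta)))
        safe.append(row)
        prev = row
    return safe


# Two joints, three predicted steps, assumed 50 Hz control rate (dt = 0.02 s).
chunk = [[0.5, -0.5], [0.5, -0.5], [0.5, -0.5]]
commands = clamp_chunk(
    [0.0, 0.0], chunk, max_speed=1.0, dt=0.02, lower=-0.03, upper=1.0
)
print(commands)
```

In this toy run the first joint creeps toward its target by at most 0.02 rad per step, while the second joint stops at the assumed lower limit of -0.03 rad. At an assumed 50 Hz, a 50-step chunk would cover one second of motion, so a 100 ms prediction would take about a tenth of the chunk's duration.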

Open-Source Code and Improved Performance

In February, Physical Intelligence open-sourced the code and weights for its π0 (Pi0) robotics model. Earlier this month, the company announced Version 0.6 of the vision-language-action (VLA) model. The startup used the RECAP (RL with Experience & Corrections via Advantage-conditioned Policies) approach to train a robot by demonstration, coach it through corrections, and let it improve from autonomous experience.
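The idea behind advantage conditioning can be illustrated with a toy labeling step. The sketch below is an illustration under stated assumptions, not the RECAP implementation: the Sample type, the label_by_advantage helper, the tag strings, and the zero threshold are all hypothetical. It tags each recorded action as better or worse than a value estimate, so that a policy could be trained on both kinds of data and later conditioned on the "positive" tag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    observed_return: float  # outcome actually achieved after this action
    value_estimate: float   # critic's prediction of the expected outcome
    tag: str = ""


def label_by_advantage(samples: List[Sample], threshold: float = 0.0) -> List[Sample]:
    """Tag each sample by the sign of its advantage (return minus value)."""
    for s in samples:
        advantage = s.observed_return - s.value_estimate
        if advantage > threshold:
            s.tag = "advantage: positive"
        else:
            s.tag = "advantage: negative"
    return samples


data = label_by_advantage([
    Sample(observed_return=1.0, value_estimate=0.6),  # better than expected
    Sample(observed_return=0.0, value_estimate=0.6),  # worse than expected
])
for s in data:
    print(s.tag)
```

Keeping the negatively tagged samples, rather than discarding them, is what would let such a scheme learn from failed autonomous attempts as well as from demonstrations and corrections.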

Physical Intelligence claims that this approach has doubled throughput on tasks such as inserting a filter into an espresso machine, folding previously unseen laundry, or assembling a cardboard box. It also decreased failure rates over hours of operation and proved superior to imitation learning alone.

Alphabet Leads Series B Round

CapitalG, Alphabet's growth fund, led Physical Intelligence's Series B round with Lux Capital. Bond, Redpoint, and Sequoia Capital also participated. Previous and returning investors include Jeff Bezos, the executive chairman of Amazon. OpenAI, Redpoint Ventures, T. Rowe Price, and Thrive Capital have also contributed.

Increased Competition in Physical AI

Physical Intelligence is not alone in pursuing physical AI. In September, Dyna Robotics closed a $120 million Series A round to develop its proprietary foundation model. Earlier this month, Archetype AI raised $35 million for "physical agents," and Foxglove raised $40 million to scale its data platform for robot developers.

Other companies are also racing to get the data and build the models for next-generation robot AI. For instance, Physical Intelligence partner AgiBot has deployed its Real-World Reinforcement Learning system in a manufacturing pilot with Longcheer Technology. 1X Technologies has been using teleoperation to train its NEO humanoid to conduct household chores. Skild AI said it is developing a general-purpose Skild Brain.

Conclusion

The $600 million round gives Physical Intelligence more resources to develop its robot foundation models. The company reports doubled throughput and lower failure rates on tasks such as folding laundry and assembling boxes, and it faces competitors such as Dyna Robotics, Archetype AI, and Skild AI that are building their own models for physical AI.

Code Examples

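# Python: illustrative sketch only. 'physical_intelligence/vla' is not a known
# torch.hub entry point, so a small stand-in network is used here to show the
# shape of an inference call. It is not Physical Intelligence's model.
import torch
from torch import nn

# Stand-in for a VLA model: image in, chunk of actions out.
# 50 steps matches the prediction horizon described above;
# 7 joints is an assumed robot configuration.
HORIZON, NUM_JOINTS = 50, 7
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, HORIZON * NUM_JOINTS),
)
model.eval()

# Define the input data: one random RGB image, 224 x 224
input_data = torch.randn(1, 3, 224, 224)

# Run the model without tracking gradients
with torch.no_grad():
    output = model(input_data).view(1, HORIZON, NUM_JOINTS)

print(output.shape)  # torch.Size([1, 50, 7])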

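// JavaScript: illustrative sketch only. 'physical_intelligence/vla' is not a
// published package, so a stand-in function is used in place of the model.
const HORIZON = 50;    // prediction horizon described above
const NUM_JOINTS = 7;  // assumed robot configuration

// Stand-in model: maps an image buffer to a zero-filled chunk of actions
async function model(image) {
  return new Float32Array(HORIZON * NUM_JOINTS);
}

// Define the input data: a flat buffer for one 3 x 224 x 224 image
const inputData = new Float32Array(1 * 3 * 224 * 224);

// Run the model (top-level await requires an ES module)
const output = await model(inputData);

console.log(output.length); // 350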