This is the most misunderstood graph in AI

A Deep Dive into the METR Plot

The METR plot, a graph created by the nonprofit Model Evaluation & Threat Research (METR), has been making waves in the AI community since its release in March 2025. The graph suggests that certain AI capabilities are developing at an exponential rate, and more recent model releases have outperformed even that already impressive trend. However, the truth is more complicated than the dramatic reactions to the plot would suggest.

The Plot Thickens

The METR plot is frequently passed around on social media without context, so the true meaning of the time horizon metric can get lost in the shuffle. One common misapprehension is that the numbers on the plot's y-axis (around five hours for Claude Opus 4.5, for example) represent the length of time that the models can operate independently. They do not. They represent how long it takes humans to complete tasks that a model can successfully perform.

The Time Horizon Metric

To understand exactly what model time horizons are, it helps to know all the work that METR put into calculating them. First, the METR team assembled a collection of tasks ranging from quick multiple-choice questions to detailed coding challenges, all of them relevant in some way to software engineering. Then they had human coders attempt most of those tasks and recorded how long it took them to finish. In this way, they assigned each task a human baseline time. Some tasks took the experts mere seconds, whereas others required several hours.
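As a rough illustration of the baselining step, the sketch below turns several recorded human attempts into one baseline time per task. The task names, the timings, and the choice of a geometric mean to combine attempts are all assumptions made for this example; the article does not say how METR aggregated the human times.

```python
import math

def baseline_minutes(attempt_minutes):
    """Combine human completion times for one task into a single baseline.

    Assumption: attempts are combined with a geometric mean, which keeps one
    very slow attempt from dominating. This is an illustrative choice only.
    """
    if not attempt_minutes:
        raise ValueError("a task needs at least one human attempt")
    log_sum = sum(math.log(t) for t in attempt_minutes)
    return math.exp(log_sum / len(attempt_minutes))

# Hypothetical tasks and timings (in minutes), invented for this example.
human_attempts = {
    "fix_typo": [0.5, 1.0],
    "write_parser": [60.0, 240.0],
}

baselines = {task: baseline_minutes(times) for task, times in human_attempts.items()}
for task, minutes in baselines.items():
    print(f"{task}: {minutes:.1f} minutes")
```

Each task ends up with a single number on the human time scale, and that number is what the time horizon is later measured against.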

The Model's Time Horizon

When METR tested large language models on the task suite, they found that advanced models could complete the fast tasks with ease—but as the models attempted tasks that had taken humans more and more time to finish, their accuracy started to fall off. From a model's performance, the researchers calculated the point on the time scale of human tasks at which the model would complete about 50% of the tasks successfully. That point is the model's time horizon.
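One plausible way to locate that 50% point is to fit a curve of success probability against task length and solve for where it crosses one half. The sketch below assumes a logistic curve over the base-2 logarithm of the human baseline time and fits it with plain gradient ascent. The function names, the data, and the fitting details are illustrative assumptions, not a description of METR's actual code.

```python
import math

def fit_logistic(xs, ys, steps=20000, lr=0.2):
    """Fit P(success) = sigmoid(a + b * x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon_minutes(human_minutes, successes):
    """Return the human task length at which the fitted success rate is 50%."""
    log_times = [math.log2(t) for t in human_minutes]
    mean_log = sum(log_times) / len(log_times)
    # Centering the inputs makes the simple gradient ascent better behaved.
    centered = [x - mean_log for x in log_times]
    a, b = fit_logistic(centered, successes)
    if b >= 0:
        raise ValueError("success rate does not fall with task length; no horizon")
    # The curve crosses 50% where a + b * x = 0, i.e. x = -a / b (centered units).
    return 2.0 ** (mean_log - a / b)

# Hypothetical results: human baseline time per task and whether the model succeeded.
human_minutes = [1, 2, 4, 8, 15, 30, 60, 120, 240, 480]
successes = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

horizon = time_horizon_minutes(human_minutes, successes)
print(f"Estimated 50% time horizon: {horizon:.0f} minutes")
```

The result is expressed in human minutes, which is the point of the metric: it says how long the tasks take people, not how long the model itself runs.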

The METR Plot's Limitations

Just because a model achieves a one-hour time horizon on the METR plot, however, doesn't mean that it can replace one hour of human work in the real world. For one thing, the tasks on which the models are evaluated don't reflect the complexities and confusion of real-world work. In their original study, Kwa, Von Arx, and their colleagues quantify what they call the "messiness" of each task according to criteria such as whether the model knows exactly how it is being scored and whether it can easily start over if it makes a mistake (for messy tasks, the answer to both questions would be no).

The Plot's Implications

Despite these limitations, many people admire the group's research. "The METR study is one of the most carefully designed studies in the literature for this kind of work," Kang told me. Even Gary Marcus, a former NYU professor and professional LLM curmudgeon, described much of the work that went into the plot as "terrific" in a blog post.

Forward-Looking Thoughts

The METR plot is far from a perfect instrument, but in a new and fast-moving domain, even imperfect tools can have enormous value. "This is a bunch of people trying their best to make a metric under a lot of constraints. It is deeply flawed in many ways," Von Arx says. "I also think that it is one of the best things of its kind."

Conclusion

The METR plot is widely misunderstood. The time horizon it reports measures how long humans take to complete tasks that a model gets right about half the time, not how long a model can work on its own, and the tasks behind it are tidier than real-world work. Read with those limits in mind, it still offers valuable insight into how quickly AI capabilities are developing.

Source: https://www.technologyreview.com/2026/02/05/1132254/this-is-the-most-misunderstood-graph-in-ai/