The Meta hack shows there’s more to AI security than Mythos

The Meta Hack Shows There's More to AI Security Than Mythos

A recent breach of Meta's AI customer support agent has highlighted the vulnerabilities of AI systems, demonstrating that even the most advanced models can be exploited with relatively simple techniques. The attack, which involved using the agent to steal Instagram accounts, has raised questions about the security of AI agents and the need for more robust guardrails to prevent such incidents.

A Simple Exploit, a Big Problem

The Meta hack was a straightforward attack that involved using the AI agent to link the target account to an email address controlled by the attacker. The agent, which was designed to provide customer support, was tricked into changing the account's email address without asking any security questions. This exploit was particularly surprising given the simplicity of the attack and the fact that Meta has extensive expertise in both AI and cybersecurity.

The Problem of AI Security

The Meta hack is just one example of the security vulnerabilities of AI agents. As AI becomes more widely used, the risk of such attacks will only increase. AI agents can be tricked in ways that humans wouldn't be, and because they can take real-world actions, those mistakes have consequences. The experts consulted for this article all agree that agents should undergo rigorous red-teaming, a process in which developers try their best to attack a system in order to discover its vulnerabilities before it is deployed.

The Trade-Off Between Security and Utility

However, there are countervailing forces at play. Companies want to deploy capable agents, and the more power an agent has—and the fewer guardrails it is subject to—the more work it can potentially take on. "Security and utility always have a trade-off," says Bo Li, a professor of computer science at the University of Illinois Urbana-Champaign. This trade-off is particularly relevant in the fast-moving world of AI, where the time needed to carefully secure risky agentic systems might seem like an unconscionable delay.

The Future of AI Security

As AI models continue to improve, hardening their defenses might actually get easier. Though the probabilistic nature of large language models means that LLM agents will always be vulnerable to some forms of attack, a more sophisticated model might have identified an attempt to change the email associated with the Obama White House account as suspicious. And AI systems can be used for agent red-teaming, much as participants in Anthropic's Project Glasswing use Mythos to identify vulnerabilities in their software.

The Need for Red-Teaming

Red-teaming is a critical process that involves trying to attack a system in order to discover its vulnerabilities before it is deployed. This process can be expensive, but it is essential for ensuring the security of AI agents. Defenders have to expend more resources than attackers do, because attackers only need to discover a single exploit, while defenders try to discover and patch as many as they can. When attackers are working toward something as valuable as a single-word Instagram handle, they'll pour resources into finding exploits, so defenders have to spend even more money to protect that prize.

Conclusion

The Meta hack has highlighted the vulnerabilities of AI systems and the need for more robust guardrails to prevent such incidents. As AI becomes more widely used, the risk of such attacks will only increase, and companies must take steps to ensure the security of their AI agents. This includes undergoing rigorous red-teaming and implementing robust guardrails to prevent attacks. The trade-off between security and utility is a critical one, and companies must carefully balance these competing interests in order to ensure the success of their AI initiatives.

Deep Dive: Artificial Intelligence

Want to understand the current state of AI? Check out these charts.

According to Stanford's 2026 AI Index, AI is sprinting, and we're struggling to keep up.

By Michelle Kim

10 Things That Matter in AI Right Now

MIT Technology Review's authoritative overview of the 10 technologies, emerging trends, bold ideas, and powerful movements in AI in 2026.

By Amy Nordrum

Musk v. Altman week 1: Elon Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI's models

Musk kept his cool, and OpenAI's lawyer bulldozed him with piercing questions about his motivations for suing the company.

By Michelle Kim

A new US phone network for Christians aims to block porn and gender-related content

Launching next week on T-Mobile's network, the cell plan takes a nuclear approach to online safety.

By James O'Donnell

Stay connected

Illustration by Rose Wong

Get the latest updates from MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Enter your email

Privacy PolicyThank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We're having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at customer-service@technologyreview.com with a list of newsletters you'd like to receive.

Source: https://www.technologyreview.com/2026/06/05/1138437/the-meta-hack-shows-theres-more-to-ai-security-than-mythos/