Is a secure AI assistant possible?

The Security Risks of AI Assistants: Can We Trust Our Digital Companions?

The rise of artificial intelligence (AI) has brought about a new era of digital companions, from virtual assistants like Siri and Alexa to more advanced AI agents that can perform complex tasks on our behalf. However, as these agents become increasingly sophisticated, they also pose significant security risks that can compromise our personal data and even put us in harm's way.

One such AI agent is OpenClaw, a tool that allows users to create their own bespoke assistants using existing large language models (LLMs). While OpenClaw has gained a significant following, its security vulnerabilities have raised concerns among experts, who warn that the risks posed by this tool are extensive and potentially catastrophic.

The Risks of OpenClaw

OpenClaw is essentially a "mecha suit" for LLMs, allowing users to choose any LLM they like to act as the pilot. This LLM then gains access to improved memory capabilities and the ability to set itself tasks that it repeats on a regular cadence. Unlike the agentic offerings from major AI companies, OpenClaw agents are meant to be on 24-7, and users can communicate with them using WhatsApp or other messaging apps.

However, this power comes with significant consequences. If you want your AI personal assistant to manage your inbox, you need to give it access to your email—and all the sensitive information contained there. If you want it to make purchases on your behalf, you need to give it your credit card info. And if you want it to do tasks on your computer, such as writing code, it needs some access to your local files.

There are several ways this can go wrong. The first is that the AI assistant might make a mistake, as when a user's Google Antigravity coding agent reportedly wiped his entire hard drive. The second is that someone might gain access to the agent using conventional hacking tools and use it to either extract sensitive data or run malicious code. In the weeks since OpenClaw went viral, security researchers have demonstrated numerous such vulnerabilities that put security-naïve users at risk.

The Insidious Risk of Prompt Injection

Experts are particularly concerned about a more insidious security risk known as prompt injection. Prompt injection is effectively LLM hijacking: Simply by posting malicious text or images on a website that an LLM might peruse, or sending them to an inbox that an LLM reads, attackers can bend it to their will.

And if that LLM has access to any of its user's private information, the consequences could be dire. "Using something like OpenClaw is like giving your wallet to a stranger in the street," says Nicolas Papernot, a professor of electrical and computer engineering at the University of Toronto. Whether or not the major AI companies can feel comfortable offering personal assistants may come down to the quality of the defenses that they can muster against such attacks.

Mitigating the Risks

While the risks posed by OpenClaw and other AI agents are significant, there are steps that can be taken to mitigate them. One approach is to train the LLM to ignore prompt injections. A major part of the LLM development process, called post-training, involves taking a model that knows how to produce realistic text and turning it into a useful assistant by "rewarding" it for answering questions appropriately and "punishing" it when it fails to do so.

Another approach involves halting the prompt injection attack before it ever reaches the LLM. Typically, this involves using a specialized detector LLM to determine whether or not the data being sent to the original LLM contains any prompt injections. In a recent study, however, even the best-performing detector completely failed to pick up on certain categories of prompt injection attack.

The Challenge of Defining Policies

The third strategy is more complicated. Rather than controlling the inputs to an LLM by detecting whether or not they contain a prompt injection, the goal is to formulate a policy that guides the LLM's outputs—i.e., its behaviors—and prevents it from doing anything harmful. Some defenses in this vein are quite simple: If an LLM is allowed to email only a few pre-approved addresses, for example, then it definitely won't send its user's credit card information to an attacker.

However, such a policy would prevent the LLM from completing many useful tasks, such as researching and reaching out to potential professional contacts on behalf of its user. "The challenge is how to accurately define those policies," says Neil Gong, a professor of electrical and computer engineering at Duke University. "It's a trade-off between utility and security."

Conclusion

The security risks posed by AI assistants like OpenClaw are significant and potentially catastrophic. While there are steps that can be taken to mitigate these risks, the challenge of defining policies that balance utility and security remains a major challenge. As AI agents become increasingly sophisticated, it is essential that we prioritize security and take steps to prevent the misuse of these powerful tools.

Ultimately, the question of whether we can trust our digital companions will depend on the quality of the defenses that we can muster against the risks posed by these agents. As we move forward in the development of AI, it is essential that we prioritize security and take steps to prevent the misuse of these powerful tools.

Source: https://www.technologyreview.com/2026/02/11/1132768/is-a-secure-ai-assistant-possible/