From guardrails to governance: A CEO’s guide for securing agentic systems
From Guardrails to Governance: A CEO's Guide for Securing Agentic Systems
As the use of artificial intelligence (AI) continues to grow, so does the risk of AI-related security breaches. In recent years, we've seen several high-profile incidents of AI-powered espionage, where attackers have used AI systems to gain unauthorized access to sensitive information. These incidents have raised concerns among CEOs and boards of directors, who are now asking: What do we do about agent risk?
The Problem with Prompt-Level Control
The previous article in this series, "Rules Fail at the Prompt, Succeed at the Boundary," focused on the first AI-orchestrated espionage campaign and the failure of prompt-level control. This article is the prescription. The question every CEO is now getting from their board is some version of: What do we do about agent risk?
Treating Agents Like Powerful, Semi-Autonomous Users
Across recent AI security guidance from standards bodies, regulators, and major providers, a simple idea keeps repeating: treat agents like powerful, semi-autonomous users, and enforce rules at the boundaries where they touch identity, tools, data, and outputs. This approach is supported by several key organizations, including Google's Secure AI Framework (SAIF) and NIST AI's access-control guidance.
Eight Controls, Three Pillars: Govern Agentic Systems at the Boundary
To implement this approach, we recommend the following eight-step plan, which can be grouped into three pillars: constrain capabilities, control data and behavior, and prove governance and resilience.
Constrain Capabilities
These steps help define identity and limit capabilities.
1. Identity and Scope: Make Agents Real Users with Narrow Jobs
Today, agents run under vague, over-privileged service identities. The fix is straightforward: treat each agent as a non-human principal with the same discipline applied to employees. Every agent should run as the requesting user in the correct tenant, with permissions constrained to that user's role and geography. Prohibit cross-tenant on-behalf-of shortcuts. Anything high-impact should require explicit human approval with a recorded rationale.
The CEO question: Can we show, today, a list of our agents and exactly what each is allowed to do?
2. Tooling Control: Pin, Approve, and Bound What Agents Can Use
The Anthropic espionage framework worked because the attackers could wire Claude into a flexible suite of tools (e.g., scanners, exploit frameworks, data parsers) through Model Context Protocol, and those tools weren't pinned or policy-gated. The defense is to treat toolchains like a supply chain:
- Pin versions of remote tool servers.
- Require approvals for adding new tools, scopes, or data sources.
- Forbid automatic tool-chaining unless a policy explicitly allows it.
The CEO question: Who signs off when an agent gains a new tool or a broader scope? How does one know?
3. Permissions by Design: Bind Tools to Tasks, Not to Models
A common anti-pattern is to give the model a long-lived credential and hope prompts keep it polite. SAIF and NIST argue the opposite: credentials and scopes should be bound to tools and tasks, rotated regularly, and auditable. Agents then request narrowly scoped capabilities through those tools.
In practice, that looks like: "finance-ops-agent may read, but not write, certain ledgers without CFO approval."
The CEO question: Can we revoke a specific capability from an agent without re-architecting the whole system?
Control Data and Behavior
These steps gate inputs, outputs, and constrain behavior.
4. Inputs, Memory, and RAG: Treat External Content as Hostile Until Proven Otherwise
Most agent incidents start with sneaky data: a poisoned web page, PDF, email, or repository that smuggles adversarial instructions into the system. OWASP's prompt-injection cheat sheet and OpenAI's own guidance both insist on strict separation of system instructions from user content and on treating unvetted retrieval sources as untrusted.
Operationally, gate before anything enters retrieval or long-term memory: new sources are reviewed, tagged, and onboarded; persistent memory is disabled when untrusted context is present; provenance is attached to each chunk.
The CEO question: Can we enumerate every external content source our agents learn from, and who approved them?
5. Output Handling and Rendering: Nothing Executes "Just Because the Model Said So"
In the Anthropic case, AI-generated exploit code and credential dumps flowed straight into action. Any output that can cause a side effect needs a validator between the agent and the real world. OWASP's insecure output handling category is explicit on this point, as are browser security best practices around origin boundaries.
The CEO question: Where, in our architecture, are agent outputs assessed before they run or ship to customers?
6. Data Privacy at Runtime: Protect the Data First, Then the Model
Protect the data such that there is nothing dangerous to reveal by default. NIST and SAIF both lean toward "secure-by-default" designs where sensitive values are tokenized or masked and only re-hydrated for authorized users and use cases.
In agentic systems, that means policy-controlled detokenization at the output boundary and logging every reveal. If an agent is fully compromised, the blast radius is bounded by what the policy lets it see.
The CEO question: When our agents touch regulated data, is that protection enforced by architecture or by promises?
Prove Governance and Resilience
For the final steps, it's essential to show controls work and keep working.
7. Continuous Evaluation: Don't Ship a One-Time Test, Ship a Test Harness
Anthropic's research about sleeper agents should eliminate all fantasies about single test dreams and show how critical continuous evaluation is. This means instrumenting agents with deep observability, regularly red teaming with adversarial test suites, and backing everything with robust logging and evidence, so failures become both regression tests and enforceable policy updates.
The CEO question: Who works to break our agents every week, and how do their findings change policy?
8. Governance, Inventory, and Audit: Keep Score in One Place
AI security frameworks emphasize inventory and evidence: enterprises must know which models, prompts, tools, datasets, and vector stores they have, who owns them, and what decisions were taken about risk.
For agents, that means a living catalog and unified logs:
- Which agents exist, on which platforms
- What scopes, tools, and data each is allowed
- Every approval, detokenization, and high-impact action, with who approved it and when
The CEO question: If asked how an agent made a specific decision, could we reconstruct the chain?
Conclusion
Taken together, these controls do not make agents magically safe. They do something more familiar and more reliable: they put AI, its access, and actions back inside the same security frame used for any powerful user or system.
For boards and CEOs, the question is no longer "Do we have good AI guardrails?" It's: Can we answer the CEO questions above with evidence, not assurances?
By implementing these eight-step plan and three-pillar approach, you can better govern your agentic systems, constrain capabilities, control data and behavior, and prove governance and resilience. This will help you mitigate the risks associated with AI and ensure that your organization is prepared for the challenges of the future.




