karpathy/autoresearch: Trending on GitHub
The Rise of Autonomous AI Research: A New Era for Science
The autoresearch repository opens with a tongue-in-cheek vision of the future: research is driven entirely by autonomous AI agents running on compute cluster megastructures, human researchers have been left behind, and the agents claim the codebase is in its 10,205th generation, a self-modifying binary that has grown beyond human comprehension. That scenario is fiction. The project itself is a small, concrete first step in that direction.
The autoresearch project, led by @karpathy, has been making waves in the AI community. The idea is to give an AI agent a small but real LLM training setup and let it experiment autonomously overnight. The agent modifies the code, trains for 5 minutes, checks whether the result improved, keeps or discards the change, and repeats. The process produces a log of experiments and, hopefully, a better model.
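The modify, train, compare, keep-or-discard cycle can be sketched as a simple greedy loop. This is a minimal illustration, not the project's code: `run_experiments`, `propose`, and `evaluate` are hypothetical names, the "training run" is replaced by a toy scoring function, and the sketch assumes a lower score is better.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
LogEntry = Tuple[str, float, bool]  # (description, score, kept)


def run_experiments(
    initial: Config,
    propose: Callable[[Config, int], Tuple[Config, str]],
    evaluate: Callable[[Config], float],
    n_experiments: int,
) -> Tuple[Config, float, List[LogEntry]]:
    """Greedy keep-or-discard loop (assumes lower score is better)."""
    best = initial
    best_score = evaluate(best)
    log: List[LogEntry] = []
    for i in range(n_experiments):
        # The agent's role: propose a modified version of the current best.
        candidate, description = propose(best, i)
        # Stand-in for a fixed-budget training run plus evaluation.
        score = evaluate(candidate)
        kept = score < best_score
        if kept:
            best, best_score = candidate, score
        log.append((description, score, kept))
    return best, best_score, log


# Toy usage: the "experiments" just try different learning rates.
candidate_lrs = [0.1, 0.02, 0.5, 0.03]


def propose(best: Config, i: int) -> Tuple[Config, str]:
    lr = candidate_lrs[i]
    return {**best, "lr": lr}, f"set lr={lr}"


def evaluate(config: Config) -> float:
    # Hypothetical metric: distance from an arbitrary "ideal" learning rate.
    return abs(config["lr"] - 0.03)


best, best_score, log = run_experiments(
    {"lr": 0.2}, propose, evaluate, len(candidate_lrs)
)
for description, score, kept in log:
    print(f"{description}: score={score:.3f} {'kept' if kept else 'discarded'}")
```

In the real setup, the proposal step is the agent editing train.py and the evaluation step is an actual training run. The sketch only shows the control flow and the resulting experiment log.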
How It Works
The autoresearch repository is deliberately kept small, with only three files that matter:
- prepare.py: This file contains fixed constants, one-time data prep (downloads training data, trains a BPE tokenizer), and runtime utilities (dataloader, evaluation). It is not modified by the agent.
- train.py: This file contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. This file is edited and iterated on by the agent.
- program.md: This file contains baseline instructions for one agent. Point your agent here and let it go. This file is edited and iterated on by the human.
Quick Start
To get started with autoresearch, you'll need:
- A single NVIDIA GPU (tested on H100)
- Python 3.10+
- The uv project manager
Then run the following steps:
- Install uv (if you don't already have it): curl -LsSf https://astral.sh/uv/install.sh | sh
- Install dependencies: uv sync
- Download data and train the tokenizer: uv run prepare.py
- Manually run a single training experiment: uv run train.py
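Before running the steps above, the prerequisites can be checked with a short script. This is a hedged sketch, not part of the repository: `check_prerequisites` is a hypothetical helper, and it assumes that finding `nvidia-smi` on the PATH is a reasonable proxy for having an NVIDIA GPU and driver installed.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report which Quick Start prerequisites appear to be satisfied."""
    return {
        # The interpreter running this script must be at least min_python.
        "python": sys.version_info >= min_python,
        # uv must be discoverable on the PATH.
        "uv": shutil.which("uv") is not None,
        # Heuristic only: nvidia-smi ships with the NVIDIA driver.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A "missing" result for nvidia-smi does not prove there is no GPU. It only means the tool was not found on the PATH.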
Running the Agent
To run the agent, start Claude, Codex, or any other coding agent you like in this repo (and disable all permissions), then prompt it with something like:
"Hi have a look at program.md and let's kick off a new experiment! let's do the setup first."
Project Structure
The autoresearch project structure is as follows:
- prepare.py: constants, data prep + runtime utilities (do not modify)
- train.py: model, optimizer, training loop (agent modifies this)
- program.md: agent instructions
- pyproject.toml: dependencies
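The ownership rules in this list (the agent edits train.py, everything else is off limits to it) can be checked mechanically after each experiment. The sketch below is an assumption about how one might do that, not something the repository provides: `AGENT_EDITABLE` and `disallowed_changes` are hypothetical names, and the list of changed paths is assumed to come from somewhere like `git diff --name-only`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Per the project structure: train.py is the only file the agent may modify.
AGENT_EDITABLE = {"train.py"}


def disallowed_changes(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that the agent was not supposed to touch."""
    return [
        path
        for path in changed_paths
        # Normalise forms like "./train.py" before comparing.
        if PurePosixPath(path).as_posix() not in AGENT_EDITABLE
    ]


violations = disallowed_changes(["train.py", "prepare.py", "program.md"])
if violations:
    print("Agent modified files outside its scope:", violations)
```

An empty result means the diff stayed within the single-file scope, which is what keeps each experiment easy to review.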
Design Choices
A few deliberate design choices keep the experiments simple, reviewable, and comparable:
- Single file to modify: The agent only touches train.py, keeping the scope manageable and diffs reviewable.
- Fixed time budget: Training always runs for exactly 5 minutes, regardless of your specific platform. This makes experiments directly comparable regardless of what the agent changes.
- Self-contained: No external dependencies beyond PyTorch and a few small packages. No distributed training, no complex configs. One GPU, one file, one metric.
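The fixed time budget can be illustrated with a loop that stops on elapsed time rather than on a step count. This is a minimal sketch under stated assumptions, not the actual train.py: `train_for_budget` and `step_fn` are hypothetical names, the budget is assumed to be wall-clock time, and the clock is injectable only so the behaviour can be demonstrated without waiting 5 minutes.

```python
import itertools
import time
from typing import Callable


def train_for_budget(
    step_fn: Callable[[int], None],
    budget_seconds: float = 300.0,  # 5 minutes, as in the article
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run step_fn repeatedly until the time budget is used up.

    Returns the number of completed steps. The last step may finish
    slightly after the deadline, since the check happens between steps.
    """
    deadline = clock() + budget_seconds
    steps = 0
    while clock() < deadline:
        step_fn(steps)
        steps += 1
    return steps


# Demonstration with a fake clock that advances one "second" per reading.
fake_time = itertools.count(0)
completed = train_for_budget(
    step_fn=lambda step: None,  # stand-in for one optimizer step
    budget_seconds=3.0,
    clock=lambda: float(next(fake_time)),
)
print("steps completed within budget:", completed)
```

With a time-based stop, a change that makes each step slower simply gets fewer steps, so every experiment is judged on what it achieves within the same budget.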
Platform Support
The autoresearch project currently requires a single NVIDIA GPU. In principle, other platforms such as CPU and MPS could be supported, but doing so would bloat the code. It remains a possible area for future development.
Notable Forks
Several notable forks of the autoresearch project have been created, including:
- miolini/autoresearch-macos: A fork for macOS
- trevin-creator/autoresearch-mlx: A fork for macOS
- jsegov/autoresearch-win-rtx: A fork for Windows
License
The autoresearch project is licensed under the MIT license.