Extreme Harness Engineering for the 1B token/day Dark Factory — Ryan Lopopolo, OpenAI Frontier
TL;DR
Ryan Lopopolo reveals how OpenAI's Frontier team built a 'Dark Factory' processing 1 billion tokens daily, generating over 1 million lines of code from zero human-written code in 5 months. By treating human attention as the only scarce resource and enforcing strict constraints like sub-minute builds, the team shifted from manual coding to autonomous agents that write, review, and merge their own code.
🏭 Dark Factory Constraints
Zero human code mandate
The team enforced a constraint of writing zero lines of code themselves, forcing the agent to become isomorphic to an engineer's capabilities and resulting in 1M+ lines of agent-written code.
10x speed inversion
Despite being 10x slower initially, the agent-driven approach became 10x faster than manual development, producing 1500+ PRs with just three people over five months.
Infinite parallel capacity
With 1 billion tokens processed daily, the system leverages massive GPU parallelism to work on unlimited codebase sections simultaneously, making human attention the only bottleneck.
⚡ Adaptive Build Engineering
Sub-minute build mandate
When GPT-5.3 introduced background shells that made agents less patient, the team rebuilt the build system repeatedly, moving from Make to Bazel to Turbo to NX, to ensure every build completes in under 60 seconds.
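The 60-second budget can be made concrete with a small timing gate. This is an illustrative sketch, not the team's actual tooling: the function name, the budget constant, and the idea of enforcing the budget in a wrapper are all assumptions layered on the talk's stated constraint.

```python
import subprocess
import time

# Assumed constant: the talk mandates sub-minute builds.
BUILD_BUDGET_SECONDS = 60

def check_build(cmd: list[str]) -> float:
    """Run a build command and return its wall-clock duration.

    Raises if the build fails or exceeds the 60-second budget --
    the signal that the harness needs another build-system pass
    (e.g. the Make -> Bazel -> Turbo -> NX migrations).
    """
    start = time.monotonic()
    # timeout= kills the build outright if it runs past the budget
    subprocess.run(cmd, check=True, timeout=BUILD_BUDGET_SECONDS)
    elapsed = time.monotonic() - start
    if elapsed > BUILD_BUDGET_SECONDS:
        raise RuntimeError(
            f"build took {elapsed:.1f}s, budget is {BUILD_BUDGET_SECONDS}s"
        )
    return elapsed
```

A gate like this turns "keep agents patient" into a hard, machine-checkable invariant rather than a guideline.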
Inverted environment architecture
Rather than pre-configuring environments, the agent serves as the entry point and spawns its own dependencies, using high-level tools like MI to boot observability stacks on demand.
Model-rev-driven refactoring
The codebase underwent five major build system changes in five months (GPT-5 through 5.4) as model capabilities evolved, requiring constant gardening of build invariants.
🧠 Human-Agent Collaboration
Post-merge human review
Humans moved from pre-merge review to post-merge oversight, accepting that synchronous human attention is the only fundamental scarcity while agent capacity is trivially parallelizable.
Autonomous agent merging
Agents autonomously merge code after review, with humans acting as circuit breakers only for critical issues rather than serving as manual gatekeepers for every change.
Systems thinking over debugging
The team focuses on building confidence in automation by constantly asking where agents make mistakes and encoding fixes into durable process documentation rather than fixing individual bugs.
🤖 Agent Orchestration
Markdown-based steering
Used spec.md, agent.md, and skills.md files to guide behavior, with quality scores and tech debt tracked in markdown tables that agents review and update themselves.
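The spec.md/agent.md/skills.md file names come from the talk, but the table layout and parser below are assumptions for illustration: a minimal sketch of how an agent might read a quality-score table out of markdown before deciding what to garden next.

```python
# Hypothetical quality-score table of the kind the team tracks in markdown.
QUALITY_TABLE = """\
| module  | quality | tech_debt |
|---------|---------|-----------|
| parser  | 8       | low       |
| runtime | 5       | high      |
"""

def parse_scores(table: str) -> dict[str, dict[str, str]]:
    """Parse a markdown table into {first-column value: {column: cell}}."""
    # Drop the |---| separator row; keep header and data rows.
    lines = [l for l in table.strip().splitlines()
             if not set(l) <= set("|- ")]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = {}
    for line in lines[1:]:
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows[cells[0]] = dict(zip(header[1:], cells[1:]))
    return rows
```

Because the steering state lives in plain markdown, the same tables stay legible to humans skimming the repo and to the agents that update them.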
Bidirectional review protocol
Code review agents and authoring agents can push back or defer feedback using priority frameworks (P0 vs P2), preventing non-convergent loops and scope creep from over-eager instruction following.
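The P0-vs-P2 split can be sketched as a triage step with a hard cap on review rounds. The dataclass, the round cap, and the cutoff logic are illustrative assumptions; the talk describes the protocol's intent (block on critical items, defer nits, guarantee convergence), not this code.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    priority: int  # 0 = must-fix before merge, 2 = deferrable nit
    note: str

MAX_REVIEW_ROUNDS = 3  # assumed hard cap so review loops always converge

def triage(feedback: list[Feedback],
           round_num: int) -> tuple[list[Feedback], list[Feedback]]:
    """Split review feedback into must-fix and deferred items.

    Early rounds fix anything below P2; once the round cap is hit,
    only P0 items still block the merge, so an over-eager authoring
    agent cannot expand scope chasing minor feedback forever.
    """
    cutoff = 1 if round_num >= MAX_REVIEW_ROUNDS else 2
    must_fix = [f for f in feedback if f.priority < cutoff]
    deferred = [f for f in feedback if f.priority >= cutoff]
    return must_fix, deferred
```

The key design choice is that deferral is the default outcome of disagreement: review pressure decays toward P0-only, so the loop terminates instead of oscillating.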
Durable process encoding
When fixing issues like missing timeouts, agents update reliability documentation to encode 'what good looks like' for future iterations, creating self-improving guardrails.
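The encoding step might look like the sketch below: after fixing a bug class, the agent appends a rule to a reliability document so future runs inherit the fix. The file name, entry format, and helper are assumptions; only the missing-timeout example and the 'what good looks like' framing come from the talk.

```python
from pathlib import Path

def encode_guardrail(doc: Path, symptom: str, rule: str) -> None:
    """Append a 'what good looks like' entry to a process doc so
    future agent iterations inherit the fix instead of rediscovering
    the bug."""
    entry = f"- **{symptom}**: {rule}\n"
    existing = (doc.read_text() if doc.exists()
                else "# Reliability: what good looks like\n")
    if entry not in existing:  # idempotent: never duplicate a rule
        doc.write_text(existing + entry)

# Hypothetical usage for the missing-timeout example from the talk:
# encode_guardrail(Path("reliability.md"), "missing timeout",
#                  "every outbound network call sets an explicit timeout")
```

Writing the rule into documentation rather than a one-off patch is what makes the guardrail durable: it steers every future generation pass, not just the current fix.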
Bottom Line
Treat human attention as the only scarce resource: constrain yourself to zero manual coding, forcing the creation of autonomous agents that garden their own codebase, review their own code, and merge autonomously, while humans focus on systems design and post-hoc validation.
More from Latent Space
Marc Andreessen introspects on Death of the Browser, Pi + OpenClaw, and Why "This Time Is Different"
Marc Andreessen frames artificial intelligence as an '80-year overnight success,' arguing that while the field has cycled through boom-bust periods since 1943, the current convergence of LLMs, reasoning models, agents, and recursive self-improvement represents a permanent inflection point where the technology finally 'works' at scale, justifying the view that 'this time is different' for builders and investors.
Moonlake: Multimodal, Interactive, and Efficient World Models — with Fan-yun Sun and Chris Manning
Moonlake founders Fan-yun Sun and Chris Manning argue that true world models require action-conditioned symbolic reasoning about physics and consequences, not just pixel prediction, enabling spatial intelligence with orders of magnitude less data than pure scaling approaches.
The Stove Guy: Sam D'Amico Shows New AI Cooking Features on America's Most Powerful Stove at Impulse
Sam D'Amico, former Meta and Apple hardware engineer, demonstrates the Impulse Cooktop, a high-performance induction stove featuring a built-in 3kWh lithium iron phosphate battery that delivers 10,000 watts per burner and boils water in 40 seconds, while functioning as distributed grid storage.
Mistral: Voxtral TTS, Forge, Leanstral, & Mistral 4 — w/ Pavan Kumar Reddy & Guillaume Lample
Mistral releases Voxtral TTS, a 3B parameter open-weights speech generation model using a novel auto-regressive flow matching architecture that delivers state-of-the-art performance at a fraction of competitors' costs while enabling enterprises to leverage proprietary domain data.