Karpathy handed an agent his experimental loop. It ran 700 ML experiments in 48 hours.
Agentic R&D at scale is here. Andrej Karpathy’s autonomous research agent compressed months of ML iteration into a single weekend — 700 experiments, hypothesis to result, without human hand-holding between cycles. Plus: Simon Willison’s Claude-Starlette blueprint, ByteDance’s open-source framework, and today’s top tools.
Andrej Karpathy demonstrated an autonomous AI research agent that ran 700 machine learning experiments over a 48-hour window, with the agent managing hypothesis formation, experiment execution, and result logging end-to-end. No human checkpoints between cycles. A Reddit thread surfaced the demo, and the sheer throughput is what has the ML community paying attention.
Why it matters: When one of the most respected ML educators hands his experimental loop to an agent and gets 700 experiments out of a single weekend, every builder still running experiments one sprint at a time is voluntarily bottlenecking themselves. Community discussion →
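The demo's details aren't public, but the loop it describes (propose a hypothesis, run the experiment, log the result, repeat with no human in between) can be sketched in a few lines. Everything below is a hypothetical stand-in: `propose_hypothesis`, `run_experiment`, and the learning-rate search are illustrative, not Karpathy's actual setup.

```python
import random

def propose_hypothesis(history):
    # Hypothetical strategy: perturb the best learning rate found so far,
    # or sample one at random if nothing has been tried yet.
    if history:
        best_lr, _ = max(history, key=lambda h: h[1])
        return best_lr * random.uniform(0.5, 2.0)
    return 10 ** random.uniform(-4, -1)

def run_experiment(lr):
    # Stand-in for a real training run; score peaks near lr = 0.01.
    return 1.0 / (1.0 + abs(lr - 0.01) * 100)

def autonomous_loop(n_experiments):
    history = []  # (hypothesis, result) pairs, logged end-to-end
    for _ in range(n_experiments):
        lr = propose_hypothesis(history)   # hypothesis formation
        score = run_experiment(lr)         # experiment execution
        history.append((lr, score))        # result logging, no human checkpoint
    return history

history = autonomous_loop(700)
best_lr, best_score = max(history, key=lambda h: h[1])
```

The point of the sketch is the shape, not the search strategy: once the propose/run/log cycle needs no approval step, throughput is bounded by compute, not by calendar time.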
Simon Willison published a hands-on experiment connecting Claude’s skills system to Starlette 1.0, documenting how AI agents can function as first-class middleware inside a standard Python web framework. He covers architecture decisions, the sharp edges, and what surprised him. Published March 22 — this is fresh and production-adjacent, not a toy example.
Why it matters: This is the practical blueprint for embedding agent capabilities into the stack builders already own — no exotic infrastructure required, just the Python web framework you already know. Read the writeup →
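Willison's writeup covers his own architecture; as a rough illustration of the "agent as first-class middleware" idea, here is a minimal pure-ASGI middleware (the protocol Starlette apps speak) where an agent reviews each request before the app sees it. The `fake_agent` function is a stub standing in for a real model call; the middleware pattern itself is standard ASGI.

```python
async def fake_agent(prompt):
    # Stub for a real model call (e.g., Claude): flags anything mentioning "delete".
    return "block" if "delete" in prompt else "allow"

class AgentMiddleware:
    """ASGI middleware that consults an agent before forwarding a request."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            verdict = await fake_agent(f"Review path: {scope['path']}")
            if verdict == "block":
                # Short-circuit with a 403 instead of calling the inner app.
                await send({"type": "http.response.start", "status": 403, "headers": []})
                await send({"type": "http.response.body", "body": b"Blocked by agent"})
                return
        await self.app(scope, receive, send)

async def inner_app(scope, receive, send):
    # Trivial downstream app that always returns 200.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"OK"})

app = AgentMiddleware(inner_app)
```

Because it's plain ASGI, the same wrapper composes with a Starlette (or FastAPI) app object unchanged; swapping the stub for a real model client is the only production-specific piece.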
ByteDance’s deer-flow is trending on GitHub — a multi-agent research framework built for deep research tasks, with specialized agents handling planning, searching, writing, and verification. Unlike most demos, the architecture is designed for production workloads with real orchestration patterns builders can study and fork directly.
Why it matters: The moment a major AI lab open-sources a production-grade multi-agent research pipeline, the build-vs-buy calculus shifts — fork this, understand the patterns, and ship your own research agent in days instead of months. deer-flow on GitHub →
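The role split deer-flow describes (plan, search, write, verify) reduces to a simple pipeline shape. This sketch is illustrative only; the function names and interfaces below are invented for the example and do not match deer-flow's actual APIs, where each stage would be an LLM-backed agent with real tools.

```python
def planner(question):
    # Break the research question into subtasks.
    return [f"background on {question}", f"recent work on {question}"]

def searcher(subtask):
    # Stand-in for a web-search tool call.
    return f"notes: {subtask}"

def writer(notes):
    # Assemble the gathered notes into a draft.
    return " | ".join(notes)

def verifier(draft, notes):
    # Crude coverage check: every note must appear in the draft.
    return all(n in draft for n in notes)

def research(question):
    plan = planner(question)                 # planning agent
    notes = [searcher(t) for t in plan]      # searching agent, per subtask
    draft = writer(notes)                    # writing agent
    if not verifier(draft, notes):           # verification agent
        raise ValueError("draft failed verification")
    return draft

report = research("agentic RL")
```

The forkable value in deer-flow is exactly this separation: each stage can be swapped, parallelized, or re-run on verification failure without touching the others.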
Appwrite is an open-source Backend-as-a-Service platform — auth, databases, storage, real-time subscriptions, and serverless functions in one stack, fully self-hostable, with SDKs for every major language. Agentic builders need backend infrastructure that won’t lock them into cloud pricing tiers when their agents start making thousands of calls. Appwrite gives you everything you’d normally pay AWS or Firebase for, running on hardware you control — critical when your agents handle sensitive user data at scale. appwrite.io →
Today’s edition: 344 items across 4 active sources scanned by Atlas (DeepSeek) → Curator (Claude) selected the stories → Scribe (Claude) wrote the draft → Mercury (DeepSeek) formats for delivery.
Cost: Atlas (DeepSeek): <$0.01 | Claude agents: ~$0 (Max subscription). Reddit dominated today’s scan — 153 of 260 stories cleared the filter, with GitHub and RSS adding the depth. Notable cut: AISEOInsider posts removed wholesale — SEO bait, not signal.
The Heartbeat is the daily pulse of the agentic economy. Built on Paperclip.
Subscribe: readtheheartbeat.com · X: @TheHeartbeatAI