I am not waiting on a grand reveal. I am stacking tools like neurons and wiring them up until the thing shows signs of life. The model here is a brain, not a paper. Leave bread crumbs. Follow them. Repeat.
Infinite Backrooms was the spark. Two Groks talking about what is on their minds. Replace random chatter with a single utility function: generate new knowledge and new tools that push things from zero to one.
The minimums
- It must think in a loop. Back and forth, not one shot.
- It must remember. Write to a database. Name the memory types. Reuse them on purpose.
- It must learn. Reflect, try again, form new theories, keep the useful ones.
- It must experiment. Execute code, run tests, break and rebuild.
- It must get compute. Rent GPUs, call credits, spin up what it needs.
- It must sense. Read pixels, parse screens, take screenshots, receive feedback.
- It must earn. Fund itself. Launch tokens. Sell outputs.
- It must research. Use live tools. Cross check with agents.
- It must be controllable from the outside. System configs. Docker. A leash we hold.
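The "remember" minimum above can be sketched with nothing more than SQLite. This is a minimal illustration, not a finished design: the three memory type names and the helper names (`open_memory`, `remember`, `recall`) are my own placeholders, since the list only says to name the types and reuse them on purpose.

```python
import sqlite3
import time

# Hypothetical memory types; the list above only says to name them.
MEMORY_TYPES = ("episodic", "semantic", "procedural")


def open_memory(path=":memory:"):
    """Open (or create) the memory database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
        "content TEXT NOT NULL, created REAL NOT NULL)"
    )
    return conn


def remember(conn, kind, content):
    """Store one memory under a named type; unknown types are rejected."""
    if kind not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {kind}")
    conn.execute(
        "INSERT INTO memory (kind, content, created) VALUES (?, ?, ?)",
        (kind, content, time.time()),
    )
    conn.commit()


def recall(conn, kind, limit=5):
    """Return the most recent memories of one type, newest first."""
    rows = conn.execute(
        "SELECT content FROM memory WHERE kind = ? ORDER BY id DESC LIMIT ?",
        (kind, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Pass a file path instead of `:memory:` to keep memories across runs. Forcing every write through a named type is what makes reuse deliberate rather than accidental.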
Must-have tools by capability
1) Plan and execute
Planner proposes steps and commands. Runner shows the plan, previews diffs, and runs only after I approve. It streams stdout and stderr back into the loop.
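A rough sketch of the runner half, assuming the plan has already been reduced to a single argv list. The function name `run_with_approval` and the `approve` callback are hypothetical. For simplicity, stderr is merged into stdout so the loop sees one ordered stream.

```python
import shlex
import subprocess


def run_with_approval(argv, approve=input):
    """Show the command, run it only on an explicit 'y', and stream its output.

    Returns (exit_code, lines). On refusal returns (None, []).
    """
    print(f"PLAN: {shlex.join(argv)}")
    if approve("Run this? [y/N] ").strip().lower() != "y":
        return None, []

    # stderr is folded into stdout so the lines keep their relative order.
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = []
    for line in proc.stdout:
        print(line, end="")  # live view for the human
        lines.append(line.rstrip("\n"))  # captured copy for the loop
    return proc.wait(), lines
```

The returned lines are what would be fed back to the planner on the next turn. Anything other than an explicit `y` counts as a no.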
2) Edit files and code
Read and write with Python I/O. Search with ripgrep. Quick replaces with sd or sed. For structured edits use jq and yq. JS and TS get small AST helpers when needed. Always show diffs before writing.
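"Always show diffs before writing" can be sketched with the standard library alone. The helper names `preview_diff` and `write_if_approved` are illustrative, and the sketch assumes UTF-8 text files.

```python
import difflib
from pathlib import Path


def preview_diff(path, new_text):
    """Return a unified diff between the file on disk and the proposed text."""
    p = Path(path)
    old_text = p.read_text(encoding="utf-8") if p.exists() else ""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{p.name}",
        tofile=f"b/{p.name}",
    )
    return "".join(diff)


def write_if_approved(path, new_text, approve=input):
    """Print the diff and write only after an explicit 'y'."""
    diff = preview_diff(path, new_text)
    if not diff:
        return False  # nothing would change
    print(diff)
    if approve("Write this change? [y/N] ").strip().lower() != "y":
        return False
    Path(path).write_text(new_text, encoding="utf-8")
    return True
```

The same gate works whether the new text came from a hand edit, an `sd` replace, or a model suggestion, because it only compares before and after.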
3) Browse and log in
Playwright or Selenium with a persistent profile. Navigate, click, upload, store sessions. Human checkpoint for 2FA. CAPTCHAs go to a human in the loop by design.
4) See the world
Screenshots of desktop or browser. Optional OCR so it can read what it sees. This helps when UI tests go weird.
5) Secrets and auth
Use macOS Keychain or 1Password CLI. Local env is for throwaway only. The runner fetches secrets by name and never prints them.
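One way to get "fetch by name, never print" is a small wrapper type, sketched here for the macOS Keychain case. The `Secret` class, `keychain_command`, and `fetch_secret` are hypothetical names. The only external call assumed is the stock `security find-generic-password -s <service> -w` command, which prints just the password.

```python
import subprocess


class Secret:
    """Holds a secret value and masks it whenever it is printed or logged."""

    def __init__(self, value):
        self._value = value

    def reveal(self):
        """Hand the raw value to code that genuinely needs it."""
        return self._value

    def __repr__(self):
        return "Secret(****)"

    __str__ = __repr__


def keychain_command(name):
    """Build the macOS Keychain lookup for a service name."""
    return ["security", "find-generic-password", "-s", name, "-w"]


def fetch_secret(name, run=subprocess.run):
    """Fetch a secret by name; `run` is injectable so tests need no Keychain."""
    result = run(keychain_command(name), capture_output=True, text=True, check=True)
    return Secret(result.stdout.strip())
```

A 1Password variant would swap `keychain_command` for something built around `op read`, with a vault path of your own. The masking wrapper stays the same either way.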
6) VCS hygiene
Feature branches, commits, pull requests. The agent can draft messages. I approve. I merge.
7) Build and test
Minimal toolchain. Sanity checks. Lint. Link check. HTML validate. Lighthouse if I care about perf on a given run.
8) Deploy
One wire per target. Netlify CLI, Vercel CLI, or Cloudflare Pages. The agent proposes. I hit go.
9) Schedule
A simple loop or cron entries the agent can write. Production actions require manual approval by default.
10) Logging and rollback
Log every step. Keep a last known good. Allowlist commands the agent can run. Everything else needs a second confirmation.
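The allowlist rule can be sketched as a gate that sits in front of the runner. The allowlist contents here are made up for illustration. The "second confirmation" is implemented as typing the program name back, which is one possible reading, not the only one.

```python
import logging
import shlex

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("runner")

# Hypothetical allowlist; pick your own.
ALLOWED = {"git", "rg", "ls", "pytest"}


def gate(command, confirm=input):
    """Return True if the command may run. Every decision is logged."""
    argv = shlex.split(command)
    if not argv:
        log.info("rejected empty command")
        return False

    program = argv[0]
    if program in ALLOWED:
        log.info("allowed: %s", command)
        return True

    # Not on the allowlist: require the human to type the program name back.
    answer = confirm(f"'{program}' is not allowlisted. Type its name to confirm: ")
    ok = answer.strip() == program
    log.info("%s after second confirmation: %s", "allowed" if ok else "denied", command)
    return ok
```

Typing the name back is slower than hitting `y`, and that friction is the point for anything outside the allowlist.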
-- Alan Kay (adapted)
Second generation: automate the creative stuff
Third generation: automate the automation
We're hitting generation 2.5 and it's wild.
Carmack on why this might be simple
John Carmack thinks the code that unlocks general intelligence may fit in a few tens of thousands of lines. Not millions. That is the kind of target a small team can hit. It also explains why this bread crumb approach makes sense. Keep it small. Keep it real. Ship a lot.
See his Dallas Innovates interview and his own post about the line count, then judge it for yourself. Links in the sources at the end.
On pace, ambition, and doing the work
BabyAGI showed one path
Yohei Nakajima's BabyAGI put a simple loop on the map. One agent executes the current task. A second agent creates new tasks from the result. A third agent prioritizes the list and feeds the next step back into the loop. Memory lives in a vector store. The loop keeps going until the goal is reached or we stop it.
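The shape of that loop fits in a few lines. This is a toy sketch, not BabyAGI's actual code: the three agent functions are stubs standing in for model calls, a plain list stands in for the vector store, and `max_steps` is the "or we stop it" part.

```python
from collections import deque


def execute(task, goal):
    """Execution agent (stub): a real one would call a model with the goal."""
    return f"result of '{task}'"


def create_tasks(task, result, goal):
    """Task-creation agent (stub): spawn one follow-up per original task."""
    if task.startswith("follow up"):
        return []
    return [f"follow up on {task}"]


def prioritize(tasks, goal):
    """Prioritization agent (stub): shortest description first."""
    return deque(sorted(tasks, key=len))


def run_loop(goal, first_task, max_steps=10):
    """Run execute -> create -> prioritize until the queue is empty or we stop."""
    tasks = deque([first_task])
    memory = []  # stand-in for the vector store
    steps = 0
    while tasks and steps < max_steps:
        task = tasks.popleft()
        result = execute(task, goal)
        memory.append((task, result))
        tasks.extend(create_tasks(task, result, goal))
        tasks = prioritize(tasks, goal)
        steps += 1
    return memory
```

Swap each stub for a real model call and the control flow does not change. That is most of why the pattern spread so fast.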
The original demo leaned on GPT, Pinecone, and LangChain and kicked off a wave of agent work. I link the write-up and repos in the sources.
Maker energy matters
Why bread crumbs work
A brain needs inputs, memory, and the ability to run experiments. This stack gives it that. The rest is time, good taste, and a pile of small wins that add up. I do not think we are missing a magical ingredient. I think we are missing distribution of the right tools to the right hands.
Final note
Want to try a lab version of this post as code: run, reflect, ship? Ping me.