Our greatest human achievements have often taken the shape of architectural superstructures: pyramids, cathedrals, skyscrapers. These are enormous feats built from millions of individual contributions.
We've been delivering such marvels for thousands of years. Can AI agents scale to similar heights?
AI is remarkably adept at small linguistic feats: writing essays, composing poems, writing software. It does these ever faster, and better than most humans. But they are small, isolated tasks, akin to laying bricks in the construction of a tower.
I've worked with AI agents for the better part of two years, and I consistently find myself playing the part of foreman:
"here is your next task..."
"that piece there is too big…"
"this here isn't working…"
Why can't AI do this itself? What is this contribution that still necessitates my involvement?
Simply this: get me from point A to point B. Many problems in life take this form, from physical journeys to building construction to software development. The key is to keep moving toward the destination.
This requires:
- Having a clear understanding of point B
- Knowing your current position
- Continually adapting a plan to get there
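As a rough sketch, these three requirements can be written as a loop: know the goal, track the current position, and replan whenever the world turns out different from the plan. The grid world, the `plan` and `navigate` names, and the fence layout in the usage note are all made up for illustration; breadth-first search stands in for whatever planner an agent would actually use.

```python
from collections import deque


def plan(start, goal, known_walls, size):
    """Breadth-first search over cells not yet known to be walls."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]  # start ... goal
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_bounds and nxt not in known_walls and nxt not in came_from:
                came_from[nxt] = cell
                frontier.append(nxt)
    return None  # no route given what we currently know


def navigate(start, goal, true_walls, size):
    """Walk from start to goal, discovering walls only by bumping into them."""
    known_walls = set()   # what the agent has learned so far
    position = start      # requirement 2: know your current position
    visited = [start]
    replans = 0
    path = plan(position, goal, known_walls, size)
    while position != goal:  # requirement 1: a clear definition of point B
        if path is None:
            raise ValueError("no route to goal")
        next_cell = path[1]
        if next_cell in true_walls:
            # Requirement 3: the plan was wrong, so update the map and replan.
            known_walls.add(next_cell)
            path = plan(position, goal, known_walls, size)
            replans += 1
            continue
        position = next_cell
        visited.append(position)
        path = path[1:]
    return visited, replans
```

For example, with a hypothetical fence at `x = 2` that has a single gap at `(2, 4)`, calling `navigate((0, 0), (4, 0), {(2, 0), (2, 1), (2, 2), (2, 3)}, 5)` returns a route that ends at `(4, 0)` after at least one replan, having passed through the gap.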
Can today's LLMs do this?
I gave three leading LLMs exactly this challenge (follow a path from A to B, through obstacles):
- Claude 3.5 Sonnet: 14 iterations to get to B. Struggled to recognize the fence gap.
- Gemini 2.5 Pro (exp): 6 iterations to get to B. Often taking one step too many or too few.
- OpenAI o3: 1 iteration, nailed it!
All reached B eventually. Repeated trials showed Gemini occasionally solving it in one shot.
Combine this with the idea from my last post (being able to scale between different levels of perspective), and charting a course, then adapting it, becomes:
- Start with a macro plan – what components make up the system? What's the roadmap?
- Break down each component – which files, procedures, or APIs need to be added/changed?
- Repeat until we reach micro tasks – each executable by a single agent or team.
This is classical task decomposition, but with ongoing adjustment and course correction. Because every micro task traces back to the master plan, we can be confident that each individual brick takes us toward point B.
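The macro-to-micro breakdown can be sketched as a small recursive structure. This is only an illustration under stated assumptions: the `Task` class, the `decompose` and `micro_tasks` helpers, and the `BREAKDOWN` table are hypothetical, and the `split` callback stands in for whatever planner (human or LLM) proposes the next level of detail.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    name: str
    subtasks: List["Task"] = field(default_factory=list)


def decompose(name: str, split: Callable[[str], List[str]], max_depth: int = 5) -> Task:
    """Recursively break a task down until `split` returns nothing further."""
    task = Task(name)
    if max_depth > 0:
        for child in split(name):
            task.subtasks.append(decompose(child, split, max_depth - 1))
    return task


def micro_tasks(task: Task) -> List[str]:
    """Collect the leaves: tasks small enough to hand to a single agent."""
    if not task.subtasks:
        return [task.name]
    leaves: List[str] = []
    for sub in task.subtasks:
        leaves.extend(micro_tasks(sub))
    return leaves


# Hypothetical breakdown; in practice a planner agent would produce this.
BREAKDOWN = {
    "build blog platform": ["backend API", "web frontend"],
    "backend API": ["add posts table", "add /posts endpoint"],
    "web frontend": ["render post list"],
}

root = decompose("build blog platform", lambda name: BREAKDOWN.get(name, []))
print(micro_tasks(root))
# ['add posts table', 'add /posts endpoint', 'render post list']
```

Course correction in this picture would mean calling `decompose` again on just the node whose plan turned out to be wrong, leaving the rest of the tree intact.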
Plotting and adjusting a course has a long tradition in human ventures. From crossing oceans to raising cathedrals, we celebrate leaders who keep a mission on course. I'd argue today's AI agents can navigate (dare I say "lead") large initiatives, provided we assign the right specialist agents to the job.
Key Lessons:
- Plan → Do → Check → Repeat: You can't one-shot a full build.
- Specialize your agents: Pair planners (orchestrators) with coders rather than relying on a single one-size-fits-all agent.
- Measure what matters: Track distance to B, not distance from A.
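One way to picture the first and last lessons together is a loop that logs how many acceptance criteria remain unmet (distance to B) rather than how many steps have been taken (distance from A). This is a minimal sketch: `run_until_done`, `check`, `do`, and the criteria themselves are hypothetical placeholders for real agent work.

```python
def run_until_done(check, do, max_iterations=10):
    """Plan -> Do -> Check -> Repeat, logging distance to B each round."""
    distances = []
    for _ in range(max_iterations):
        remaining = check()               # Check: which criteria still fail?
        distances.append(len(remaining))  # distance to B, not steps taken
        if not remaining:
            break
        do(remaining[0])                  # Plan + Do: tackle the next unmet criterion
    return distances


# Hypothetical acceptance criteria that define point B.
criteria = {
    "schema migrated": False,
    "endpoint returns 200": False,
    "tests pass": False,
}


def check():
    return [name for name, met in criteria.items() if not met]


def do(name):
    criteria[name] = True  # stand-in for handing the task to a coder agent


distances = run_until_done(check, do)
print(distances)  # [3, 2, 1, 0]
```

If the logged distance stops shrinking between rounds, that is the signal to revisit the plan rather than keep laying bricks.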