Agile was built to help humans coordinate. AI changes the bottleneck.
In a normal software org, the hard part used to be getting product, engineering, QA, security, compliance, and ops to line up. Agile made that easier. It tightened feedback loops and made planning less brittle. But it still assumed humans were doing almost all the execution.
That assumption is now breaking. AI can draft, code, test, document, and fix inside the same working session. So the constraint is no longer just effort. It is the machinery wrapped around effort: handoffs, approvals, rituals, sprint theater, status translation, and all the layers built to coordinate people when people were the production system. If you keep that stack and add AI on top, you a faster engine trapped in traffic.
The better model has four moving parts: Outcome Pods, The Harness, the Strategic Layer, and a Two-Swarm testing model. In other words, small teams at the edge. Shared control in the middle. Clear rules at the top. Constant attack-testing built into the loop.
This essay explains this operating model shift.
1. Outcome Pods
The basic development unit shifts from scrum team to outcome pod.
An outcome pod is built around a result, not a function. It owns a business outcome, the system surface it can change, the tools it can use, and the risk boundaries it has to respect. That sounds obvious and similar to scrum in many ways. But it’s quite different.
Output pods can be as small as 1 person, or as big as 8. A lot of AI org advice gets sloppy here and assumes every pod should be tiny. That is wrong.
Small Pods: 1–3 people
Use small pods when the work is bounded and speed matters.
Good examples:
- feature work on a narrow product surface
- internal tooling
- workflow automation
- support deflection flows
- documentation and testing acceleration
- fast iteration with tight feedback loops
In these cases, a small pod can move from idea to shipped result without dragging the work through five layers of translation. It can define the outcome, pull the right building blocks from the harness, run generation and evaluation loops, and ship.
This is where a team of one can build a complete, controlled and bounded system from design to planning to coding to test on their own with the help of a coding agent, for example. Fewer handoffs, less waiting, and less ceremony around work that no longer needs a human chain to move.
Large Pods: 5–8+ people
Use larger pods when the system is entangled.
That means things like:
- existing enterprise code
- SAP modernization
- mainframe decomposition
- regulated workflow redesign
- deep platform migration
- cross-domain system change
Here the problem is not speed at the feature layer. It is institutional density. Too many dependencies. Too many exception paths. Too many brittle systems. A two-person pod will not look lean. It will look dead.
This is the first place where executive teams get confused. They ask, “Should all teams get smaller?” Wrong question. The question is: what team shape matches the complexity of the system being changed?
If the work is bounded, go small. If the work touches legacy infrastructure, regulated logic, or a dozen interlocked workflows, size the pod to the mess. Pretending otherwise is org design cosplay.
A practical pod model is simple:
- define the business outcome
- define the system surface the pod can touch
- assign a risk tier
- pull approved workflows, permissions, and tests from the harness
- escalate exceptions instead of inventing local hacks
That last part matters more than people think. If every pod creates its own prompts, tool wrappers, access logic, and fallback behavior, you do not have autonomy. You have drift.
2. The Harness
The harness is the product behind the product.
Most companies focus on the visible layer, which is the pod. That is understandable. Pods ship things. But the harness is what makes pod speed usable at enterprise scale.
It should own:
- shared prompt and workflow patterns
- tool and data permissions
- architecture constraints
- testing and evaluation standards
- policy enforcement
- telemetry and audit evidence
- rollback and exception logic
Without this layer, every pod becomes its own little AI stack. Different prompts. Different access rules. Different logging. Different wrappers. Different audit trails. That feels fast right up until nobody can tell what has access to what, which agent can do what, or why a workflow made a decision.
The centralizes access patterns, permissions, logging, and policy. It narrows context to what the job actually needs. It limits which tools are visible in which workflows. It captures evidence while work happens instead of forcing teams to reconstruct it later. It blocks unsafe actions upstream.
This is governance and performance infrastructure.
Too much context hurts model performance. Too many exposed tools hurt model performance. Weak permission boundaries create not just security risk but operational drag, because now every team is debugging weird edge behavior in its own private setup. A good harness removes noise, removes ambiguity, and gives every pod a better starting point than a blank page.
If you want a simple image for this, think of the harness as the wiring loom in a car. Nobody buys the car because of the loom. But without it, nothing connects cleanly, faults become hard to trace, and every repair turns into chaos. The visible product sits on top of invisible discipline.
3. The Strategic Layer
The strategic layer’s job is to define the hard rules that the rest of the system runs on.
That includes:
- risk tiers
- data boundaries
- architecture rules
- identity limits
- autonomy limits
- review thresholds for high-consequence actions
- portfolio priorities
This layer matters because AI systems do not just answer questions anymore. They can take actions, call tools, trigger workflows, and operate across domains. That creates collision risk.
Without a strategic layer, each function optimizes locally. Finance builds one set of rules. Support builds another. Product invents its own. Ops makes exceptions because it has to keep the lights on. Soon the company is full of smart local choices that do not add up to a coherent system.
This is also where identity limits matter.
Every enterprise has places where it is willing to move fast and places where it is not. The strategic layer has to say that plainly. Which workflows can run with high autonomy? Which systems are read-only? Which actions require a human? Where is the company willing to trade certainty for speed? Where is it absolutely unwilling to do that?
Those are not implementation details. They are executive decisions.
Then the harness turns those decisions into runtime logic. And the pods execute inside those boundaries.
That is the model:
- strategy sets the rules
- the harness encodes the rules
- pods execute inside the rules
4. Risk Has Moved: From Content Risk to Execution Exposure
A lot of enterprise AI governance is still stuck in the first wave of fear: hallucinations, toxic output, prompt leaks, embarrassing copy.
That stuff still matters. It is just no longer the main event.
Once AI systems can call tools, touch systems of record, chain actions together, and persist state, the real risk shifts. The question becomes: what was the system allowed to do, what did it actually do, and can you reconstruct the chain after the fact?
That is execution exposure.
And this is where the confused deputy problem becomes real in business terms, not just security terms.
A confused deputy is a system with legitimate privilege getting used in the wrong way on behalf of someone or something with less privilege. In plain English: an agent is allowed to do something in one context, then gets nudged into doing something adjacent that it should not.
In an enterprise, that can look like:
- an agent with deployment rights acting on a weak instruction chain
- an internal assistant pulling records it should never combine
- a workflow agent triggering a downstream action the original requester was not authorized to cause
- an AI layer using broad access because nobody bothered to define narrower lanes
This is why the strategic layer cannot be vague and why the harness cannot be optional. Permission design is now operating model design.
If you cannot inventory your agents, their tools, their permissions, their memory patterns, and their action paths, you do not have governance. You have hope.
5. Two-Swarm Testing
Testing has to change too.
In the old model, QA sat downstream. Build first. Test later. Escalate defects. Triage. Repeat.
That is too slow for AI-driven systems and too weak for agentic behavior. The better model is two swarms running in parallel:
- a development swarm that builds, fixes, documents, and proposes releases
- an adversarial swarm that tries to break the system in real time
The adversarial swarm should probe for:
- secret leakage
- over-permissioning
- confused deputy behavior
- unsafe tool use
- brittle retrieval chains
- weak fallbacks
- failure modes that look compliant until they are not
This is not red-team theater. It is a production model.
If agents are going to investigate incidents, recommend actions, trigger workflows, or remediate issues under guardrails, then reliability cannot depend on a final checkpoint at the end of the pipe. It has to be contested continuously.
In practice, that means:
- the dev swarm generates code, tests, runbooks, and remediation paths
- the adversarial swarm pushes on exploit paths, degraded states, rollback logic, and policy gaps
- both run through the harness so evidence is captured and standards stay consistent
If you want to operate complex AI systems with any confidence, this is the bar. One swarm trying to ship. Another trying to break. Both inside shared rules.
6. What This Changes
Waterfall optimized for stability.
Agile optimized for coordination.
AI-first operating models optimize for leverage.
Each model reflects the main cost of its era. When physical rework was expensive, stability mattered. When software complexity and cross-functional coordination were the bottleneck, Agile mattered. Now generation and execution are getting cheaper. So the scarce resource shifts again.
Now the scarce resources are:
- judgment
- clean context
- permission design
- system boundaries
- exception handling
- organizational throughput
This is why Agile becomes a bottleneck in its bloated enterprise form. Not because iteration is bad. Because too much of modern Agile is really just a coordination tax designed for a world where humans did nearly all the execution.
The winners will use small outcome pods where the work is bounded.
They will use larger pods where the system is deeply entangled.
They will build a real harness so speed does not turn into sprawl.
They will keep a strategic layer that sets hard boundaries without turning into a committee.
They will treat testing as a live contest, not a downstream gate.
Strategic FAQ
Do companies need to kill Agile to become AI-first?
No. They need to kill the parts of Agile that mainly exist to coordinate handoffs that no longer need to happen.
Short feedback loops still matter. Clear priorities still matter. Fast learning still matters. What breaks is the ceremony stack built around human throughput: grooming marathons, estimation theater, ritual status updates, multilayer approvals, and all the process furniture that survives because it protects roles more than outcomes.
This is where organizational psychology matters. Process often doubles as identity. People defend ceremonies not because the ceremonies work, but because those rituals are where their authority sits. So the shift is not just operational. It is political. The right move is selective demolition: keep what improves decisions, remove what mainly documents delay.
How should a CEO decide when a pod should be tiny versus larger?
Start with the system being changed, not the headcount target and not the org chart. If the work touches a bounded surface with clear feedback loops, a small pod is usually the right answer. The point is to compress translation and let high-context operators move directly from problem to shipped outcome.
But old systems change the math. SAP, mainframes, regulated workflows, and cross-domain process redesign come with dense exception paths, brittle dependencies, and institutional memory trapped in too many places. In those settings, a larger pod is not waste. It is realism. The mistake is applying one fashionable team shape to every problem.
Why is the harness more important than the pods?
Because pod speed without shared control becomes enterprise chaos.
Pods move work. The harness makes work governable. It encodes institutional knowledge into reusable systems: permissions, policies, evaluation standards, context windows, evidence capture, rollback logic, and exception paths. Without that layer, each pod improvises its own mini operating model. That feels empowering at first. Then quality drifts, access sprawls, and nobody can tell which local shortcut just became enterprise risk.
This is also why many AI transformations look good in demos and fail in scale-up. Leaders see local velocity and assume the model is working. But if that speed rests on ad hoc prompts, unmanaged tool access, and invisible policy gaps, the debt is merely hidden. The harness is what turns isolated wins into an operating model.
Sources
[^1]: Databricks, Building a High-Performance Data and AI Organization (MIT Technology Review Insights report), https://www.databricks.com/resources/whitepaper/mit-technology-review-insights-report ; SAP, SAP and Databricks Open a Bold New Era of Data and AI, February 13, 2025, https://news.sap.com/2025/02/sap-databricks-open-bold-new-era-data-ai/
[^2]: Cloudflare, Scaling MCP adoption: Our reference architecture for simpler, safer and cheaper enterprise deployments of MCP, April 14, 2026, https://blog.cloudflare.com/enterprise-mcp/
[^3]: Harness, Designing MCP for the Age of AI Agents, March 19, 2026, https://www.harness.io/blog/harness-mcp-server-redesign
[^4]: Google Cloud, Build and manage multi-system agents with Vertex AI, April 9, 2025, https://cloud.google.com/blog/products/ai-machine-learning/build-and-manage-multi-system-agents-with-vertex-ai ; Google Cloud, Google Agentspace enables the agent-driven enterprise, April 9, 2025, https://cloud.google.com/blog/products/ai-machine-learning/google-agentspace-enables-the-agent-driven-enterprise
[^5]: UNITE.AI, From Generative to Agentic AI: The Shift From Content Risk to Execution Exposure, April 24, 2026, https://www.unite.ai/generative-vs-agentic-ai-execution-risk-exposure/
[^6]: UNITE.AI, The Security Vulnerabilities We Built In: AI Agents and the Problem with Obedience, June 18, 2025, https://unite.ai/the-security-vulnerabilities-we-built-in-ai-agents-and-the-problem-with-obedience
[^7]: arXiv, ActionNex: A Virtual Outage Manager for Cloud Computing, April 3, 2026, https://arxiv.org/html/2604.03512v1
[^8]: Microsoft Learn, Overview of Azure SRE Agent, March 27, 2026, https://learn.microsoft.com/en-us/azure/sre-agent/overview ; Microsoft Tech Community, Expanding the Public Preview of the Azure SRE Agent, https://techcommunity.microsoft.com/blog/appsonazureblog/expanding-the-public-preview-of-the-azure-sre-agent/4458514










