
How to Structure Your AI Coding Agent So It Stops Wasting Your Time

AI coding agents like Claude Code, Cursor, and Copilot are incredibly productive — when they're pointed in the right direction. The problem is that most developers give them a prompt, watch them write hundreds of lines of code, and then spend an hour cleaning up the mess. Here's how to fix that.

The Unstructured AI Session

Picture this: you open Claude Code and type "Build a user notification system with email alerts." The agent goes to work. Three minutes later, it's written 400 lines across 12 files. It created a new database table, added a message queue, wrote an email service, built API endpoints, and updated the frontend.

The code compiles. The tests pass. But when you look at the PR, you realize it restructured your database schema in a way that conflicts with your existing auth system, introduced a dependency on Redis that you don't use anywhere else, and made five architectural decisions you never would have approved if asked.

You spend the next two hours reverting, re-explaining, and manually guiding the agent through the correct implementation. The "3-minute magic" turned into a 3-hour mess.

This isn't the AI's fault. It's a structural problem. AI agents have no memory, no understanding of your architecture, no sense of scope boundaries, and no concept of "ask before you act." They do what you tell them — fast — without the context that a human developer would have after working on the project for a week.

The Three Things AI Agents Are Missing

1. Persistent Memory

Every time you start a new session with an AI coding agent, it starts from zero. It doesn't know your architecture. It doesn't know that you decided to use Postgres instead of MongoDB three months ago and why. It doesn't know that your API follows a specific pattern for error handling. It doesn't know that the last time it worked on this project, it was halfway through a feature.

Without persistent memory, you either re-explain the entire project context every session (which is slow and error-prone), or you let the agent guess (which leads to the problems described above).

The fix is a structured memory system — a set of files that capture your architecture, tech stack, patterns, decisions, and current work. The agent reads these at the start of every session. Your context survives across sessions without you repeating yourself.
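As a sketch, such a memory system can be as simple as a couple of markdown files in the repo that the agent is told to read first. Every file name, decision, and detail below is hypothetical, illustrating the shape rather than a required format:

```markdown
<!-- docs/agent/architecture.md — what the system is and why -->
## Stack
- Backend: Node.js + Postgres (chosen over MongoDB, 2024-03: we need transactions)
- No Redis or other broker — background work uses a Postgres-backed queue

## Patterns
- All API errors return the envelope { "error": { "code", "message" } }

<!-- docs/agent/progress.md — where the current work stands -->
## In progress
- Notification system: email service done; API endpoints not started
```

The point is not the specific files but that decisions and their reasons live somewhere the agent reads at session start, instead of in your head.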

2. Role Separation

When you ask an AI agent to "build a feature," it tries to do everything at once: design the architecture, plan the implementation, write the code, and test it — all in one pass. This is like asking a single person to be the architect, tech lead, and developer simultaneously, with no review in between.

In practice, the design phase is where most mistakes happen. If the agent makes a wrong architectural decision at the start, everything it builds on top of that decision is wrong too. And because it did everything in one pass, you don't discover the problem until you're staring at a 400-line PR.

The fix is role-based skills. Separate the AI into distinct roles — an Architect that designs, a Tech Lead that plans tasks, and a Dev Agent that implements. Each role has a defined scope and output. The Architect can't write code. The Dev Agent can't make architectural decisions. And between each role, there's an approval gate where you review the output before the next role starts.
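One way to encode a role is a short instruction file the agent loads before acting. The wording and structure here are illustrative, not a fixed schema:

```markdown
# Role: Architect

## Scope
- Ask clarifying questions before designing (problem, users, out-of-scope, constraints).
- Produce a design doc: component impact, data model changes, API changes, risks.

## Hard limits
- Do NOT write or modify code.
- Do NOT break the design into tasks — that is the Tech Lead's job.

## Output
- A design doc ending with: "Awaiting approval before planning begins."
```

The hard limits matter more than the scope: they are what stop the Architect from sliding into implementation in a single pass.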

3. Approval Gates

The most expensive mistake in AI-assisted development isn't a bug — it's a wrong direction. An AI agent can write 500 lines of correct, well-tested code that implements the wrong thing. Without checkpoints, you don't catch this until the PR is open and you've already invested significant time.

Approval gates are mandatory pauses where the agent presents its work and asks you to review before proceeding. The key insight is that a 2-minute review at the design stage is worth more than a 2-hour review at the PR stage. By the time code exists, there's sunk-cost pressure to keep it even if the direction is wrong.
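A gate can be written directly into the role instructions as a mandatory stopping rule. This is a hypothetical sketch of what that instruction might say:

```markdown
## Approval gate

Before moving to the next phase:
1. Summarize what you produced and the decisions it locks in.
2. List anything you assumed without being told.
3. STOP and wait for an explicit "approved" from the human.
   Silence is not approval. "Looks interesting" is not approval.
```

Step 2 is the one that catches wrong directions: assumptions the agent made silently are exactly the things you would have vetoed if asked.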

A Practical Framework

Here's the workflow we use. It's built around three phases with human approval between each:

Phase 1: Design → You approve the architecture

Phase 2: Plan → You approve the task breakdown

Phase 3: Build → Agent implements one task at a time

Phase 1 — Design. You describe the feature. The AI agent (in its Architect role) asks clarifying questions: what problem does this solve, who benefits, what's out of scope, any constraints? Then it produces a design: component impact analysis, data model changes, API changes, technical risks. You review this and approve or request changes. No code is written until the design is approved.

Phase 2 — Plan. The AI (now in its Tech Lead role) takes the approved design and breaks it into concrete tasks. Each task targets one service, touches no more than 15 files, and has a specific "done when" criterion that a reviewer can verify. You review the task breakdown, approve it, and the tasks are loaded into a backlog.
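A task in that backlog might look like the following; the task ID, endpoint, and file counts are invented for illustration:

```markdown
## Task: NOTIF-3 — Add email alert endpoint

- Service: api-gateway (one service only)
- Files: expected ~6, hard cap 15
- Done when:
  - POST /notifications/email returns 202 and enqueues a send job
  - Validation failures return the standard error envelope
  - Tests cover the happy path and at least one failure case
```

The "done when" list is what makes the later PR reviewable: each bullet is something a reviewer can check without reconstructing the agent's intent.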

Phase 3 — Build. The AI (in its Dev Agent role) picks up one task at a time from the backlog, creates a branch, implements, tests, commits, and opens a PR. Each PR is small, focused, and reviewable. If the agent discovers something outside the task scope during implementation, it logs it as a new task in the backlog instead of implementing it immediately.
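When the Dev Agent hits something out of scope, "log it instead of building it" can be as lightweight as appending a stub entry to the backlog. Again, the IDs and wording are hypothetical:

```markdown
## Discovered during NOTIF-3 (logged, not implemented)

- Task: NOTIF-7 — Email service swallows SMTP errors silently
- Done when: failed sends are logged with the provider error and retried once
```

The stub keeps the current PR on-task while making sure the discovery isn't lost between sessions.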

The result: instead of one massive, unreviewable PR, you get a series of small, focused PRs that each do one thing well. The architecture was reviewed before any code was written. The task breakdown was reviewed before any implementation started. And every PR maps to a specific task with clear done-when criteria.

When to Skip the Ceremony

This full workflow is for meaningful features — the kind where a wrong architectural decision costs hours of rework. For smaller things, you don't need the full ceremony:

For small/medium features, you can fast-track the process: combine the design and planning phases into a single pass, get one approval, and start implementing. The quality gates are still there, but you only stop once instead of twice.

For bug fixes and minor improvements, skip straight to the backlog. Write a task with done-when criteria and let the agent implement it. No design phase needed.

The workflow is a tool, not a religion. Use the level of ceremony that matches the risk of the change.

Results

After using this framework on real projects, three things changed dramatically:

Time-to-correct-code dropped. The total time from feature idea to merged, correct code is shorter — even though we added approval gates. The gates catch wrong-direction mistakes early, which is far cheaper than catching them in a PR review. A 2-minute design review prevents a 2-hour revert-and-redo cycle.

PR quality improved. Small, focused PRs with clear task descriptions are reviewable. You can actually understand what changed and why. No more 400-line PRs that you approve because you don't have time to fully review them.

Context loss disappeared. The persistent memory system means the agent knows your architecture, patterns, and decisions from the moment a session starts. No more re-explaining. No more "why did you use MongoDB when we use Postgres everywhere?"

Getting Started

We've packaged this framework as Archie — a set of files you drop into your project that gives your AI coding agent the memory, roles, and approval gates described in this article. It works with Claude Code today and is designed to support other agents.

Whether you use Archie or build your own version, the core principles are the same: give your AI agent persistent context, separate design from implementation, and add human checkpoints before code is written. Your future self will thank you for the hours of rework you didn't have to do.