AGENTS.md Template for LLM Application Production Architecture

Overview

The AGENTS.md Template for LLM application production architecture defines the operating context for both single-agent and multi-agent orchestration using AI coding agents. It captures roles, handoffs, memory, tool governance, and security policies to ensure predictable behavior in production.

In short: this template provides a complete, copyable AGENTS.md that governs an LLM-based production workflow with explicit roles, handoffs, and governance rules, so that agent behavior stays aligned with the agreed architecture and every action can be audited.

When to Use This AGENTS.md Template

When you are designing an LLM-powered product that requires multiple agents collaborating across stages (planner, implementer, reviewer, researcher).
When you need a reference operating manual that can be embedded in code repositories for consistent behavior.
When tool access, memory sources, and escalation paths must be explicit and auditable.
When you want a template that scales from a single agent to multi-agent orchestration with defined handoffs.

Copyable AGENTS.md Template

# AGENTS.md

Project: LLM Production Orchestration for AI coding agents

Agent Roster:
- Planner: defines strategy, tasks, and high-level goals.
- Implementer: executes actions, calls tools, and updates memory.
- Reviewer: validates outputs, detects errors, and requests replans.
- Researcher: gathers external data, validates sources, and informs decisions.
- Domain Specialist: provides domain-specific constraints and checks.

Supervisor / Orchestrator: The orchestrator enforces coordination rules, monitors health, and triggers handoffs between agents.

Handoff Rules:
- Planner ➜ Implementer: Provide task list, context, and acceptance criteria.
- Implementer ➜ Reviewer: Deliver results with verifiable traces and memory updates.
- Researcher ➜ Planner: Return data findings with sources and confidence.
- Domain Specialist ➜ Implementer: Validate domain constraints before tool use.

Context, Memory, Source of Truth:
- Context stores current task state, goals, and decision logs.
- Memory persists relevant outputs, tool results, and references.
- Source of Truth: All claims reference primary data sources and code repositories.

Tool Access & Permissions:
- Tools: Define allowed tools and scopes.
- Secrets: Use vault or environment variables with access controls.
- Creation/Modification: Agents can only modify what they own or what is explicitly allowed.

Architecture Rules:
- All actions are versioned and auditable.
- No hidden agents; all behavior must be traceable in AGENTS.md.
- Memory entries must include timestamps and agent identifiers.

File Structure Rules:
- /ai-skills/agents-md-templates/ (planner, implementer, reviewer, researcher, domain)
- /tools/ (tool interfaces and adapters)
- /memory/ (context and memory store)
- /configs/ (settings and policies)
- /docs/ (architecture decisions and notes)

Data / API / Integration Rules:
- All external calls are logged with I/O events.
- Data contracts are defined for each API.
- Sensitive data handling follows policy: redact, encrypt, and restrict.

Validation Rules:
- Outputs must be validated against schemas.
- All decisions must include justification and source references.

Security Rules:
- Secrets never logged in plaintext.
- Principle of least privilege for all agents.
- Production endpoints require approval gates.

Testing Rules:
- Unit tests for agents, integration tests for cross-agent flows.
- Mock tools and data for CI tests.
- Every memory or state change includes a rollback plan.

Deployment Rules:
- Deploy agents and tools together with feature flags.
- Database migrations must be backward compatible.
- Rollout monitoring and alerting in place.

Human Review & Escalation:
- Any uncertain decision triggers human review before production execution.
- Escalation path includes supervisor notification and audit logging.

Failure Handling & Rollback:
- On failure, revert memory state and restore last good outputs.
- If a tool or API is down, gracefully degrade and replan.

Do Not Do:
- Do not bypass memory or source-of-truth checks.
- Do not perform unsanctioned tool calls or code changes.
- Do not drift from the agreed AGENTS.md architecture.
- Do not operate in production without an explicit approval gate.

Recommended Agent Operating Model

Roles and decision boundaries: the Planner designs the approach, the Implementer executes actions, the Reviewer checks quality, the Researcher supplies and vets data inputs, and the Domain Specialist enforces domain constraints. The Orchestrator enforces handoffs and maintains a single source of truth. Escalation: if an output is uncertain or violates policy, route it to human review before work continues.
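
The escalation route can be sketched as a small routing function. This is a minimal illustration, not part of the template: the `AgentOutput` type, the `route` function, and the `CONFIDENCE_THRESHOLD` value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff; the template does not prescribe a number.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class AgentOutput:
    agent: str                      # which agent produced the output
    content: str
    confidence: float               # 0.0 to 1.0, however the project scores it
    policy_violations: list = field(default_factory=list)


def route(output: AgentOutput) -> str:
    """Return 'human_review' for uncertain or policy-violating outputs."""
    if output.policy_violations or output.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"
    return "continue"
```

An Orchestrator could call `route` after each agent step and pause the workflow whenever it returns `"human_review"`.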

Recommended Project Structure

ai-project/
  orchestrator/
    orchestrator.py
  agents/
    planner/
      planner.py
      prompts/
    implementer/
      implementer.py
      adapters/
    reviewer/
      reviewer.py
      validators/
    researcher/
      researcher.py
    domain/
      domain_specialist.py
  memory/
    context.json
  tools/
    tool_bindings/
  configs/
    policies.yaml
  data/
    inputs/
  docs/
    architecture.md

Core Operating Principles

Operate with explicit roles, ownership, and audit trails.
Handoffs are deterministic and supported by memory/context updates.
All actions are subject to tool governance and security controls.
Maintain a single source of truth for decisions and outputs.
Escalate to human review for uncertain or unsafe outcomes.
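
One way to back the audit-trail and single-source-of-truth principles is an append-only memory log in which every entry carries a timestamp and an agent identifier. The sketch below assumes an in-memory list; `MemoryEntry` and `MemoryLog` are hypothetical names, and a real system would likely persist entries to a database or file.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryEntry:
    agent_id: str                   # who wrote the entry
    kind: str                       # e.g. "decision", "tool_result"
    payload: dict
    sources: tuple = ()             # references backing the entry
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class MemoryLog:
    """Append-only log: entries are added, never edited in place."""

    def __init__(self):
        self._entries = []

    def append(self, entry: MemoryEntry) -> None:
        if not entry.agent_id:
            raise ValueError("memory entries must name the agent that wrote them")
        self._entries.append(entry)

    def by_agent(self, agent_id: str) -> list:
        return [e for e in self._entries if e.agent_id == agent_id]

    def to_json(self) -> str:
        return json.dumps([asdict(e) for e in self._entries], indent=2)
```

Serializing with `to_json` gives a form that could be written to something like `memory/context.json`, though the on-disk layout is left open here.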

Agent Handoff and Collaboration Rules

Concrete rules for planner, implementer, reviewer, researcher, and domain specialist agents:

Planner → Implementer: Provide task breakdown, acceptance criteria, context, and success metrics.
Implementer → Reviewer: Deliver outputs with reproducible steps and references.
Reviewer → Planner: Return validated results or requests for replanning with justification.
Researcher → All: Share sources with confidence levels and data quality notes.
Domain Specialist → Implementer: Validate domain constraints before external calls.
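
These rules can be made checkable by listing, for each permitted handoff, the fields its payload must carry. The following is a sketch under assumptions: the field names (`tasks`, `steps`, `constraints_checked`, and so on) are illustrative stand-ins for the items named in the rules, and `"*"` is a hypothetical wildcard for the Researcher's "All".

```python
# (sender, receiver) -> required payload fields. Field names are assumptions.
ALLOWED_HANDOFFS = {
    ("planner", "implementer"): ("tasks", "acceptance_criteria", "context", "success_metrics"),
    ("implementer", "reviewer"): ("outputs", "steps", "references"),
    ("reviewer", "planner"): ("status", "justification"),
    ("researcher", "*"): ("sources", "confidence", "quality_notes"),
    ("domain_specialist", "implementer"): ("constraints_checked",),
}


class HandoffError(Exception):
    """Raised when a handoff is not permitted or its payload is incomplete."""


def validate_handoff(sender: str, receiver: str, payload: dict) -> None:
    required = ALLOWED_HANDOFFS.get((sender, receiver))
    if required is None:
        # Fall back to a wildcard rule such as Researcher -> All.
        required = ALLOWED_HANDOFFS.get((sender, "*"))
    if required is None:
        raise HandoffError(f"no handoff rule for {sender} -> {receiver}")
    missing = [key for key in required if key not in payload]
    if missing:
        raise HandoffError(f"{sender} -> {receiver} is missing fields: {missing}")
```

Keeping the table as data rather than code means the Orchestrator can reject an undeclared handoff without knowing anything about the agents involved.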

Tool Governance and Permission Rules

Only approved tools may be invoked; each tool call must be auditable.
Secrets must never appear in plaintext in logs or memory; if a secret has to be stored, store it only in encrypted form.
Production endpoints require explicit approval gates and monitoring.
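
A minimal sketch of the first two rules is a registry that refuses unapproved tools, records every attempt, and redacts secret-looking arguments before they reach the audit log. `ToolRegistry`, `redact`, and the `SECRET_MARKERS` list are hypothetical; matching on argument names is a simple heuristic assumed for the example, not a complete secret-detection scheme.

```python
# Argument-name fragments treated as secrets (an assumption for this sketch).
SECRET_MARKERS = ("token", "secret", "password", "api_key")


class ToolNotApproved(Exception):
    pass


def redact(args: dict) -> dict:
    """Replace values of secret-looking arguments before logging."""
    return {
        key: "***" if any(m in key.lower() for m in SECRET_MARKERS) else value
        for key, value in args.items()
    }


class ToolRegistry:
    def __init__(self, approved: dict):
        self._approved = dict(approved)   # tool name -> callable
        self.audit_log = []               # one record per attempted call

    def call(self, agent_id: str, tool_name: str, **kwargs):
        record = {"agent": agent_id, "tool": tool_name, "args": redact(kwargs)}
        if tool_name not in self._approved:
            self.audit_log.append({**record, "status": "denied"})
            raise ToolNotApproved(tool_name)
        result = self._approved[tool_name](**kwargs)
        self.audit_log.append({**record, "status": "ok"})
        return result
```

Denied calls are logged as well as successful ones, so the audit trail shows what agents attempted, not only what they were allowed to do.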

Code Construction Rules

Concrete implementation constraints for this workflow:

Code must include explicit input validation and output schemas.
All changes to shared memory or state must be transactional.
API calls must handle retries with backoff and clear error reporting.
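
The three constraints above can be illustrated with standard-library helpers. This is a sketch, not a prescribed implementation: `OUTPUT_SCHEMA`, `validate_output`, `transaction`, and `call_with_retries` are hypothetical names, the schema is a toy type map, and the retry helper catches every `Exception` for brevity where real code would catch only transient errors.

```python
import copy
import time
from contextlib import contextmanager

# Toy output schema: field name -> expected type (an assumption).
OUTPUT_SCHEMA = {"task_id": str, "result": str}


def validate_output(data: dict, schema: dict = OUTPUT_SCHEMA) -> dict:
    for key, expected in schema.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} must be {expected.__name__}")
    return data


@contextmanager
def transaction(state: dict):
    """Apply changes to shared state, restoring a snapshot on any error."""
    snapshot = copy.deepcopy(state)
    try:
        yield state
    except Exception:
        state.clear()
        state.update(snapshot)
        raise


def call_with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn, waiting base_delay * 2**n between failed attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"call failed after {attempts} attempts: {last_error}"
    ) from last_error
```

The `sleep` parameter is injectable so tests can run without real delays. The final `RuntimeError` keeps the original exception as its cause for error reporting.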

Security and Production Rules

Least privilege access for all agents.
Audit logging for all critical actions and data access.
Separate development, staging, and production environments with policy gates.
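
A policy gate between environments can be as small as one check that runs before any deployment or production call. The sketch below assumes three environment names and a single approver field; `DeployRequest` and `check_gate` are hypothetical, and real gates would usually live in the CI/CD system rather than in agent code.

```python
from dataclasses import dataclass
from typing import Optional

KNOWN_ENVIRONMENTS = {"development", "staging", "production"}
GATED_ENVIRONMENTS = {"production"}   # environments that need explicit approval


@dataclass(frozen=True)
class DeployRequest:
    environment: str
    requested_by: str                 # agent or user asking for the action
    approved_by: Optional[str] = None # human approver, if any


def check_gate(request: DeployRequest) -> bool:
    """Raise unless the request may proceed in its target environment."""
    if request.environment not in KNOWN_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {request.environment}")
    if request.environment in GATED_ENVIRONMENTS and not request.approved_by:
        raise PermissionError(f"{request.environment} requires an explicit approval")
    return True
```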

Testing Checklist

Unit tests for each agent type; integration tests for cross-agent flows.
End-to-end tests from planner to final output with memory checks.
Deployment tests including rollback and disaster recovery drills.
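
As an illustration of the unit-test item, an agent step can be tested against a mocked tool using the standard `unittest` and `unittest.mock` modules. The `implementer_step` function is a hypothetical stand-in for real Implementer logic, written only so the test has something to exercise.

```python
import unittest
from unittest.mock import Mock


def implementer_step(task: dict, search_tool) -> dict:
    """Hypothetical Implementer step: call a tool, return a reviewable result."""
    hits = search_tool(task["query"])
    return {
        "task_id": task["id"],
        "result": hits,
        "references": [f"search:{task['query']}"],
    }


class ImplementerStepTest(unittest.TestCase):
    def test_calls_tool_and_reports_references(self):
        tool = Mock(return_value=["doc-1"])   # no network access in CI
        out = implementer_step({"id": "t1", "query": "retry policy"}, tool)
        tool.assert_called_once_with("retry policy")
        self.assertEqual(out["result"], ["doc-1"])
        self.assertEqual(out["references"], ["search:retry policy"])
```

Saved in a test module, this runs with `python -m unittest`. Integration and end-to-end tests would follow the same pattern with several agents wired to mocked tools.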

Common Mistakes to Avoid

Assuming memory is always reliable without persistence checks.
Skipping human review for high-risk decisions.
Bypassing tool governance or secret management.
Architectural drift between agents and the template.

FAQ

What is the purpose of this AGENTS.md Template for LLM application production architecture?

The template defines a portable operating context for single-agent and multi-agent workflows, enabling auditable handoffs and governance in production.

How should I structure the agent roster and responsibilities?

Assign clear roles (Planner, Implementer, Reviewer, Researcher, Domain Specialist) with explicit tasks, inputs, outputs, and success criteria; ensure the Orchestrator enforces handoffs.

How are context, memory, and source-of-truth managed?

Context holds the current task state; memory persists outputs and references, with each entry timestamped and tagged with an agent identifier; all claims cite primary data sources.

What are the tool governance and permission rules?

Only approved tools may be invoked; secrets are protected; actions are auditable and gated in production.

What should be included in testing and deployment rules?

Include unit and integration tests for agents, end-to-end tests, deployment rollouts with feature flags, and rollback plans.

Target User

Use Cases