LLM Guardrails Architecture AGENTS.md Template

Overview

This AGENTS.md Template defines the LLM guardrails architecture for AI coding agents and supports both single-agent and multi-agent orchestration with explicit roles, rules, and escalation paths.

The template governs an LLM guardrails workflow in which specialized agents enforce policies, manage tool access, validate outputs, and escalate to human review when risk is detected.

When to Use This AGENTS.md Template

Before building production guardrails for AI coding agents or multi-agent orchestration.
When establishing a repeatable operating model that scales across teams and projects.
To standardize collaboration patterns between planners, implementers, reviewers, testers, researchers, and domain specialists.

Copyable AGENTS.md Template

# AGENTS.md

Project role: LLM Guardrails Architecture Lead

Agent roster and responsibilities:
- Orchestrator: coordinates all agents, enforces policy, triggers handoffs, and tracks guardrail state in the memory store.
- GuardrailPolicyAgent: evaluates prompts and responses against guardrail policies and returns a policy status with suggested actions.
- ToolGovernanceAgent: validates tool usage and enforces access controls, secrets handling, and auditing.
- MemoryContextAgent: stores shared context, sources-of-truth, and memory for conversation continuity, subject to a defined retention policy.
- ReviewAgent: routes outputs to human reviewers when risk thresholds are exceeded and records their decisions.
- ResearchAgent: sources external guardrail references and updates policies accordingly.

Supervisor or orchestrator behavior:
- The Orchestrator maintains a canonical plan, delegates tasks, aggregates results, and decides whether escalation is required. It ensures idempotency and reproducibility by anchoring decisions to the memory store and policy checks.

Handoff rules between agents:
- The planner (Orchestrator) asks GuardrailPolicyAgent for policy validation before implementation.
- The implementer executes within guardrails; on failure or ambiguity, it hands off to ReviewAgent or ResearchAgent as needed.
- The reviewer approves or requests modifications; results are returned to the Orchestrator.

Context, memory, and source-of-truth rules:
- A central memory store uses a defined schema for context. Sources-of-truth include policy documents, the tool capability matrix, and data provenance.
- All tasks must cite the sources they use; memory writes are versioned.

Tool access and permission rules:
- Tools: code execution, API calls, and data access; access is controlled by ToolGovernanceAgent under least privilege.
- Secrets are stored in a vault; implementers must not log raw secrets, and secrets must be rotated regularly.

Architecture rules:
- Modular, microservice-like components; event-driven workflow; guardrail checks are stateless between steps; memory is the single source of truth.

File structure rules:
- Place files under policies/, agents/, data/, tests/, and docs/; avoid unrelated files.

Data, API, or integration rules when relevant:
- Data handling follows privacy guidelines; respect API rate limits; store API keys securely.

Validation rules:
- Each output must pass policy checks, tool safety checks, and memory traceability checks.

Security rules:
- Secrets handling, encryption in transit and at rest, rotation, audit trails.

Testing rules:
- Unit tests for each agent; integration tests for the orchestrated flow; end-to-end simulation with synthetic data.

Deployment rules:
- Staged deployment, feature flags, and canary rollouts for guardrail changes.

Human review and escalation rules:
- Triggered when the risk score exceeds the defined threshold; log decisions and provide a rationale.

Failure handling and rollback rules:
- If a step fails, roll back to the last known-good memory state, retry with backoff, and escalate if retries are exhausted.

Things Agents must not do:
- Do not bypass policy, do not access secrets outside vault, do not modify production data without approval, do not retain secrets in memory.
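
The failure handling rule in the template (roll back to the last known-good memory state, retry with backoff, escalate) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not part of the template itself: `run_with_rollback`, `EscalationRequired`, and the list-of-snapshots stand-in for the memory store are all hypothetical.

```python
import time
from typing import Callable, Dict, List

State = Dict[str, str]


class EscalationRequired(Exception):
    """Raised when retries are exhausted and the failure must go to human review."""


def run_with_rollback(
    step: Callable[[State], State],
    history: List[State],
    max_retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> State:
    """Run `step` against the last known-good state, retrying with exponential backoff.

    `history` is a hypothetical in-memory stand-in for a versioned memory store;
    its last element is always the last known-good state.
    """
    for attempt in range(max_retries):
        working = dict(history[-1])  # each attempt starts from the known-good snapshot
        try:
            result = step(working)
        except Exception as exc:
            # Rollback: the partially modified `working` copy is simply discarded.
            if attempt == max_retries - 1:
                raise EscalationRequired(
                    f"step failed after {max_retries} attempts: {exc}"
                ) from exc
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
            continue
        history.append(result)  # commit the new known-good state
        return result
    raise ValueError("max_retries must be at least 1")
```

Passing `sleep` as a parameter keeps the backoff testable without real delays; a production version would persist snapshots in the versioned memory store rather than a Python list.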

Recommended Agent Operating Model

The operating model defines clear roles, decision boundaries, and escalation paths for LLM guardrails orchestration.

Orchestrator provides the plan and enforces policy, anchoring every decision to the memory store as the single source of truth.
GuardrailPolicyAgent enforces guardrail rules and reports violation status with rationale.
ToolGovernanceAgent enforces tool usage constraints and secrets handling.
MemoryContextAgent manages memory scope, sources-of-truth, and provenance.
ReviewAgent handles human-in-the-loop decisions; documents outcomes.
ResearchAgent keeps guardrail references up-to-date and triggers policy refreshes.
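
The MemoryContextAgent's responsibilities for memory scope and provenance can be illustrated with an append-only, versioned store. The sketch below is hypothetical (`MemoryStore` and `MemoryEntry` are illustrative names, and a real deployment would back this with a database); it shows two rules from the template: every write is versioned, and every write must cite its sources.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MemoryEntry:
    key: str
    value: str
    version: int
    sources: Tuple[str, ...]  # provenance: documents or decisions the value came from


@dataclass
class MemoryStore:
    """Append-only memory: every write creates a new version instead of overwriting."""

    entries: List[MemoryEntry] = field(default_factory=list)

    def write(self, key: str, value: str, sources: List[str]) -> MemoryEntry:
        if not sources:
            # Reject writes without provenance to prevent silent memory drift.
            raise ValueError(f"write to {key!r} cites no sources")
        version = sum(1 for e in self.entries if e.key == key) + 1
        entry = MemoryEntry(key, value, version, tuple(sources))
        self.entries.append(entry)
        return entry

    def read(self, key: str, version: Optional[int] = None) -> MemoryEntry:
        # Newest first, so version=None returns the latest value for the key.
        for entry in reversed(self.entries):
            if entry.key == key and (version is None or entry.version == version):
                return entry
        raise KeyError(f"{key!r} (version {version})")
```

Because old versions are never overwritten, a rollback can simply read an earlier version, and auditors can trace which sources justified each value.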

Recommended Project Structure

project/
├── policies/
│   └── guardrails.yaml
├── agents/
│   ├── orchestrator/
│   │   └── orchestrator.py
│   ├── guardrail_policy_agent/
│   ├── tool_governance_agent/
│   ├── memory_context_agent/
│   ├── review_agent/
│   └── research_agent/
├── data/
│   └── sources/
├── tests/
│   ├── unit/
│   └── integration/
├── docs/
└── README.md

Core Operating Principles

Policy-first enforcement: decisions must be justified by explicit guardrail policies.
Idempotent and auditable actions: repeating a step must not repeat its side effects, and every action must be traceable.
Single source of truth: all decisions reference policy documents and memory provenance.
Least privilege across agents and tools.
Observability: structured logging, metrics, and alerts for guardrail health.
Human-in-the-loop when risk exceeds thresholds.
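
The idempotency and auditability principles can be made concrete with an idempotency key: a hash of the action and its parameters that lets a retried step return the stored result instead of repeating the side effect. The `AuditedExecutor` class below is a hypothetical sketch using only the standard library; a production system would persist both the results and the audit log.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class AuditedExecutor:
    """Runs each distinct action at most once and records every request."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}
        self.audit_log: List[Dict[str, Any]] = []

    @staticmethod
    def idempotency_key(action: str, params: Dict[str, Any]) -> str:
        # Canonical JSON (sorted keys) so equal params always produce the same key.
        payload = json.dumps({"action": action, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, action: str, params: Dict[str, Any], fn: Callable[..., Any]) -> Any:
        key = self.idempotency_key(action, params)
        replayed = key in self.results
        if not replayed:
            self.results[key] = fn(**params)  # the side effect happens only once
        # Every call is audited, including replays.
        self.audit_log.append({"action": action, "key": key, "replayed": replayed})
        return self.results[key]
```

Calling `run("create_ticket", {"title": "a"}, create_ticket)` twice invokes `create_ticket` once, while the audit log records both requests with `replayed` set to `False` and then `True`.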

Agent Handoff and Collaboration Rules

Planner (Orchestrator) creates the plan and assigns tasks to GuardrailPolicyAgent before any implementation.
Implementer executes guarded actions with memory anchors and policy validation.
Reviewer validates outputs and approves or requests changes; escalates to ResearchAgent if sources are outdated.
Tester runs unit/integration tests; communicates results to Orchestrator and ReviewAgent.
Researcher ensures policy references are current; Domain Specialist validates domain-specific constraints.
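
The handoff order above (policy validation before implementation, then review, research escalation, human review, and testing) can be sketched as a single orchestration function. Everything here is hypothetical: `Task`, `Outcome`, `orchestrate`, the role labels in the trail, and the 0.7 default threshold are illustrative, and agents are modeled as plain callables rather than separate services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    description: str
    risk_score: float            # hypothetical scale: 0.0 (benign) to 1.0 (critical)
    sources_current: bool = True


@dataclass
class Outcome:
    status: str                  # "blocked", "needs_research", "needs_review", or "completed"
    trail: List[str] = field(default_factory=list)  # roles that handled the task, in order


def orchestrate(
    task: Task,
    policy_allows: Callable[[Task], bool],
    implement: Callable[[Task], None],
    risk_threshold: float = 0.7,
) -> Outcome:
    trail = ["planner"]
    # 1. Policy validation always happens before any implementation.
    if not policy_allows(task):
        return Outcome("blocked", trail + ["guardrail_policy_agent:denied"])
    trail.append("guardrail_policy_agent:approved")
    # 2. The implementer acts only after approval.
    implement(task)
    trail.append("implementer")
    # 3. The reviewer escalates to research when sources are outdated,
    #    and to a human reviewer when risk exceeds the threshold.
    if not task.sources_current:
        return Outcome("needs_research", trail + ["reviewer", "research_agent"])
    if task.risk_score > risk_threshold:
        return Outcome("needs_review", trail + ["reviewer", "review_agent"])
    # 4. The tester runs last and reports back to the Orchestrator.
    return Outcome("completed", trail + ["reviewer", "tester"])
```

The `trail` doubles as a lightweight record of which roles touched the task, which the Orchestrator can write to memory for traceability.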

Tool Governance and Permission Rules

All tool access is mediated by ToolGovernanceAgent with least-privilege permissions.
Secrets are stored in a centralized vault; no raw secrets in logs or memory.
API calls and tool invocations are sandboxed; all actions are auditable.
Production tool changes require explicit approval gates and release checks.
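
A minimal sketch of how ToolGovernanceAgent might mediate tool access follows: deny by default, allow only explicitly granted tools, require an approval flag for production tools, and audit every decision. The `ToolGovernance` class, the grant table, and the tool names are hypothetical; a real system would load grants from the tool capability matrix and write audits to durable storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class PermissionDenied(Exception):
    """Raised when an agent requests a tool it is not allowed to use."""


@dataclass
class ToolGovernance:
    # Hypothetical grants: agent name -> tools it may call (least privilege).
    grants: Dict[str, Set[str]]
    production_tools: Set[str] = field(default_factory=set)
    audit: List[str] = field(default_factory=list)

    def authorize(self, agent: str, tool: str, approved: bool = False) -> None:
        allowed = tool in self.grants.get(agent, set())  # deny by default
        if allowed and tool in self.production_tools and not approved:
            allowed = False  # production tools also need an explicit approval gate
        # Every decision, allow or deny, is recorded.
        self.audit.append(f"{agent} -> {tool}: {'allow' if allowed else 'deny'}")
        if not allowed:
            raise PermissionDenied(f"{agent} may not call {tool}")
```

Raising on denial (rather than returning `False`) makes it hard for calling code to ignore a refusal by accident.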

Code Construction Rules

Follow guardrails-driven design: explicit policy checks before execution.
Avoid hard-coded secrets; use secret management services.
Keep logic modular and testable; document decisions in memory and policy references.
Validate inputs and outputs at every step; fail fast on violations.
Do not bypass rate limits or access controls.
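
Several of these rules (no hard-coded secrets, fail fast on invalid input, keep raw secrets out of logs) fit in a few small helpers. The sketch below assumes secrets are injected into the process environment at runtime, for example by a vault agent; the helper names, the `GUARDRAILS_API_KEY` variable, and the 4000-character prompt limit are hypothetical.

```python
import os
from typing import Dict


class ValidationError(ValueError):
    """Raised as soon as an input or configuration check fails."""


def get_secret(name: str) -> str:
    # Assumption: the vault integration exposes secrets as environment variables.
    value = os.environ.get(name)
    if not value:
        # Fail fast; never fall back to a hard-coded default.
        raise ValidationError(f"secret {name} is not configured")
    return value


def redact(text: str, secret: str) -> str:
    # Replace the raw secret before the text reaches logs or memory.
    return text.replace(secret, "***") if secret else text


def validate_request(payload: Dict[str, object], max_prompt_chars: int = 4000) -> str:
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValidationError("prompt must be a non-empty string")
    if len(prompt) > max_prompt_chars:
        raise ValidationError(f"prompt exceeds {max_prompt_chars} characters")
    return prompt
```

For example, with `GUARDRAILS_API_KEY` set, `redact(f"calling API with {key}", key)` yields `"calling API with ***"`.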

Security and Production Rules

Encrypt data in transit and at rest; rotate credentials regularly.
Audit all guardrail decisions and memory writes; retain logs per policy.
Use feature flags for guardrail rollouts; support canary deployments.
Disable dangerous actions by default; require escalation for production changes.
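
Feature-flagged canary rollouts need a stable way to decide which requests see a new guardrail policy. One common technique, sketched here under assumptions, is deterministic hash bucketing: hashing the flag name with a subject ID (a repository, user, or team) places each subject in a fixed bucket, so the same subject always gets the same policy. The flag name, subject IDs, and function names are hypothetical.

```python
import hashlib


def in_canary(flag: str, subject_id: str, rollout_percent: int) -> bool:
    """Deterministically decide whether a subject is in the canary cohort for a flag."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{flag}:{subject_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # stable bucket in [0, 100)
    return bucket < rollout_percent


def select_policy(flag: str, subject_id: str, rollout_percent: int) -> str:
    # A 0% rollout keeps every subject on the stable policy (disabled by default).
    return "candidate" if in_canary(flag, subject_id, rollout_percent) else "stable"
```

Raising `rollout_percent` only adds subjects to the canary cohort and never removes existing ones, which keeps staged rollouts consistent from one stage to the next.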

Testing Checklist

Unit tests for each agent policy and action.
Integration tests validating multi-agent handoffs and memory consistency.
End-to-end tests with synthetic data simulating real guardrail scenarios.
Security tests for secret handling and access controls.
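
As an example of the first checklist item, here is a unit test for a policy check written with Python's standard `unittest` module. The `check_policy` function is a hypothetical, pattern-based stand-in for GuardrailPolicyAgent logic, and the patterns are illustrative only (an AWS-style access key ID and a destructive shell command).

```python
import re
import unittest
from typing import List, Tuple


def check_policy(text: str, blocked_patterns: List[str]) -> Tuple[str, List[str]]:
    """Hypothetical policy check: return (status, list of matched patterns)."""
    hits = [p for p in blocked_patterns if re.search(p, text, re.IGNORECASE)]
    return ("violation" if hits else "pass", hits)


class GuardrailPolicyAgentTest(unittest.TestCase):
    # Illustrative patterns: an AWS-style access key ID and a destructive shell command.
    PATTERNS = [r"\bAKIA[0-9A-Z]{16}\b", r"rm\s+-rf\s+/"]

    def test_clean_text_passes(self) -> None:
        self.assertEqual(check_policy("refactor the parser", self.PATTERNS), ("pass", []))

    def test_secret_like_token_is_flagged(self) -> None:
        status, hits = check_policy("key=AKIAABCDEFGHIJKLMNOP", self.PATTERNS)
        self.assertEqual(status, "violation")
        self.assertEqual(hits, [self.PATTERNS[0]])

    def test_destructive_command_is_flagged(self) -> None:
        status, hits = check_policy("then run rm -rf / to clean up", self.PATTERNS)
        self.assertEqual(status, "violation")
        self.assertEqual(hits, [self.PATTERNS[1]])
```

Saved under tests/unit/, these tests can be discovered with `python -m unittest discover tests/unit`; integration and end-to-end tests would exercise the same agents through the Orchestrator with synthetic data.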

Common Mistakes to Avoid

Skipping explicit policy checks before execution.
Allowing memory drift without provenance updates.
Hard-coding secrets or bypassing vaults.
Neglecting escalation paths for high-risk prompts.
Overcomplicating the orchestration without clear decision boundaries.

FAQ

What is an AGENTS.md Template for LLM guardrails?

An AGENTS.md Template provides a copyable operating manual for guardrail-driven AI coding agents, detailing roles, handoffs, restrictions, and escalation patterns for single-agent and multi-agent orchestration.

How does multi-agent orchestration work in this template?

The Orchestrator delegates tasks to specialized guardrail and governance agents, enforces policy, aggregates results, and triggers handoffs to maintain guardrail integrity and auditable provenance.

How are tool permissions managed?

Tool permissions are enforced by the ToolGovernanceAgent with least-privilege access, secrets stored in a vault, and all actions audited.

When should human review be triggered?

Human review is triggered when a guardrail policy violation is detected, when a risk score exceeds the defined threshold, or when outdated policy references require validation.
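
These three triggers can be expressed as one function that returns the reason for escalation, or `None` when no human review is needed. This is a sketch under assumptions: the `PolicyCheck` fields, the 0.0 to 1.0 risk scale, the 0.7 threshold, and the 90-day policy freshness window are hypothetical defaults that a real project would define in its guardrail policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class PolicyCheck:
    violations: List[str]        # IDs of violated guardrail rules
    risk_score: float            # hypothetical scale: 0.0 (benign) to 1.0 (critical)
    policy_last_reviewed: date   # when the cited policy references were last validated


def escalation_reason(
    check: PolicyCheck,
    risk_threshold: float = 0.7,
    max_policy_age: timedelta = timedelta(days=90),
    today: Optional[date] = None,
) -> Optional[str]:
    """Return why human review is needed, or None if no trigger fires."""
    today = today or date.today()
    if check.violations:
        return "policy violation: " + ", ".join(check.violations)
    if check.risk_score > risk_threshold:
        return f"risk score {check.risk_score:.2f} exceeds threshold {risk_threshold:.2f}"
    if today - check.policy_last_reviewed > max_policy_age:
        return "policy references are outdated"
    return None
```

Returning a reason string rather than a boolean gives ReviewAgent the rationale it needs to log with the human decision.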

How do I customize for a new guardrail policy?

Start by defining or updating the policy in policies/guardrails.yaml, then adjust the memory provenance rules and update the orchestrator and policy agent logic to enforce the new policy.
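
For pattern-based rules, adding a policy can be mostly a configuration change if the policy agent evaluates whatever rules the configuration contains. The sketch below assumes a hypothetical guardrails.yaml schema (a list of rules, each with `id`, `pattern`, and `action`) that has already been parsed into a Python dictionary. The rule IDs, the patterns, and the "most severe action wins" resolution are all illustrative.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical parsed form of policies/guardrails.yaml (e.g. the output of a YAML loader).
POLICY_CONFIG: Dict[str, List[Dict[str, str]]] = {
    "rules": [
        {"id": "no-prod-writes", "pattern": r"DROP\s+TABLE", "action": "block"},
        # New policy: for pattern rules, adding an entry here needs no agent code change.
        {"id": "flag-pii-email", "pattern": r"[\w.+-]+@[\w-]+\.[\w.]+", "action": "review"},
    ]
}

SEVERITY = {"allow": 0, "review": 1, "block": 2}


@dataclass(frozen=True)
class Decision:
    status: str                  # "allow", "review", or "block"
    rule_ids: Tuple[str, ...]    # every rule that matched, in config order


def evaluate(text: str, config: Dict[str, List[Dict[str, str]]]) -> Decision:
    status, matched = "allow", []
    for rule in config["rules"]:
        if re.search(rule["pattern"], text, re.IGNORECASE):
            matched.append(rule["id"])
            if SEVERITY[rule["action"]] > SEVERITY[status]:
                status = rule["action"]  # the most severe matching action wins
    return Decision(status, tuple(matched))
```

Rules that need more than pattern matching (for example, context from the memory store) would still require changes to the policy agent and orchestrator logic.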

Target User

Use Cases