May 26, 2026

AGENTS.md Template: Vertical Scaling Evaluation for AI coding agents

A copyable AGENTS.md template for guiding vertical scaling evaluation with AI coding agents and multi-agent orchestration.

Tags: AGENTS.md template, AI coding agents, vertical scaling, vertical scaling evaluation, multi-agent orchestration, agent handoff rules, tool governance, human review, scaling governance, operating model, agent workflows

Target User

Developers, founders, product teams, engineering leaders

Use Cases

Vertical scaling evaluation for AI coding agents
Single-agent and multi-agent orchestration patterns
Agent governance for scaling tasks
Experiment-driven resource planning

Markdown Template

# AGENTS.md

Project role: This vertical scaling evaluation project uses AI coding agents to assess whether resource upgrades yield measurable performance gains without unacceptable increases in cost or losses in reliability.

Agent roster and responsibilities:
- Planner: designs the vertical scaling experiment, success criteria, and rollout plan.
- Evaluator: runs controlled tests to measure performance, latency, and cost impact.
- Orchestrator: coordinates tasks, enforces sequencing, and handles handoffs between agents.
- Researcher: gathers historical data, benchmarks, and external references.
- Domain Specialist: ensures domain constraints and policy compliance.
- QA Tester: validates outputs, reproducibility, and stability.
- Security Auditor: checks access control, secrets handling, and risk.

Supervisor or orchestrator behavior:
- Maintains a single source of truth for the experiment (memory) and logs decisions.
- Enforces the plan, triggers tasks, and handles timeouts or failures.
- Escalates to human reviewers if safety, security, or reliability criteria are breached.

Handoff rules between agents:
- Planner -> Evaluator: share plan, metrics, data sources, and acceptance criteria.
- Evaluator -> Orchestrator: share results, confidence levels, and a recommended rollout.
- Orchestrator -> Domain Specialist/Researcher: request domain confirmation or additional data.
- Every handoff requires an explicit memory update and a source-of-truth citation.

Context, memory, and source-of-truth rules:
- Context is stored in a dedicated experiment memory store with immutable entries for each run.
- All decisions must reference primary sources (design docs, benchmarks, and logs).
- Do not draw conclusions without citing sources and recording them in memory.

Tool access and permission rules:
- Tools available: internal metrics API, cost API, logging, and deployment controls within the approved sandbox.
- No production system changes without explicit approval and a rollback plan.
- Secrets must be retrieved from a vault and never embedded in logs.

Architecture rules:
- Event-driven, with a central orchestrator and specialized agents.
- Clear boundaries between planning, evaluation, and execution.
- Results written to versioned artifacts under /experiments/vertical-scaling.

File structure rules:
- Do not create unused folders. Only include planner, evaluator, orchestrator, researcher, domain-specialist, memory, and tools under vertical-scaling-evaluation.

Data, API, or integration rules:
- Use internal APIs for metrics, cost, and feature flags.
- All run data is immutable once a run completes; results are append-only.

Validation rules:
- All outputs must be verifiable against metrics and logs. Validation failures abort the run.

Security rules:
- Never expose secrets in code or logs. Use vaults and role-based access.

Testing rules:
- Include unit tests for each agent’s logic and integration tests for end-to-end scaling evaluation.

Deployment rules:
- CI/CD gates require review and approval for any scaling rollout.

Human review and escalation rules:
- Escalate any non-deterministic results or security risks to a human reviewer.

Failure handling and rollback rules:
- If a scaling action causes degradation, automatically roll back to the previous stable state and notify the owners.

Things agents must not do:
- Do not modify production configuration without approval.
- Do not log PII.
- Do not perform unsanctioned autonomous deployments.

Overview

This AGENTS.md template formalizes a vertical scaling evaluation workflow for AI coding agents. It supports single-agent and multi-agent orchestration with clearly defined roles, handoffs, memory, and governance, so that resource upgrades can be assessed safely and reproducibly.

The AGENTS.md template provides the project-level operating context for evaluating vertical scaling of AI coding agent workloads. It defines decision boundaries, data sources, and escalation paths for scaling CPU, memory, accelerators, or related infrastructure while keeping cost and reliability in check.

When to Use This AGENTS.md Template

When you need a repeatable way to assess whether vertical scaling improves agent performance or stability.
When multi-agent orchestration is required to coordinate the planning, evaluation, and execution of scaling decisions.
When you require explicit handoffs, a memory of past experiments, and a single source of truth for scaling decisions.
When scaling actions must be governed, secure, and auditable.

Recommended Agent Operating Model

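In this operating model, the Planner owns the scaling plan, the Evaluator validates performance, and the Orchestrator enforces sequencing and handoffs. The Researcher supplies historical data and benchmarks, the Domain Specialist enforces domain constraints and policy, the QA Tester confirms reproducibility and stability, and the Security Auditor reviews access control and secrets handling. Anomalies and policy violations are escalated to a human reviewer.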
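
The sequencing the Orchestrator enforces can be pictured as a small state machine. The sketch below is illustrative only. The `Stage` values, the `Orchestrator` class, and its `advance` method are hypothetical names. Escalation is modeled as a terminal `ESCALATED` state that a human reviewer would pick up.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stages of one scaling evaluation run."""
    PLANNING = "planning"
    EVALUATION = "evaluation"
    REVIEW = "review"          # Orchestrator checks results against policy
    DONE = "done"
    ESCALATED = "escalated"    # handed to a human reviewer


# Allowed forward transitions; the Orchestrator never skips a stage.
NEXT_STAGE = {
    Stage.PLANNING: Stage.EVALUATION,
    Stage.EVALUATION: Stage.REVIEW,
    Stage.REVIEW: Stage.DONE,
}


class Orchestrator:
    def __init__(self) -> None:
        self.stage = Stage.PLANNING
        self.decision_log: list[str] = []

    def advance(self, anomaly: bool = False) -> Stage:
        """Move to the next stage, or escalate if an anomaly or policy violation was reported."""
        if self.stage in (Stage.DONE, Stage.ESCALATED):
            raise RuntimeError(f"run already finished in stage {self.stage.value}")
        if anomaly:
            self.decision_log.append(f"escalated from {self.stage.value}")
            self.stage = Stage.ESCALATED
        else:
            nxt = NEXT_STAGE[self.stage]
            self.decision_log.append(f"{self.stage.value} -> {nxt.value}")
            self.stage = nxt
        return self.stage
```

Calling `advance(anomaly=True)` at any stage stops the run and records where the escalation happened, so the decision log stays auditable.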

Recommended Project Structure

/
  vertical-scaling-evaluation/
    orchestrator/
    planner/
    evaluator/
    researcher/
    domain-specialist/
    tester/
    memory/
    tools/
    config/
    docs/
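
A CI step could enforce the "do not create unused folders" rule by comparing the subdirectories that exist with an allowlist. This minimal sketch assumes the allowlist mirrors the tree above. `ALLOWED_FOLDERS`, `unexpected_folders`, and `check_layout` are hypothetical names. The template's file structure rule lists fewer folders than this tree, so the allowlist should match whichever list the project adopts.

```python
from pathlib import Path
from typing import Iterable

# Allowlist assumed to mirror the recommended project structure above.
ALLOWED_FOLDERS = {
    "orchestrator", "planner", "evaluator", "researcher", "domain-specialist",
    "tester", "memory", "tools", "config", "docs",
}


def unexpected_folders(names: Iterable[str]) -> list[str]:
    """Return the folder names that are not on the allowlist, sorted for stable CI output."""
    return sorted(set(names) - ALLOWED_FOLDERS)


def check_layout(root: Path) -> list[str]:
    """Inspect the immediate subdirectories of `root` and report any that are not allowed."""
    return unexpected_folders(p.name for p in root.iterdir() if p.is_dir())
```

`check_layout(Path("vertical-scaling-evaluation"))` returns an empty list when the layout is clean. A CI job could fail whenever the list is non-empty.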

Core Operating Principles

Explicit separation of planning, evaluation, and execution.
Deterministic handoffs with memory updates and source-of-truth citations.
Least privilege for tool access and secure secrets handling.
Idempotent actions and auditable decisions.
Observability and reproducibility for each scaling evaluation.
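
Idempotent actions and auditable decisions can reinforce each other. If every scaling action is keyed by a hash of its parameters, replaying the same request becomes a logged no-op rather than a second resize. Here is a minimal sketch under that assumption. `ActionRunner` is a hypothetical helper, and the call to the sandboxed deployment controls is left as a comment.

```python
import hashlib
import json


class ActionRunner:
    """Runs scaling actions at most once per unique parameter set (hypothetical helper)."""

    def __init__(self) -> None:
        self.applied: dict[str, dict] = {}  # idempotency key -> action parameters
        self.audit: list[str] = []          # human-readable decision trail

    @staticmethod
    def key_for(action: dict) -> str:
        # Canonical JSON (sorted keys, fixed separators) so equal actions hash identically.
        canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def apply(self, action: dict) -> bool:
        """Apply `action` unless it was already applied; return True only if it ran."""
        key = self.key_for(action)
        if key in self.applied:
            self.audit.append(f"skip {key[:12]} (already applied)")
            return False
        # A real runner would invoke the sandboxed deployment controls here.
        self.applied[key] = action
        self.audit.append(f"apply {key[:12]} {json.dumps(action, sort_keys=True)}")
        return True
```

Because the key ignores dictionary ordering, `{"vcpus": 8, "memory_gb": 32}` and `{"memory_gb": 32, "vcpus": 8}` count as the same action.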

Agent Handoff and Collaboration Rules

Planner communicates plan, data sources, and acceptance criteria to Evaluator.
Evaluator reports results to Orchestrator with confidence levels.
Orchestrator coordinates cross-agent tasks and ensures alignment with policies.
Researcher and Domain Specialist provide input when constraints require it.
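
The handoff rules allow a mechanical check that the Orchestrator could run before accepting a handoff. The sketch assumes a hypothetical `Handoff` record that carries a memory entry ID and a tuple of source citations. The allowed routes come from the template's handoff rules; a real project would likely add return routes.

```python
from dataclasses import dataclass
from typing import Optional

# Routes taken from the template's handoff rules.
ALLOWED_ROUTES = {
    ("Planner", "Evaluator"),
    ("Evaluator", "Orchestrator"),
    ("Orchestrator", "Domain Specialist"),
    ("Orchestrator", "Researcher"),
}


@dataclass(frozen=True)
class Handoff:
    sender: str
    receiver: str
    payload: dict
    memory_entry_id: Optional[str]   # ID of the memory update recorded for this handoff
    sources: tuple[str, ...] = ()    # source-of-truth citations (design docs, benchmarks, logs)


def validate_handoff(h: Handoff) -> list[str]:
    """Return every rule violation; an empty list means the handoff may proceed."""
    problems = []
    if (h.sender, h.receiver) not in ALLOWED_ROUTES:
        problems.append(f"route {h.sender} -> {h.receiver} is not defined")
    if not h.memory_entry_id:
        problems.append("missing memory update")
    if not h.sources:
        problems.append("missing source-of-truth citation")
    return problems
```

The function returns all violations instead of stopping at the first. This lets the Orchestrator log a complete reason when it rejects a handoff.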

Tool Governance and Permission Rules

Commands and API calls must be approved by the orchestrator and logged.
All edits to files and deployment configurations require review gates.
Secrets are retrieved from a vault and never stored in code or logs.
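
Below is a sketch of an orchestrator-side gate for tool calls. It illustrates the rules under stated assumptions and is not a reference implementation. The tool names in `SANDBOX_TOOLS`, the `authorize` and `redact` functions, and the credential-matching regex are all hypothetical. The gate allows known tools in the sandbox, requires both approval and a rollback plan in production, and denies unknown tools. Every decision is logged with credential-like values masked. Pattern-based redaction is only a backstop: secrets should come from the vault and never appear in commands at all.

```python
import logging
import re

logger = logging.getLogger("tool-gate")

# Hypothetical tool names; only these may be invoked at all.
SANDBOX_TOOLS = {"metrics_api", "cost_api", "logging", "deployment_controls"}

# Best-effort mask for credential-like key=value pairs in command strings.
SECRET_PATTERN = re.compile(r"(?i)\b(token|secret|password|api_key)=\S+")


def redact(text: str) -> str:
    """Replace the value of credential-like key=value pairs with ***."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=***", text)


def authorize(tool: str, environment: str, command: str,
              approved: bool = False, rollback_plan: bool = False) -> bool:
    """Decide whether a tool call may run, and log the decision without secrets."""
    if tool not in SANDBOX_TOOLS:
        allowed = False                       # unknown tools are always denied
    elif environment == "sandbox":
        allowed = True
    elif environment == "production":
        allowed = approved and rollback_plan  # both are required
    else:
        allowed = False                       # unrecognized environments are denied
    logger.info("tool=%s env=%s allowed=%s cmd=%s", tool, environment, allowed, redact(command))
    return allowed
```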

Code Construction Rules

All code and configs are reviewed against the scaling policy and safety constraints.
Use versioned artifacts and maintain change history for scaling decisions.

Security and Production Rules

Agents cannot make production changes without approval and a rollback plan.
Audit logs are immutable and retained per policy.
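
A hash chain is one common way to make an audit log tamper-evident: each entry stores the SHA-256 digest of its predecessor. The sketch below uses only the Python standard library. `AuditLog` and `AuditEntry` are hypothetical names, and retention is out of scope. A chain alone does not detect deletion of the newest entries unless the latest hash is also recorded somewhere independent.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    index: int
    event: dict
    prev_hash: str
    entry_hash: str


class AuditLog:
    """Append-only log in which each entry commits to its predecessor via SHA-256."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    @staticmethod
    def _digest(index: int, event: dict, prev_hash: str) -> str:
        body = json.dumps({"i": index, "e": event, "p": prev_hash}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, event: dict) -> AuditEntry:
        prev = self._entries[-1].entry_hash if self._entries else self.GENESIS
        idx = len(self._entries)
        entry = AuditEntry(idx, event, prev, self._digest(idx, event, prev))
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[AuditEntry, ...]:
        # A tuple copy: callers cannot append, delete, or reorder through it.
        return tuple(self._entries)

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry makes verification fail."""
        prev = self.GENESIS
        for i, e in enumerate(self._entries):
            expected = self._digest(i, e.event, prev)
            if e.index != i or e.prev_hash != prev or e.entry_hash != expected:
                return False
            prev = e.entry_hash
        return True
```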

Testing Checklist

Unit tests for agent logic.
Integration tests for end-to-end scaling evaluation.
Smoke tests after any scaling event.

Common Mistakes to Avoid

Skipping formal handoffs or source-of-truth updates.
Allowing unsupervised production changes.
Confusing correlation with causation in performance results.
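
To avoid crediting an upgrade with a gain it did not cause, run the baseline and upgraded configurations against the same workload and interleave their runs. Attribute only improvements that clearly exceed run-to-run noise. The helper below is a deliberately simple heuristic, not a statistical test. The function names and the `min_runs=5` default are assumptions. For real decisions, a proper test such as Mann-Whitney U may be a better fit.

```python
from statistics import median


def relative_improvement(baseline_ms: list[float], upgraded_ms: list[float]) -> float:
    """Fractional drop in median latency; 0.2 means the upgrade is 20% faster."""
    b, u = median(baseline_ms), median(upgraded_ms)
    return (b - u) / b


def noise_floor(samples: list[float]) -> float:
    """Relative spread of repeated runs of one configuration: (max - min) / median."""
    return (max(samples) - min(samples)) / median(samples)


def attributable_gain(baseline_ms: list[float], upgraded_ms: list[float], min_runs: int = 5) -> bool:
    """Credit the upgrade only if both arms have enough runs and the gain beats the noisier arm."""
    if len(baseline_ms) < min_runs or len(upgraded_ms) < min_runs:
        return False
    gain = relative_improvement(baseline_ms, upgraded_ms)
    return gain > max(noise_floor(baseline_ms), noise_floor(upgraded_ms))
```

Suppose the baseline has latencies of 98 to 104 ms (median 101) and the upgrade has 79 to 83 ms (median 81). The roughly 20% gain is well above the roughly 6% spread of either arm, so it counts. The same median gain would not count if the upgraded runs ranged from 70 to 110 ms.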

FAQ

What is the purpose of this AGENTS.md Template for vertical scaling evaluation?

To codify roles, handoffs, and governance for evaluating and applying vertical scaling to AI coding agents, ensuring safe, auditable decisions within single-agent or multi-agent orchestration.

Who are the typical agents in the roster and what do they do?

Planner designs experiments; Evaluator measures performance; Orchestrator coordinates tasks; Researcher gathers data; Domain Specialist ensures domain constraints; QA Tester validates outputs; Security Auditor checks access control, secrets handling, and risk.

How are handoffs between agents managed?

Handoffs follow a defined sequence (Planner -> Evaluator -> Orchestrator). Each handoff requires an explicit memory update and a source-of-truth citation, and the Orchestrator escalates if the success criteria are not met.

What are the key safety and governance constraints?

No production changes without approval; secrets are retrieved from a vault and kept out of code and logs; experiments run in the approved sandbox; PII is never logged; and every change is auditable.

How is success validated and what happens on failure?

Evaluation results must meet the predefined metrics. On success, provide the supporting evidence and a scaling plan; on failure, roll back the changes, notify stakeholders, and escalate if needed.
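
The success and failure paths can be encoded as a single decision function. This sketch assumes hypothetical metric names (`p95_latency_ms`, `error_rate`, `cost_increase_pct`) and a hypothetical `Criteria` record that the Planner fixes before the run. It maps non-reproducible results to escalation, missed criteria to rollback, and a clean pass to a scaling recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """Hypothetical acceptance criteria fixed by the Planner before the run."""
    max_p95_latency_ms: float
    max_error_rate: float
    max_cost_increase_pct: float


def decide(metrics: dict[str, float], criteria: Criteria, reproducible: bool) -> tuple[str, list[str]]:
    """Return ("scale" | "rollback" | "escalate", reasons).

    A missing metric raises KeyError; callers should treat that as a
    validation failure that aborts the run.
    """
    if not reproducible:
        return "escalate", ["results not reproducible across runs"]
    failures = []
    if metrics["p95_latency_ms"] > criteria.max_p95_latency_ms:
        failures.append("p95 latency above target")
    if metrics["error_rate"] > criteria.max_error_rate:
        failures.append("error rate above target")
    if metrics["cost_increase_pct"] > criteria.max_cost_increase_pct:
        failures.append("cost increase above budget")
    if failures:
        return "rollback", failures
    return "scale", ["all acceptance criteria met"]
```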