AGENTS.md Template for Autoscaling Architecture

Overview

This AGENTS.md template defines how AI coding agents govern autoscaling decisions through a formal operating manual. It covers both single-agent and multi-agent orchestration and spells out roles, constraints, memory, and source-of-truth rules so that every scale action is traceable.

When to Use This AGENTS.md Template

When you are designing an autoscaling solution driven by AI coding agents.
When you need a repeatable, auditable agent workflow for scale decisions across services.
When you are establishing governance, handoffs, and escalation paths that work across teams.

Copyable AGENTS.md Template

# AGENTS.md
# Autoscale Architecture AGENTS.md

Project role: Autoscale Policy and Orchestration Agent

Agent roster and responsibilities:
- Planner: defines scale policies, thresholds, and policy packs for services.
- Implementer: applies scale changes to cloud resources or container orchestrators according to approved policies.
- Reviewer: validates planned changes against SLAs and budgets in a staging context.
- Tester: simulates traffic and validates that scaling produces expected performance.
- Researcher: collects metrics, experiments, and evidence to inform policy updates.
- Domain Specialist: ensures domain-specific constraints (SLAs, QoS) are respected.

Supervisor or orchestrator behavior:
- The Orchestrator continuously monitors metrics, enforces constraints, and assigns tasks to agents.
- Decisions must be auditable with traceable inputs and outputs.
- The Orchestrator can pause or roll back changes if a policy is unsafe or violates budgets.

Handoff rules between agents:
- Planner to Implementer: only after an approved policy pack is created.
- Implementer to Reviewer: after applying changes in staging, before production.
- Reviewer to Orchestrator: record the approval and move to deployment if the policy is valid.
- Orchestrator to Incident Commander: escalate only on production incidents or policy violations.

Context, memory, and source-of-truth rules:
- Maintain a centralized policy store and a facts database for metrics, configs, and incidents.
- Use canonical sources: monitoring system, config repo, incident logs.
- Do not rely on ephemeral memory; always fetch latest values before decisions.

Tool access and permission rules:
- Access to autoscale configs and resource controls must follow least-privilege and be auditable.
- Secrets live in a vault; do not embed credentials in code or logs.
- Production changes require orchestrator approval gates and human review when needed.

Architecture rules:
- Scale actions are event-driven and idempotent, with compensating actions on failure.
- Avoid race conditions: use transactional updates and locks where needed.
- All changes should be versioned and traceable.

File structure rules:
- Organize by domain: policies/, configs/, monitors/, and workflows/.
- Do not duplicate policy definitions across files.

Data, API, or integration rules when relevant:
- Use metrics APIs (e.g., Prometheus, CloudWatch) and config APIs to fetch current state.
- All calls should be logged and replayable for audits.

Validation rules:
- Validate against predefined SLAs, budgets, and error budgets before applying any scale.
- Run dry-runs and canary tests where feasible.

Security rules:
- Least-privilege access, no secrets in code, and secrets rotated regularly.
- Production changes require explicit approval gates.

Testing rules:
- Unit tests for policy logic; integration tests with synthetic load; end-to-end tests in staging.

Deployment rules:
- Feature-flagged rollouts; canary or blue-green deployments; rollback paths clearly defined.

Human review and escalation rules:
- Human review required for high-risk scale actions or policy changes.
- Escalate to the incident commander during production incidents.

Failure handling and rollback rules:
- If a scale action fails, revert to the previous scale state and alert.
- Maintain an immutable log of all remediation steps.

Things Agents must not do:
- Do not bypass approval gates or alter policies without approval.
- Do not modify production code outside approved workflows.
- Do not ignore safety constraints or budgets.

Recommended Agent Operating Model

The agent operating model defines role boundaries, decision rights, and escalation paths for autoscale orchestration. Roles collaborate through a clear planner-implementer-reviewer-tester loop with domain specialists providing constraints. The model emphasizes auditable decisions, canary-style deployments, and mandatory human review for high-risk actions.

Recommended Project Structure

project/
└── autoscale/
    ├── agents/
    │   ├── planner/
    │   ├── implementer/
    │   ├── reviewer/
    │   ├── tester/
    │   ├── researcher/
    │   └── domain-specialist/
    ├── orchestrator/
    ├── policies/
    ├── configs/
    │   ├── autoscale-policy.yaml
    │   └── alerts.yaml
    ├── data/
    │   └── metrics/
    ├── scripts/
    ├── tests/
    │   └── integration/
    └── docs/

Core Operating Principles

Operate with explicit ownership and auditable decisions.
Decisions are driven by objective metrics and budgets.
Keep policies versioned and traceable.
Prefer safe, incremental changes with rollback paths.
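
The last principle, incremental change with a rollback path, could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `next_step`, `scale_incrementally`, and the `set_replicas`, `healthy`, and `alert` callbacks are hypothetical names standing in for whatever your orchestrator and monitoring system actually expose.

```python
def next_step(current: int, target: int, max_step: int) -> int:
    """Move toward `target` by at most `max_step` replicas."""
    delta = max(-max_step, min(max_step, target - current))
    return current + delta


def scale_incrementally(current, target, max_step, set_replicas, healthy, alert):
    """Step toward `target`; on any failure restore the starting count and alert.

    `set_replicas`, `healthy`, and `alert` are caller-supplied callbacks
    (assumptions for this sketch, not part of any real API).
    """
    if max_step < 1:
        raise ValueError("max_step must be at least 1")
    start = current
    while current != target:
        candidate = next_step(current, target, max_step)
        try:
            set_replicas(candidate)
            if not healthy():
                raise RuntimeError(f"health check failed at {candidate} replicas")
        except Exception as exc:
            # Rollback path: revert to the scale state we started from.
            set_replicas(start)
            alert(f"rolled back to {start}: {exc}")
            return start
        current = candidate
    return current
```

Capping each step keeps a bad policy from jumping straight to its maximum, and remembering the starting count gives the rollback a well-defined target.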

Agent Handoff and Collaboration Rules

Planner communicates policy intent; Implementer performs policy changes with traceable inputs.
Reviewer validates production readiness before deployment.
Orchestrator enforces constraints and routes tasks with clear ownership.
Researchers feed data to update policies; Domain Specialists validate domain constraints.
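
One way to make these handoffs enforceable is a small state machine that only lets the owning role advance a change. The sketch below assumes the policy pack has already been approved before a `ChangeRequest` is created; the stage names, role strings, and class are illustrative, and escalation to an incident commander is left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    APPROVED = "approved"    # approved policy pack exists; Implementer may start
    STAGED = "staged"        # Implementer applied the change in staging
    REVIEWED = "reviewed"    # Reviewer validated the staged change
    DEPLOYED = "deployed"    # Orchestrator moved the change to deployment


# current stage -> (next stage, role that owns the transition)
ALLOWED = {
    Stage.APPROVED: (Stage.STAGED, "implementer"),
    Stage.STAGED: (Stage.REVIEWED, "reviewer"),
    Stage.REVIEWED: (Stage.DEPLOYED, "orchestrator"),
}


@dataclass
class ChangeRequest:
    policy_id: str
    stage: Stage = Stage.APPROVED
    history: list = field(default_factory=list)  # audit trail of handoffs

    def hand_off(self, role: str) -> Stage:
        """Advance one stage, but only if `role` owns the next transition."""
        if self.stage not in ALLOWED:
            raise ValueError(f"{self.stage.value} is a terminal stage")
        next_stage, owner = ALLOWED[self.stage]
        if role != owner:
            raise PermissionError(
                f"{role} cannot move {self.stage.value} -> {next_stage.value}"
            )
        self.history.append((self.stage.value, next_stage.value, role))
        self.stage = next_stage
        return next_stage
```

Because every transition is recorded in `history`, the same object doubles as a traceable record of who moved the change and when in the sequence.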

Tool Governance and Permission Rules

Least-privilege access for all tools; secrets in a vault.
All API calls and config updates are auditable and reversible.
Deployment gates require approvals for high-risk changes.
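
As a rough illustration of auditable calls behind an approval gate, the sketch below chains each log entry to the previous one with a SHA-256 hash so that later edits are detectable. `AuditLog` and `gated_call` are hypothetical helpers written for this example; a production system would more likely rely on the audit facilities of its cloud provider or vault.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    # Stable serialization so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log; each entry stores the hash of the entry before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    def __len__(self):
        return len(self._entries)

    def append(self, actor: str, action: str, details: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {"actor": actor, "action": action, "details": details, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash; False means an entry was altered or reordered."""
        prev = self.GENESIS
        for entry in self._entries:
            body = {k: entry[k] for k in ("actor", "action", "details", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


def gated_call(log, actor, action, details, high_risk, approved_by=None):
    """Log the call; refuse high-risk actions that lack a named approver."""
    record = {**details, "approved_by": approved_by}
    if high_risk and approved_by is None:
        log.append(actor, f"{action}:blocked", record)
        raise PermissionError(f"{action} is high risk and needs approval")
    log.append(actor, action, record)
```

Blocked attempts are logged as well, so the audit trail shows refused actions and not only the ones that went through.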

Code Construction Rules

All scale policies are declarative and idempotent.
Avoid side effects in policy evaluation; use dry-run modes for experimentation.
Document all assumptions and constraints in policy files.
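
A minimal sketch of what "declarative and idempotent" might look like in practice: the policy is plain data, evaluation is a pure function, and a dry-run flag controls whether anything is actually changed. The field names, the `set_replicas` callback, and the proportional sizing rule (which resembles the one documented for the Kubernetes Horizontal Pod Autoscaler) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    """Declarative policy: data only. Field names are illustrative."""
    service: str
    target_cpu_percent: int   # desired average CPU utilization, e.g. 60
    min_replicas: int
    max_replicas: int


def desired_replicas(policy: ScalePolicy, current_replicas: int,
                     observed_cpu_percent: int) -> int:
    """Pure evaluation: no I/O, same inputs always give the same result."""
    numerator = current_replicas * observed_cpu_percent
    raw = -(-numerator // policy.target_cpu_percent)  # integer ceiling division
    return max(policy.min_replicas, min(policy.max_replicas, raw))


def apply_policy(policy, current_replicas, observed_cpu_percent,
                 set_replicas, dry_run=True):
    """Return the planned change; call `set_replicas` only outside dry-run."""
    target = desired_replicas(policy, current_replicas, observed_cpu_percent)
    plan = {
        "service": policy.service,
        "from": current_replicas,
        "to": target,
        "changed": target != current_replicas,
        "dry_run": dry_run,
    }
    if plan["changed"] and not dry_run:
        set_replicas(policy.service, target)
    return plan
```

Re-applying the policy once the service already sits at the computed target produces a plan with `changed` set to `False` and makes no call, which is the idempotence the rule asks for.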

Security and Production Rules

Encrypt data in transit and at rest; rotate credentials regularly.
Audit trails for all scaling decisions; protect against data leakage.
Production changes require human review and sign-off when projected spend exceeds budget thresholds.
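
The budget and review gate could be checked before any scale action with something like the sketch below. The `Limits` fields, the flat cost-per-replica model, and the use of p95 latency as the SLA signal are simplifying assumptions; real budgets and SLAs will usually need richer inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Illustrative limits; names and units are assumptions."""
    monthly_budget_usd: float     # hard ceiling, never exceeded automatically
    review_threshold_usd: float   # above this, a human must sign off
    max_p95_latency_ms: float     # SLA latency bound


def validate_scale_action(replicas: int, cost_per_replica_usd: float,
                          projected_p95_ms: float, limits: Limits) -> dict:
    """Check a proposed replica count against budget and SLA limits."""
    projected_cost = replicas * cost_per_replica_usd
    violations = []
    if projected_cost > limits.monthly_budget_usd:
        violations.append("budget")
    if projected_p95_ms > limits.max_p95_latency_ms:
        violations.append("sla_latency")
    return {
        "allowed": not violations,
        "violations": violations,
        "needs_human_review": projected_cost > limits.review_threshold_usd,
        "projected_cost_usd": projected_cost,
    }
```

Separating the hard budget from the review threshold lets routine changes proceed automatically while expensive ones are routed to a person before they reach production.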

Testing Checklist

Unit tests for policy logic; integration tests with synthetic load; end-to-end tests in staging.
Canary testing and rollback validation for production rollouts.
Automated checks for drift between policy and implementation.
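
The drift check in the last item can be as simple as comparing the declared policy with the live configuration key by key. The sketch below assumes both sides have already been loaded into flat dictionaries; `find_drift` and the key names in the usage note are hypothetical.

```python
MISSING = "<missing>"  # marker for a key present on only one side


def find_drift(declared: dict, live: dict) -> dict:
    """Return {key: (declared_value, live_value)} for every mismatch."""
    drift = {}
    for key in sorted(declared.keys() | live.keys()):
        want = declared.get(key, MISSING)
        have = live.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift
```

For example, comparing a declared `{"min_replicas": 2, "max_replicas": 10}` with a live `{"min_replicas": 2, "max_replicas": 12}` reports only `max_replicas`, and an empty result means policy and implementation agree.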

Common Mistakes to Avoid

Ambiguity in ownership or decision boundaries.
Policy drift between what is implemented and what is documented.
Insufficient testing under realistic traffic conditions.

FAQ

What is the purpose of this AGENTS.md Template for autoscaling architecture?

This template defines the operating context, roles, and rules for AI coding agents orchestrating autoscaling in a multi-agent workflow.

How do agents hand off tasks in autoscaling decisions?

Handoffs are governed by explicit state context, clear ownership, and defined escalation paths, so each decision is traceable and auditable.

What are the security and production considerations?

All tool access and API calls must follow least-privilege principles, with secrets rotated, and production changes require explicit approval gates and human review.

How should success be validated in autoscaling experiments?

Validation relies on predefined metrics, controlled rollouts, rollback plans, and post-incident reviews to ensure decisions align with SLAs and budgets.

What are common pitfalls in autoscaling agent workflows?

Ambiguity in ownership, drift between policies and implementation, and insufficient testing under realistic traffic can lead to unstable scaling.
