AGENTS.md Template: Queue-Based Autoscaling

Overview

This AGENTS.md template defines a queue-based autoscaling workflow for AI coding agents. It supports single-agent operation and multi-agent orchestration in a queue-driven environment, with clear handoffs, tool governance, and human review hooks. In short, it codifies the roles, rules, and interactions needed to scale resources based on queue metrics while preserving safety and auditability.

When to Use This AGENTS.md Template

When you operate a queue-driven system that must scale compute resources automatically.
When you require explicit handoffs between planner, autoscaler, and health/resource managers.
When governance, auditing, and human review are required for production changes.
When you want a single source of truth for agent interactions and decisions.

Copyable AGENTS.md Template

# AGENTS.md
Project role: Queue-based autoscaling orchestrator for cloud resources in a queue-driven environment.

Agent roster and responsibilities:
- Planner: Monitors queue depth, SLA, and policy; decides scaling actions and produces a plan.
- Autoscaler: Executes scaling actions against cloud platforms; ensures idempotent operations and rollback.
- HealthMonitor: Tracks queue health, worker health, and resource budgets; flags anomalies.
- ResourceManager: Provisions or releases resources; ensures proper tagging and accounting.
- DataFetcher: Gathers metrics from monitoring endpoints; updates memory.

Supervisor or orchestrator behavior:
- Orchestrator coordinates all agents, enforces timeouts, and routes plans to execution agents; logs decisions and escalates when thresholds are breached.

Handoff rules between agents:
- Planner > Autoscaler for execution
- Autoscaler > HealthMonitor after action
- HealthMonitor > Planner if re-evaluation is needed
- DataFetcher > Planner/Autoscaler to update context as needed

Context, memory, and source-of-truth rules:
- All state stored in a central memory store; source of truth is the system state store and execution logs; memory entries include a timestamp and policy version.

Tool access and permission rules:
- Agents may call cloud APIs with restricted permissions; secrets are stored in a secret store; no hard-coded credentials; sensitive operations require explicit approval.

Architecture rules:
- Event-driven, idempotent, auditable; avoid side effects without confirmation.

File structure rules:
- Place this AGENTS.md at project root as the single source of truth for this workflow.

Data, API, or integration rules:
- Use official APIs; validate schemas; respect rate limits; log all external calls.

Validation rules:
- Pre-checks before scaling; post-checks after actions; verify invariants.

Security rules:
- Encrypt secrets; restrict network egress; monitor for breaches.

Testing rules:
- Unit tests for decision logic; integration tests for actions; end-to-end tests for production-like scenarios.

Deployment rules:
- Canaries for production changes; rollback on failure or degraded health.

Human review and escalation rules:
- On SLA breach or uncertain decisions, escalate to the on-call engineer; manual override is allowed with an audit record.

Failure handling and rollback rules:
- If action fails, revert to previous state; record rollback reason; alert operators.

Things Agents must not do:
- Do not scale beyond max or below min; do not perform destructive actions without checks; do not skip validation.

Recommended Agent Operating Model

The default operating model is a two-layer approach: a Planner that decides when to scale and what actions to take, and an Autoscaler that performs those actions. In a multi-agent setup, an Orchestrator coordinates the Planner, Autoscaler, HealthMonitor, and DataFetcher to ensure alignment with policy and SLAs. Decision boundaries: the Planner sets scaling thresholds, and escalation paths trigger human review for ambiguous or high-risk changes.
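The Planner side of this split can be sketched as a pure function that turns queue depth into a bounded plan. This is a minimal illustration, not part of the template: `ScalingPolicy`, `ScalingPlan`, `plan_scaling`, and every numeric value are hypothetical names and defaults chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    # Illustrative values only; real limits come from your own policy files.
    target_backlog_per_worker: int = 100
    min_workers: int = 1
    max_workers: int = 20


@dataclass(frozen=True)
class ScalingPlan:
    current_workers: int
    desired_workers: int
    needs_human_review: bool


def plan_scaling(queue_depth: int, current_workers: int,
                 policy: ScalingPolicy) -> ScalingPlan:
    """Planner step: decide a worker count, never outside [min, max]."""
    raw = math.ceil(queue_depth / policy.target_backlog_per_worker)
    desired = max(policy.min_workers, min(policy.max_workers, raw))
    # Assumption: demand above the configured maximum counts as a
    # high-risk situation and is escalated instead of silently clamped.
    needs_review = raw > policy.max_workers
    return ScalingPlan(current_workers, desired, needs_review)
```

The Autoscaler would then receive the `ScalingPlan` and act on `desired_workers` only when `needs_human_review` is false.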

Recommended Project Structure

ai-workflows/queue-autoscaling/
  orchestrator/
  planner/
  autoscaler/
  monitors/
  memory/
  policies/
  configs/
  tests/

Core Operating Principles

Clear ownership and accountability for each agent role.
Idempotent actions and auditable decision logs.
Single source of truth for context and state.
Least privilege for tool access and secrets management.
Observability through metrics, traces, and robust tests.
Safe escalation paths and human review when needed.

Agent Handoff and Collaboration Rules

Planner hands off to Autoscaler with a concrete plan and an execution deadline.
Autoscaler reports outcomes to HealthMonitor and Memory.
HealthMonitor triggers Planner re-evaluation if SLA or health metrics drift.
DataFetcher updates context and can trigger Planner re-planning.
Domain specialists can annotate decisions via the Orchestrator with approval gates.
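One way to keep these handoffs explicit is a small routing table that rejects any transfer not listed in the rules. The sketch below is an assumption about how that might look; `ALLOWED_HANDOFFS` and `hand_off` are hypothetical names, and the list passed as `log` stands in for a real audit log.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Mirrors the handoff rules listed above; extend it if your roster differs.
ALLOWED_HANDOFFS = {
    "Planner": {"Autoscaler"},
    "Autoscaler": {"HealthMonitor", "Memory"},
    "HealthMonitor": {"Planner"},
    "DataFetcher": {"Planner"},
}


def hand_off(sender: str, receiver: str, payload: Dict[str, Any],
             log: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Record a handoff, refusing routes the rules do not allow."""
    if receiver not in ALLOWED_HANDOFFS.get(sender, set()):
        raise ValueError(f"handoff {sender} > {receiver} is not allowed")
    entry = {
        "from": sender,
        "to": receiver,
        "payload": payload,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)  # every accepted handoff leaves a log entry
    return entry
```

Rejecting unknown routes at this point means a misconfigured agent fails loudly rather than bypassing the HealthMonitor.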

Tool Governance and Permission Rules

Actions on cloud resources require least-privilege permissions and approval gates for production changes.
Secrets must reside in a secret store; no plaintext credentials in logs.
API calls are rate-limited and auditable; all external calls are logged.
Production changes require canary deployment and rollback paths.
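An approval gate for production changes can be as small as a predicate checked before any cloud call. The following is a sketch under stated assumptions: `ScalingAction`, `is_authorized`, and `audit_line` are hypothetical, and the environment names are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ScalingAction:
    environment: str            # e.g. "staging" or "production" (illustrative)
    delta_workers: int
    approved_by: FrozenSet[str] = frozenset()


def is_authorized(action: ScalingAction) -> bool:
    """Non-production actions pass; production needs at least one approver."""
    if action.environment != "production":
        return True
    return len(action.approved_by) > 0


def audit_line(action: ScalingAction) -> str:
    """Build a log line that contains no secrets, only decision context."""
    return (
        f"env={action.environment} delta={action.delta_workers:+d} "
        f"approvers={sorted(action.approved_by)} "
        f"authorized={is_authorized(action)}"
    )
```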

Code Construction Rules

Idempotent scaling actions with deterministic outcomes.
Validate inputs against schemas before acting.
All decisions versioned; config changes require review.
Use feature flags to enable incremental rollout.
Logging includes context ids and timestamps for traceability.
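The first two rules, idempotent actions and input validation, could be combined roughly as follows. This is a sketch only: `InMemoryCloud` is a stand-in for a real cloud client, and `validate_plan`, `apply_plan`, and the `plan_id` field are assumptions made for the example.

```python
from typing import Any, Dict, Set


class InMemoryCloud:
    """Stand-in for a real cloud API client (hypothetical)."""

    def __init__(self, workers: int) -> None:
        self.workers = workers
        self.calls = 0

    def set_worker_count(self, count: int) -> None:
        self.calls += 1
        self.workers = count


def validate_plan(plan: Dict[str, Any]) -> None:
    """Reject malformed plans before any side effect happens."""
    plan_id = plan.get("plan_id")
    if not isinstance(plan_id, str) or not plan_id:
        raise ValueError("plan_id must be a non-empty string")
    desired = plan.get("desired_workers")
    if isinstance(desired, bool) or not isinstance(desired, int) or desired < 0:
        raise ValueError("desired_workers must be a non-negative integer")


def apply_plan(plan: Dict[str, Any], cloud: InMemoryCloud,
               applied_ids: Set[str]) -> str:
    """Apply a plan at most once; replays of the same plan_id are no-ops."""
    validate_plan(plan)
    if plan["plan_id"] in applied_ids:
        return "skipped"
    if cloud.workers != plan["desired_workers"]:
        cloud.set_worker_count(plan["desired_workers"])
    applied_ids.add(plan["plan_id"])
    return "applied"
```

Expressing the plan as an absolute target ("set to 5 workers") rather than a delta ("add 3 workers") is what makes a retried call safe.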

Security and Production Rules

Encrypt all secrets in transit and at rest.
Limit network access to required endpoints only.
Implement monitoring and alerting for failed actions and anomalies.
Require manual approval for risky production changes.

Testing Checklist

Unit tests for Planner decision logic and Autoscaler actions.
Integration tests that simulate queue depth changes and scaling events.
End-to-end tests for a full run with health checks and rollbacks.
Security tests for secret handling and access controls.

Common Mistakes to Avoid

Overlapping scaling thresholds that cause oscillations.
Missing audit trails for decisions and actions.
Unauthorized access to production resources or secrets.
Silent failures due to partial retries without rollback.
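The oscillation problem in the first item is commonly addressed with a gap between the scale-up and scale-down thresholds plus a cooldown period. The sketch below illustrates that idea; the function name `decide` and all threshold values are hypothetical and would need tuning for a real workload.

```python
def decide(queue_depth: int, workers: int, seconds_since_last_change: float,
           scale_up_above: float = 200.0, scale_down_below: float = 50.0,
           cooldown_seconds: float = 300.0) -> str:
    """Return "scale_up", "scale_down", or "hold".

    The two thresholds deliberately do not overlap, so a backlog that
    hovers between them produces "hold" instead of flapping.
    """
    if scale_down_below >= scale_up_above:
        raise ValueError("scale_down_below must be lower than scale_up_above")
    if seconds_since_last_change < cooldown_seconds:
        return "hold"  # let the previous change take effect first
    backlog_per_worker = queue_depth / max(workers, 1)
    if backlog_per_worker > scale_up_above:
        return "scale_up"
    if backlog_per_worker < scale_down_below:
        return "scale_down"
    return "hold"
```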

FAQ

What is the purpose of this AGENTS.md Template?

To codify a queue-based autoscaling workflow for AI coding agents, enabling single-agent operation or multi-agent orchestration with clear handoffs, rules, and governance.

How do agents hand off tasks in this workflow?

Planner proposes a plan; Autoscaler executes it; HealthMonitor validates the result; DataFetcher updates context. Handoffs are explicit and logged.

Where is the context stored and how is the truth maintained?

All context resides in a central memory store and a system state store with a single source of truth; updates are timestamped and auditable.
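As a rough illustration of timestamped, auditable updates, an append-only store keeps every write and never overwrites earlier entries. `MemoryEntry` and `MemoryStore` are hypothetical names for this sketch; a production setup would back them with a durable database rather than a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List, Optional


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class MemoryEntry:
    key: str
    value: Any
    policy_version: str
    timestamp: str = field(default_factory=_utc_now)


class MemoryStore:
    """Append-only store: history is preserved so decisions can be audited."""

    def __init__(self) -> None:
        self._entries: List[MemoryEntry] = []

    def write(self, key: str, value: Any, policy_version: str) -> MemoryEntry:
        entry = MemoryEntry(key, value, policy_version)
        self._entries.append(entry)
        return entry

    def latest(self, key: str) -> Optional[MemoryEntry]:
        for entry in reversed(self._entries):
            if entry.key == key:
                return entry
        return None

    def history(self, key: str) -> List[MemoryEntry]:
        return [e for e in self._entries if e.key == key]
```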

What should developers do to customize thresholds?

Adjust policy thresholds in the Planner rules and memory policies; validate changes in a staging environment before production.

What are the security and deployment constraints?

Encrypt secrets, restrict network access, and use controlled deployment with canary checks and rollback.
