AWS SQS Production Design AGENTS.md Template
An AGENTS.md template for AWS SQS production design: a copyable operating manual for single-agent and multi-agent AWS SQS workflows, covering handoffs, governance, and escalation.
Target User
Developers, engineering leaders, platform teams
Use Cases
- AWS SQS-based AI agent workflows
- queue-driven multi-agent orchestration
- producer-consumer task processing for AI agents
- governed agent handoffs and escalation
Markdown Template
# AGENTS.md
Project role
- You are the project steward responsible for a production-ready AWS SQS design enabling AI coding agents to operate reliably, securely, and with auditable governance.
Agent roster and responsibilities
- Planner/Orchestrator: defines workflow, queues, memory boundaries, and handoff sequencing.
- Implementer: writes the agent logic for consuming, processing, and producing messages on SQS.
- Researcher: gathers requirements, data contracts, and queue schemas.
- Domain Specialist: provides domain-approved rules for data structures, validation, and policy constraints.
- Reviewer: validates outputs, ensures alignment with requirements, and approves changes.
- Tester: validates integration, performance, and error handling.
Supervisor or orchestrator behavior
- The orchestrator maintains end-to-end state, assigns tasks to agents through the queue, and enforces the context, memory, and source-of-truth rules.
- Enforce idempotency and deterministic outputs for repeated runs.
- Validate after each stage and trigger escalation on failure.
Handoff rules between agents
- Planner ➜ Implementer: hand off task definition, queue names, and data contracts.
- Implementer ➜ Reviewer: hand off results and validation artifacts.
- Reviewer ➜ Tester: hand off test results and acceptance criteria.
- Researcher/Domain Specialist can interject or refine requirements at any stage with approval from the Planner.
Context, memory, and source-of-truth rules
- All messages include a trace_id, a task_id, and a source-of-truth URL.
- Memory persists only within the current workflow run and is cleared on completion or rollback.
- Data contracts are versioned and stored in a central repository; agents validate against the latest contract before processing.
Tool access and permission rules
- Use AWS IAM roles with least privilege for SQS operations, secrets access, and logging.
- Retrieve secrets at runtime from a centralized vault; never hard-code them.
- Production datasets must be sanitized; access restricted by role and time window.
Architecture rules
- Use a single primary queue with a dead-letter queue for failed messages.
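- Set the visibility timeout above the expected processing time; for long-running tasks, extend it while work is in progress (for example with ChangeMessageVisibility) rather than relying on a short timeout.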
- Each agent specializes in one phase; orchestration enforces cross-agent handoffs.
File structure rules
- All workflow definitions, schemas, and policies live under infra/, agents/, and workflows/.
- No hard-coded credentials anywhere in the codebase.
Data, API, or integration rules when relevant
- Define data contracts; validate inputs/outputs against schemas.
- All API calls must be authenticated and logged; persist responses for audit.
Validation rules
- Each step must emit a validation artifact (success/failure, reason, id).
- Validate message payloads, queue attributes, and error conditions.
Security rules
- Enforce encryption at rest (KMS) and in transit; rotate keys regularly.
- Enforce least privilege and monitor IAM changes.
Testing rules
- Unit tests for each agent, integration tests for multi-agent flows, and simulated failure scenarios.
- End-to-end tests run in staging before production.
Deployment rules
- Roll out updates via canary or blue-green deployments; require approval gates for production changes.
- Update queue policies and roles in sync with code changes.
Human review and escalation rules
- Trigger human review for rule violations, anomalies, or data leaks.
- Maintain an escalation path with a dedicated on-call domain specialist.
Failure handling and rollback rules
- On error, re-queue with exponential backoff; after N retries, move to dead-letter and alert.
- Roll back to the last known-good state and notify stakeholders.
Things Agents must not do
- Do not bypass queue-based orchestration or run tasks without planner approval.
- Do not store secrets in code or logs.
- Do not perform production changes without proper approval.
Overview
This AGENTS.md Template codifies a production-grade AWS SQS design for reliable asynchronous processing of AI coding agent tasks. It supports both single-agent work and multi-agent orchestration, with defined handoffs, tool governance, memory rules, and human review checkpoints.
The template governs an AWS SQS–driven workflow where agents exchange messages via queues, use SQS features like visibility timeout and dead-letter queues, and coordinate through a central orchestrator or planner. It provides a shared context, decision boundaries, and a concrete set of operating rules so teams can reproduce the workflow across projects.
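As a rough illustration of the consuming side of such a workflow, the sketch below long-polls a queue, deletes a message only after its handler succeeds, and leaves failed messages untouched so they reappear after the visibility timeout and can eventually be redriven to the dead-letter queue. It assumes a boto3 SQS client is passed in (for example `boto3.client("sqs")`); `poll_once`, `handler`, and the queue URL are hypothetical names, not part of the template.

```python
from typing import Any, Callable


def poll_once(sqs: Any, queue_url: str, handler: Callable[[str], None],
              wait_seconds: int = 20, visibility_timeout: int = 60) -> int:
    """Receive one batch, process each message, return how many succeeded.

    `sqs` is assumed to be a boto3 SQS client (or an object with the same
    receive_message / delete_message methods).
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,        # SQS returns at most 10 messages per call
        WaitTimeSeconds=wait_seconds,  # long polling (20 seconds is the maximum)
        VisibilityTimeout=visibility_timeout,
    )
    succeeded = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Leave the message on the queue: it becomes visible again after
            # the visibility timeout, and the queue's redrive policy moves it
            # to the dead-letter queue once maxReceiveCount is exceeded.
            continue
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=message["ReceiptHandle"])
        succeeded += 1
    return succeeded
```

Deleting only after success is what makes the visibility timeout act as a retry mechanism; a handler that crashes mid-task simply never reaches the delete call.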
When to Use This AGENTS.md Template
- When designing a queue-based AI workflow using AWS SQS for reliable, asynchronous task processing.
- When you need clear agent handoffs and multi-agent coordination in production-grade systems.
- When establishing IAM, KMS, and secret management for AWS resources without embedding credentials.
- When you require explicit memory, context, and source-of-truth rules across agents and executions.
- When governance, auditing, and escalation to human review are part of the workflow.
Recommended Agent Operating Model
The operating model prescribes distinct roles with clear decision boundaries. The Planner structures the workflow and orchestrates handoffs. Implementers execute against SQS queues. Reviewers validate outputs and enforce data contracts. Testers verify integration and resilience. Researchers and Domain Specialists ensure that requirements are accurate and policies are followed. Escalation occurs when validation fails or human review is required.
Recommended Project Structure
workspace/
├── infra/
│   └── sqs/
│       ├── queues/
│       │   ├── main-queue
│       │   └── dead-letter-queue
│       └── policies/
├── agents/
│   ├── planner/
│   ├── implementer/
│   ├── reviewer/
│   ├── tester/
│   └── researcher/
├── workflows/
│   └── sqs-production/
│       ├── orchestrator/
│       ├── handlers/
│       └── tests/
├── config/
│   ├── sqs.yml
│   └── credentials.env
└── scripts/
    └── deploy.sh
Core Operating Principles
- Design for idempotent message processing and deterministic outcomes.
- Prefer queue-based handoffs with explicit memory and source-of-truth.
- Enforce least privilege and secure secret handling for all AWS resources.
- Provide auditability with trace IDs and versioned contracts.
- Prioritize reliability, observability, and clear escalation paths.
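To make the trace ID and versioned-contract principles concrete, here is a minimal sketch of a message envelope that every agent could attach to its payload. The field names `source_of_truth_url` and `contract_version`, and the helpers `new_envelope` and `parse_body`, are assumptions for illustration; the template only requires that messages carry a trace ID, a task ID, and a source-of-truth URL.

```python
import json
import uuid
from dataclasses import asdict, dataclass

REQUIRED_FIELDS = ("trace_id", "task_id", "source_of_truth_url", "contract_version")


@dataclass(frozen=True)
class Envelope:
    trace_id: str
    task_id: str
    source_of_truth_url: str
    contract_version: str
    payload: dict

    def to_body(self) -> str:
        """Serialize to the JSON string used as the SQS message body."""
        return json.dumps(asdict(self), sort_keys=True)


def new_envelope(task_id: str, source_of_truth_url: str,
                 contract_version: str, payload: dict) -> Envelope:
    """Create an envelope with a fresh trace ID."""
    return Envelope(str(uuid.uuid4()), task_id, source_of_truth_url,
                    contract_version, payload)


def parse_body(body: str, supported_versions: set) -> Envelope:
    """Parse a message body and fail fast on missing fields or unknown versions."""
    data = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if data["contract_version"] not in supported_versions:
        raise ValueError(f"unsupported contract version: {data['contract_version']}")
    return Envelope(**{name: data[name] for name in REQUIRED_FIELDS},
                    payload=data.get("payload", {}))
```

Rejecting a message before any processing starts keeps contract mismatches out of downstream state and gives the audit trail a single, early failure point.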
Agent Handoff and Collaboration Rules
- Planner coordinates task definitions, queue topology, and data contracts; any change requires Planner approval.
- Implementer must surface validation artifacts after each operation and respect memory boundaries.
- Reviewer conducts contract-level and output validation; authorizes handoffs only if criteria are met.
- Tester executes integrated end-to-end tests, including failure modes and backoff behavior.
- Researcher and Domain Specialist can request iterations when requirements change; changes go through Planner.
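One way to enforce these handoff rules in code is a small lookup of allowed transitions and the artifacts each one must carry. The sketch below is illustrative only: the role identifiers, the artifact names, and `check_handoff` are hypothetical, chosen to mirror the rules above.

```python
# Allowed handoffs and the artifacts each must include (names are assumptions).
REQUIRED_ARTIFACTS = {
    ("planner", "implementer"): {"task_definition", "queue_names", "data_contracts"},
    ("implementer", "reviewer"): {"results", "validation_artifacts"},
    ("reviewer", "tester"): {"test_results", "acceptance_criteria"},
    # Requirement changes from advisory roles always go through the Planner.
    ("researcher", "planner"): {"change_request"},
    ("domain_specialist", "planner"): {"change_request"},
}


def check_handoff(sender: str, receiver: str, artifacts: set) -> None:
    """Raise if the handoff is not an allowed transition or lacks artifacts."""
    required = REQUIRED_ARTIFACTS.get((sender, receiver))
    if required is None:
        raise PermissionError(f"handoff {sender} -> {receiver} is not allowed")
    missing = required - set(artifacts)
    if missing:
        raise ValueError(
            f"handoff {sender} -> {receiver} is missing: {sorted(missing)}")
```

For example, `check_handoff("implementer", "tester", {"results"})` raises `PermissionError` because the Implementer may only hand off to the Reviewer.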
Tool Governance and Permission Rules
- Only authorized agents may call AWS SQS APIs; use IAM roles with least privilege.
- Secret values must be retrieved at runtime from a secure vault; never embedded in code or logs.
- All API calls must be authenticated and logged; log records must be immutable once written.
- Production systems require approval gates before changes, with an auditable change record.
- Handoff boundaries enforce that only the designated agent may write to the next queue.
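A least-privilege policy for a single agent might look like the following sketch, which builds an IAM policy document allowing the agent to consume from its own queue and send only to the next queue. The function name and the queue ARNs are hypothetical; the SQS action names are standard IAM actions.

```python
import json


def agent_queue_policy(consume_queue_arn: str, produce_queue_arn: str) -> dict:
    """Build an IAM policy document scoped to one inbound and one outbound queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ConsumeFromOwnQueue",
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:ChangeMessageVisibility",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": consume_queue_arn,
            },
            {
                "Sid": "ProduceToNextQueue",
                "Effect": "Allow",
                "Action": ["sqs:SendMessage"],
                "Resource": produce_queue_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Example ARNs are placeholders, not real resources.
    policy = agent_queue_policy(
        "arn:aws:sqs:us-east-1:123456789012:implementer-queue",
        "arn:aws:sqs:us-east-1:123456789012:reviewer-queue",
    )
    print(json.dumps(policy, indent=2))
```

Because the policy names exact queue ARNs rather than a wildcard, an agent that tries to write to any other queue is denied by default. Queues encrypted with a customer managed KMS key would also need the relevant `kms:` permissions on that key.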
Code Construction Rules
- Write idempotent handlers; avoid side effects on reprocessing messages.
- Validate inputs against versioned contracts; fail fast on mismatch.
- Use backoff strategies and circuit breakers for downstream failures.
- Separate concerns: orchestration, domain logic, and data access must be modular.
- Do not hard-code credentials; use parameter stores and IAM roles.
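Since SQS standard queues deliver messages at least once, a handler can see the same message twice. The sketch below shows one simple way to make a handler idempotent by keying results on `task_id`. The `IdempotentProcessor` class is hypothetical, and the in-memory dictionary stands in for a durable store that a production system would need.

```python
from typing import Any, Callable, Dict


class IdempotentProcessor:
    """Wrap a processing function so repeated deliveries do not repeat work."""

    def __init__(self, process: Callable[[dict], Any]):
        self._process = process
        # In-memory stand-in for a durable result store (an assumption here).
        self._results: Dict[str, Any] = {}

    def handle(self, message: dict) -> Any:
        key = message["task_id"]
        if key in self._results:
            # Duplicate delivery: return the stored result, skip side effects.
            return self._results[key]
        # If processing raises, nothing is recorded, so a retry runs it again.
        result = self._process(message)
        self._results[key] = result
        return result
```

Usage: wrap the domain function once, then route every received message through `handle`; a redelivered message with the same `task_id` returns the earlier result without invoking the function again.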
Security and Production Rules
- Encrypt data in transit and at rest; use KMS for key management.
- Enforce network boundaries and least privilege across services.
- Enable logging, monitoring, and anomaly detection for all queue activity.
- Enforce strict change management and rollback procedures for production deployments.
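As an example of how encryption at rest and dead-letter handling can be set at queue creation, the sketch below assembles the attribute map that the SQS `CreateQueue` call accepts. The helper name and default values are assumptions; the attribute keys (`VisibilityTimeout`, `KmsMasterKeyId`, `RedrivePolicy`) are standard SQS queue attributes.

```python
import json


def encrypted_queue_attributes(dlq_arn: str, kms_key_id: str,
                               max_receive_count: int = 5,
                               visibility_timeout: int = 120) -> dict:
    """Build queue attributes with KMS encryption and a DLQ redrive policy.

    SQS expects every attribute value as a string; RedrivePolicy is a
    JSON document encoded as a string.
    """
    return {
        "VisibilityTimeout": str(visibility_timeout),
        "KmsMasterKeyId": kms_key_id,
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }
```

With a boto3 client, these attributes could be passed as `sqs.create_queue(QueueName="main-queue", Attributes=attrs)`; the queue name, key alias, and ARN used in practice would come from the project's own infrastructure definitions.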
Testing Checklist
- Unit tests for each agent’s logic and data contracts.
- Integration tests for queue interaction, dead-letter handling, and backoff.
- End-to-end tests simulating realistic message loads and failure scenarios.
- Security tests for secret access, permissions, and secret rotation.
- Deployment tests including canary and rollback verification.
Common Mistakes to Avoid
- Skipping explicit handoffs and relying on implicit assumptions about workflow state.
- Embedding secrets in code or logs, or bypassing vaults and IAM roles.
- Ignoring dead-letter queues and retry/backoff policies, which leads to message loss or retry storms.
- Coupling agents too tightly to a single technology or region and ignoring portability and failover.
FAQ
What is the purpose of this AGENTS.md Template for AWS SQS production design?
This AGENTS.md Template codifies a production-grade AWS SQS design for reliable asynchronous processing of AI coding agent tasks, enabling both single-agent work and multi-agent orchestration with governance and escalation.
Who should use this template?
Developers, engineering leaders, and platform teams building queue-based AI workflows with multi-agent orchestration and governance.
How are agent handoffs managed in this workflow?
The Planner defines handoff points and sequence; each handoff is validated by the receiving agent against data contracts and context memory before proceeding.
What security and access controls are required for AWS resources?
Use least-privilege IAM roles, rotate keys, fetch secrets from a vault at runtime, and restrict access by time window and service.
How is failure handled and when is human review triggered?
Failures trigger retries with exponential backoff. When retries are exhausted or a critical error occurs, the message moves to the dead-letter queue, the workflow escalates to human review, and the system rolls back to a known-good state.
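One common way to get exponential backoff on SQS is to change a failed message's visibility timeout based on how many times it has been received. The following is a sketch under assumptions: `backoff_seconds` and `defer_retry` are hypothetical helpers, `sqs` is a boto3 SQS client, and the message was received with the `ApproximateReceiveCount` system attribute requested.

```python
import random
from typing import Any

MAX_VISIBILITY_SECONDS = 43200  # SQS upper limit for a visibility timeout (12 hours)


def backoff_seconds(receive_count: int, base: int = 30, cap: int = 900) -> int:
    """Exponential backoff: base * 2^(receive_count - 1), bounded by cap."""
    delay = base * (2 ** max(receive_count - 1, 0))
    return min(delay, cap, MAX_VISIBILITY_SECONDS)


def defer_retry(sqs: Any, queue_url: str, message: dict, jitter: bool = True) -> int:
    """Hide a failed message for a backoff period and return the delay used."""
    attributes = message.get("Attributes", {})
    receive_count = int(attributes.get("ApproximateReceiveCount", "1"))
    delay = backoff_seconds(receive_count)
    if jitter:
        # Spread retries out so failed messages do not all return at once.
        delay = random.randint(delay // 2, delay)
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
        VisibilityTimeout=delay,
    )
    return delay
```

The retry limit itself does not need custom code: the queue's redrive policy moves a message to the dead-letter queue once its receive count exceeds `maxReceiveCount`, and an alarm on the dead-letter queue can then trigger human review.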