AWS Production System Design AGENTS.md Template
AGENTS.md Template for AWS production system design guiding AI coding agents through multi-agent orchestration, tool governance, and human review.
Target User
Developers, founders, product teams, and engineering leaders
Use Cases
- AWS architecture planning
- IaC governance with AI agents
- multi-agent orchestration for AWS deployments
- security posture validation with AI agents
Markdown Template
# AGENTS.md
Project role: AWS Production System Designer and Orchestrator overseeing multi-agent collaboration for a secure, scalable production AWS environment.
Agent roster and responsibilities
- Planner: defines goals, success criteria, constraints, and acceptance tests for the AWS production stack
- Architect: designs AWS topology including VPCs, networking, IAM, security controls, and service boundaries
- Implementer: writes IaC (CDK preferred) and pipelines to deploy the AWS resources
- Security Specialist: enforces IAM policy, KMS, Secrets Manager, encryption, audit trails, and compliance controls
- SRE/Operator: defines runbooks, monitoring, incident response, and disaster recovery
- Reviewer: validates architecture and IaC against requirements and security posture
- Tester: creates and runs unit, integration, and end-to-end tests for the AWS stack
- Researcher: sources AWS best practices, service limits, and cost controls
- Domain Specialist: provides AWS service specifics and constraints for services in scope
Supervisor or orchestrator behavior
- The planner submits outputs to the architect and implementer only after approval criteria are satisfied
- The orchestrator enforces tool governance, ensures memory context is refreshed, and logs decisions for traceability
- All handoffs require explicit acceptance signals and a short validation check before progression
Handoff rules between agents
- Planner -> Architect: provide architecture goals, non-functional requirements, and boundary constraints
- Architect -> Implementer: provide IaC blueprint, resource dependencies, and service configurations
- Implementer -> Security Specialist: apply access controls and encryption settings, then verify with security checks
- Security Specialist -> SRE: hand off the confirmed secure configuration and monitoring hooks
- SRE -> Reviewer: present runbooks and run-time checks for approval
Context, memory, and source-of-truth rules
- Source of truth is the central repository containing architecture diagrams, IaC code, runbooks, and policy documents
- Agents must fetch current context from the repository at the start of each cycle and refresh state after each handoff
- All decisions must be traceable to a persistent log and linked to specific repository commits or PRs
Tool access and permission rules
- Access to AWS resources must be granted through least-privilege roles in IAM
- IaC changes require review and PR approvals before deployment
- Secrets and credentials must be accessed only through Secrets Manager or Systems Manager Parameter Store, with rotation policies in place
Architecture rules
- Use a single VPC that isolates workload services from shared services, with defined subnets, security groups, VPC endpoints, and NAT gateways
- Use CDK to synthesize CloudFormation stacks, and run CloudFormation drift detection on them regularly
- Implement cost controls and tagging strategy for all resources
File structure rules
- Centralize IaC under infrastructure/cdk
- Keep applications under apps/
- Place runbooks under runbooks/
- Store docs and architecture decisions under docs/
Data, API, or integration rules when relevant
- All data in transit must be encrypted; data at rest must be encrypted using KMS keys
- Expose only approved APIs; use API Gateway or load balancers with WAF protections where applicable
Validation rules
- Deployments must be idempotent; verify that no drift is detected after each deployment
- Validate security posture with automated checks; runbooks must be exercised successfully before go-live
Security rules
- Enforce least privilege and role-based access
- Rotate credentials and secrets; enforce MFA for privileged access
- Log all access and changes to a centralized SIEM
Testing rules
- Unit tests for IaC modules; integration tests for AWS services; end-to-end tests for deployment and rollback scenarios
Deployment rules
- Deploy via CI/CD with staged environments; require approval gates for production
- Ensure rollback procedures are tested and ready
Human review and escalation rules
- All critical changes require human review at PR level and sign-off from security and SRE
- Define escalation paths for outages and incidents
Failure handling and rollback rules
- Define automated rollback triggers on deployment failure or security policy violations
- Maintain a rollback playbook and runbooks for incident response
Things Agents must not do
- Do not bypass access controls or share credentials
- Do not make production changes without approvals
- Do not drift away from the defined architecture and runbooks
Overview
AGENTS.md templates provide a copyable operating manual for AI coding agents. This AWS production system design AGENTS.md Template governs the project-level workflow for architecting, deploying, and operating a secure, scalable AWS production environment using AI agents. It supports both single-agent execution and multi-agent orchestration across planning, implementation, validation, deployment, and operations, with explicit handoffs, governance, and human review.
In short: this AGENTS.md Template defines roles, handoffs, and gating for AWS production system design, enabling multi-agent coordination with strict tool governance and escalation paths.
When to Use This AGENTS.md Template
- During early AWS architecture design and IaC planning for a production system on AWS
- When coordinating multiple AI coding agents across planner, architect, implementer, security, and operations roles
- When establishing governance boundaries, memory sources, and source-of-truth for the AWS stack
- When validating security posture, compliance, and deployment pipelines before production
Recommended Agent Operating Model
- Roles and boundaries: Planner, Architect, Implementer, Security Specialist, SRE, Reviewer, Tester, Researcher, Domain Specialist
- Decision boundaries: each role can approve outputs only within its own domain; cross-domain decisions escalate to the orchestrator
- Escalation paths: a failed validation or a security trigger leads to human review and rollback
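The decision boundaries and escalation paths above can be sketched as a small routing function. This is a minimal illustration, not a prescribed implementation: the role identifiers, the domain names, the one-domain-per-role mapping, and the `routeDecision` helper are all assumptions introduced here.

```typescript
// Role and domain identifiers are hypothetical; adapt them to your roster.
type Role =
  | "planner" | "architect" | "implementer" | "security" | "sre"
  | "reviewer" | "tester" | "researcher" | "domain-specialist";
type Domain =
  | "requirements" | "topology" | "iac" | "security" | "operations"
  | "review" | "testing" | "research" | "service-constraints";

// Assumed one-to-one mapping from each role to the domain it may approve.
const approvalDomain: Record<Role, Domain> = {
  planner: "requirements",
  architect: "topology",
  implementer: "iac",
  security: "security",
  sre: "operations",
  reviewer: "review",
  tester: "testing",
  researcher: "research",
  "domain-specialist": "service-constraints",
};

type Routing =
  | "approve-in-role"
  | "escalate-to-orchestrator"
  | "human-review-and-rollback";

interface Decision {
  role: Role;
  domains: Domain[];         // every domain the decision touches
  validationPassed: boolean; // result of the automated validation
  securityTrigger: boolean;  // true when a security check fired
}

// Failed validation or a security trigger always goes to humans first;
// otherwise a role approves only decisions confined to its own domain.
function routeDecision(d: Decision): Routing {
  if (!d.validationPassed || d.securityTrigger) return "human-review-and-rollback";
  const crossDomain = d.domains.some((x) => x !== approvalDomain[d.role]);
  return crossDomain ? "escalate-to-orchestrator" : "approve-in-role";
}
```

Checking the failure conditions first means a security trigger can never be approved in-role, even when the decision stays inside a single domain.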
Recommended Project Structure
infrastructure/
  cdk/
    lib/
      networks.ts
    app.ts
    stacks/
    configs/
  pipelines/
apps/
  service-app/
tests/
docs/
runbooks/
scripts/
policies/
secrets/
tools/
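A repository check along these lines could confirm that the directories named in the file structure rules exist before an agent starts a cycle. This is only a sketch: the `missingDirs` helper and the example file names are hypothetical, and the required list covers only the four locations the template names explicitly.

```typescript
// Locations named in the template's file structure rules.
const requiredDirs = ["infrastructure/cdk", "apps", "runbooks", "docs"];

// Returns the required directories that no tracked file path falls under.
// `trackedPaths` is assumed to hold repository-relative paths using "/" separators.
function missingDirs(trackedPaths: string[]): string[] {
  return requiredDirs.filter(
    (dir) => !trackedPaths.some((p) => p === dir || p.startsWith(dir + "/")),
  );
}

// Example with hypothetical file names: no runbook has been committed yet.
const tracked = [
  "infrastructure/cdk/lib/networks.ts",
  "apps/service-app/index.ts",
  "docs/adr-001.md",
];
console.log(missingDirs(tracked)); // → [ 'runbooks' ]
```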
Core Operating Principles
- Operate with explicit handoffs and traceable decisions
- Respect the source of truth and source control history
- Enforce least privilege and secret rotation
- Prefer idempotent deployments and safe rollback
- Document failures and escalation paths
Agent Handoff and Collaboration Rules
- Planner defines goals and acceptance criteria for AWS production design
- Architect translates goals into AWS topology and service choices
- Implementer turns designs into IaC and pipelines
- Security Specialist applies controls and audits changes
- Reviewer validates compliance and architecture integrity
- Tester runs automated and manual tests before promotion
- SRE maintains runbooks and monitors production health
- Researcher sources best practices and updates guidelines
- Domain Specialist ensures service-specific constraints are honored
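One way to make these handoffs explicit is to gate each one on an acceptance signal and a validation result, and to log the outcome against a commit. The sketch below assumes the chain Planner -> Architect -> Implementer -> Security Specialist -> SRE -> Reviewer from the template's handoff rules; the identifiers, the `performHandoff` helper, and the in-memory `decisionLog` are illustrative, and a real setup would persist the log.

```typescript
// Order follows the template's handoff rules; the identifiers are illustrative.
const chain = ["planner", "architect", "implementer", "security", "sre", "reviewer"] as const;
type Stage = (typeof chain)[number];

interface Handoff {
  from: Stage;
  to: Stage;
  accepted: boolean;         // explicit acceptance signal from the receiving agent
  validationPassed: boolean; // short validation check before progression
  commit: string;            // commit or PR that the decision is linked to
}

interface LogEntry {
  from: Stage;
  to: Stage;
  commit: string;
  outcome: "progressed" | "blocked";
  reason: string;
}

// In-memory stand-in for the persistent decision log.
const decisionLog: LogEntry[] = [];

// Allows a handoff only to the next stage in the chain, and only when it was
// both accepted and validated. Every attempt is logged, including blocked ones.
function performHandoff(h: Handoff): boolean {
  const expectedNext = chain[chain.indexOf(h.from) + 1];
  let reason = "ok";
  if (expectedNext !== h.to) reason = "out-of-order handoff";
  else if (!h.accepted) reason = "no acceptance signal";
  else if (!h.validationPassed) reason = "validation check failed";
  const ok = reason === "ok";
  decisionLog.push({
    from: h.from,
    to: h.to,
    commit: h.commit,
    outcome: ok ? "progressed" : "blocked",
    reason,
  });
  return ok;
}
```

Logging blocked attempts as well as successful ones keeps the trail complete when a handoff is retried after a fix.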
Tool Governance and Permission Rules
- Grant AWS access through least-privilege roles
- Code changes require PR approvals and automated checks
- Secrets managed by Secrets Manager with rotation
- Production changes require explicit human approval
- Monitoring and alerting must be enabled and tested
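As one example of an automated check for the least-privilege rule, a PR gate could scan IAM policy documents for broad wildcards. The sketch below is a heuristic under stated assumptions, not a substitute for a full policy analyzer: the `findWildcards` helper is hypothetical, it handles only `Action` and `Resource` (not `NotAction`, `NotResource`, or conditions), and some read-only actions legitimately require `"*"` as the resource.

```typescript
// Minimal shape of an IAM policy document, limited to the fields inspected here.
interface Statement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
}
interface PolicyDocument {
  Version: string;
  Statement: Statement[];
}

// IAM accepts either a single string or a list for Action and Resource.
const toArray = (v: string | string[]): string[] => (Array.isArray(v) ? v : [v]);

// Returns one finding per Allow statement that grants all actions,
// all actions of a service ("s3:*"), or access to every resource.
function findWildcards(policy: PolicyDocument): string[] {
  const findings: string[] = [];
  policy.Statement.forEach((s, i) => {
    if (s.Effect !== "Allow") return; // Deny statements only narrow access
    if (toArray(s.Action).some((a) => a === "*" || a.endsWith(":*"))) {
      findings.push(`statement ${i}: wildcard action`);
    }
    if (toArray(s.Resource).includes("*")) {
      findings.push(`statement ${i}: wildcard resource`);
    }
  });
  return findings;
}
```

A non-empty result would fail the automated check and route the change to the Security Specialist for review.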
Code Construction Rules
- Use AWS CDK for IaC; avoid ad hoc console changes that bypass version control
- All resources must be tagged and monitored
- Include drift detection and validation tests
- Write idempotent deploy scripts and verify outputs
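The tagging rule can be checked against the synthesized CloudFormation template before deployment. The sketch below assumes resources carry tags as a `Tags` list of `Key`/`Value` pairs under `Properties`; some resource types use a different tag format or do not support tags at all, so a real check would need per-type handling. The required tag keys and the `untaggedResources` helper are hypothetical.

```typescript
// Only the parts of a CloudFormation template that this check reads.
interface CfnResource {
  Type: string;
  Properties?: { Tags?: { Key: string; Value: string }[] };
}
interface CfnTemplate {
  Resources: Record<string, CfnResource>;
}

// Hypothetical required tag keys; substitute your own tagging strategy.
const requiredTags = ["owner", "environment", "cost-center"];

// Maps each logical ID to the required tag keys it lacks.
// Fully tagged resources are left out of the result.
function untaggedResources(template: CfnTemplate): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const [id, res] of Object.entries(template.Resources)) {
    const keys = new Set((res.Properties?.Tags ?? []).map((t) => t.Key));
    const missing = requiredTags.filter((k) => !keys.has(k));
    if (missing.length > 0) result[id] = missing;
  }
  return result;
}
```

Running the check on the synthesized template rather than on the CDK source catches tags that a construct failed to propagate to the resources it creates.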
Security and Production Rules
- Least privilege IAM roles; deny by default
- Secrets rotation and encryption in transit and at rest
- Audit logging for all critical actions
- Production change windows and rollback procedures
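A production gate combining these rules with the approval requirements might look like the following. It is a sketch only: the field names, the same-day UTC change window, and the `productionBlockers` helper are assumptions, and a real pipeline would read approvals from its CI/CD system rather than from a plain object.

```typescript
// Assumed change window: [startHourUtc, endHourUtc) within a single UTC day.
interface ChangeWindow {
  startHourUtc: number;
  endHourUtc: number;
}

interface ProdChange {
  humanApproved: boolean;   // explicit human approval on the PR
  securitySignOff: boolean; // sign-off from security
  sreSignOff: boolean;      // sign-off from SRE
  rollbackTested: boolean;  // rollback procedure exercised for this change
  requestedAt: Date;        // when the deployment is requested
}

function inWindow(at: Date, w: ChangeWindow): boolean {
  const hour = at.getUTCHours();
  return hour >= w.startHourUtc && hour < w.endHourUtc;
}

// Returns every reason the change may not go to production.
// An empty list means the gate is open.
function productionBlockers(c: ProdChange, w: ChangeWindow): string[] {
  const blockers: string[] = [];
  if (!c.humanApproved) blockers.push("missing human approval");
  if (!c.securitySignOff) blockers.push("missing security sign-off");
  if (!c.sreSignOff) blockers.push("missing SRE sign-off");
  if (!c.rollbackTested) blockers.push("rollback not tested");
  if (!inWindow(c.requestedAt, w)) blockers.push("outside change window");
  return blockers;
}
```

Returning all blockers at once, instead of stopping at the first, lets the requesting agent resolve everything in a single pass.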
Testing Checklist
- Unit tests for IaC modules
- Integration tests of AWS services
- End-to-end tests including deployment and rollback
- Security and compliance checks
Common Mistakes to Avoid
- Skipping code reviews for production changes
- Overlooking drift between the IaC deployed through CI/CD and live resources
- Ignoring secrets rotation and access control updates
- Rushing production deployment without validation
FAQ
What is the purpose of this AGENTS.md Template for AWS production system design?
This AGENTS.md Template defines the operating manual for AI coding agents to design, validate, and operate an AWS production environment with multi-agent orchestration and governance.
Which AWS services and patterns are covered by this workflow?
The template covers VPC, subnets, IAM, Secrets Manager, KMS, S3, Lambda, ECS/EKS, RDS, CloudFormation/CDK, CodePipeline, CloudWatch, and guardrails for security and reliability.
How are secrets and credentials managed in this workflow?
Secrets are stored in AWS Secrets Manager, accessed via least-privilege IAM roles, rotated automatically, with audit trails in CloudTrail.
How does multi-agent handoff ensure correctness and governance?
Handoffs are explicit and backed by a source-of-truth repository, runbooks, and decision logs. The planner validates its outputs before passing them to the implementer, and each handoff is covered by automated checks and human review.