Azure Production System Design AGENTS.md Template

Overview

This AGENTS.md Template defines a project-level operating manual for Azure production system design using AI coding agents. It governs both single-agent workflows and multi-agent orchestration across provisioning, deployment, monitoring, and governance in Azure, with explicit handoffs, context storage, and tool governance.

Direct answer: Use this AGENTS.md Template to establish a repeatable, auditable working contract for Azure production systems, ensuring safe collaboration among planners, implementers, reviewers, researchers, and domain specialists.

When to Use This AGENTS.md Template

You are designing or maintaining Azure production environments that require repeatable agent-driven workflows.
You need explicit handoffs and escalation paths between planners, implementers, reviewers, and testers.
You must enforce tool governance, secret management, and secure deployment practices in the cloud.
You want a single source of truth for architecture decisions, validation criteria, and rollback procedures.

Copyable AGENTS.md Template

# AGENTS.md

Project: Azure Production System Design

Project roles:
- Architect: defines the Azure target state and verifies alignment with business outcomes
- Planner: builds an actionable plan for Azure resources and CI/CD pipelines
- Implementer: translates plans into Azure resources and automation scripts
- Reviewer: validates design, security posture, and compliance
- Researcher: identifies constraints, dependencies, and alternative patterns
- Domain Specialist: provides Azure-specific domain knowledge (Networking, Identity, Security)

Supervisor / Orchestrator:
- Orchestrator coordinates prompts, enforces handoffs, records decisions to the source of truth, and triggers validation gates.

Handoff rules:
- Planner ➜ Implementer on plan acceptance
- Implementer ➜ Reviewer upon completion of implementation
- Reviewer ➜ Implementer for fixes, then back to Reviewer until sign-off
- Any critical risk triggers escalation to human review and a pause in production changes

Context, memory, and source of truth:
- All decisions, designs, and artifacts are stored in the repository as the single source of truth
- Memory includes current Azure resource inventory, desired state, and policy constraints
- References to external docs must be linked to internal knowledge bases

Tool access and permission rules:
- Access to Azure CLI, ARM/Bicep templates, Terraform where allowed, and Azure REST APIs is restricted by role-based access control
- Secrets must only be retrieved from Azure Key Vault and never logged

Architecture rules:
- All Azure resources must be described in a formal architecture diagram and ARM/Bicep templates where applicable
- Use established resource naming conventions and tagging policies

File structure rules:
- Keep architecture-specific files under infrastructure/azure and docs/architecture-diagrams
- Do not place production secrets in code or docs

Data, API, or integration rules:
- Use managed identities and secure endpoints
- Encrypt data in transit and at rest where applicable

Validation rules:
- Design is validated by the Planner and confirmed by the Reviewer
- Automate checks for drift, policy violations, and security controls

Security rules:
- Secrets must be stored in Key Vault
- Production changes require approval gates
- Least privilege for all agents

Testing rules:
- Unit tests for scripts, integration tests for deployment workflows, end-to-end tests for recovery scenarios

Deployment rules:
- Use CI/CD gates for deployment to Azure environments
- Rollback procedures must exist and be tested

Human review and escalation rules:
- Trigger human review on high-risk changes or failed validations
- Escalations must include rationale and next steps

Failure handling and rollback rules:
- Restore to last known good configuration on failure
- Log all failures with context for post-mortem

Things Agents must not do:
- Do not bypass approval gates
- Do not reveal secrets in logs
- Do not modify production state without consensus and a rollback plan

Recommended Agent Operating Model

Roles and boundaries for Azure production system design:

Planner: defines the target Azure design, dependencies, and the sequence of actions; decides handoff points.
Implementer: provisions resources, writes automation scripts, applies templates, and ensures alignment with the plan.
Reviewer: performs architecture, security, and compliance reviews; validates against policies and risk thresholds.
Researcher: probes Azure constraints, costs, and alternate patterns; surfaces trade-offs for decision-makers.
Domain Specialist: provides Azure-specific expertise (Networking, Identity, Security, Governance) and validates domain requirements.
Supervisor/Orchestrator: enforces process, coordinates handoffs, curates memory, and enacts validation gates.

Recommended Project Structure

azure-prod-design/
├── agents/
│   ├── planner.md
│   ├── implementer.md
│   ├── reviewer.md
│   ├── researcher.md
│   └── domain-specialist.md
├── infrastructure/
│   └── azure/
│       ├── templates/
│       │   ├── main.bicep
│       │   └── main.parameters.json
│       └── scripts/
├── configs/
├── policies/
├── docs/
│   └── architecture-diagrams/
├── tests/
│   └── e2e/
└── .github/
    └── workflows/

Core Operating Principles

Single source of truth for all Azure designs and agent decisions.
Deterministic outputs: the same plan applied to the same inputs yields the same result.
Security by design: least privilege, secrets vaulting, and auditable changes.
Clear ownership and escalation paths for all actions.
Explicit handoffs with validation gates before progression.

Agent Handoff and Collaboration Rules

Rules by role for Azure workflows:

Planner → Implementer: hand off after plan approval with a unique plan identifier and resource graph.
Implementer → Reviewer: hand off on completion of resource provisioning and script execution; include logs and artifacts.
Reviewer → Implementer: require fixes and re-run validations if any security or architecture issues are found.
Researcher → Planner: when new constraints or costs surface, update the plan and reconsider trade-offs.
Domain Specialist → all: provide domain-specific reviews before production deployment.
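
The handoff rules above can be checked mechanically before the orchestrator passes work along. The following is a minimal sketch, not a prescribed implementation: the role strings, the `Handoff` record, and `validate_handoff` are hypothetical names introduced for illustration, and it covers only the role-to-role transitions listed above.

```python
from dataclasses import dataclass, field

# Allowed sender -> receiver pairs, taken from the rules above.
# Role names are illustrative assumptions.
ALLOWED_HANDOFFS = {
    ("planner", "implementer"),
    ("implementer", "reviewer"),
    ("reviewer", "implementer"),
    ("researcher", "planner"),
}


class HandoffError(Exception):
    """Raised when a handoff violates the collaboration rules."""


@dataclass
class Handoff:
    plan_id: str
    sender: str
    receiver: str
    artifacts: list = field(default_factory=list)


def validate_handoff(handoff: Handoff, plan_approved: bool) -> None:
    """Raise HandoffError if the handoff is not permitted; return None otherwise."""
    if (handoff.sender, handoff.receiver) not in ALLOWED_HANDOFFS:
        raise HandoffError(f"{handoff.sender} may not hand off to {handoff.receiver}")
    if not handoff.plan_id:
        raise HandoffError("handoff is missing a plan identifier")
    if handoff.sender == "planner" and not plan_approved:
        raise HandoffError("plan must be approved before implementation starts")
    if handoff.sender == "implementer" and not handoff.artifacts:
        raise HandoffError("implementer must attach logs and artifacts")
```

Raising on a rejected handoff, rather than returning a flag, makes it harder for an agent to ignore the gate by accident.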

Tool Governance and Permission Rules

Azure CLI and ARM/Bicep usage must reference approved templates and parameter sets.
All secrets must be retrieved from Azure Key Vault; never logged or exposed in output.
Production changes require an approval gate and a rollback plan.
Access to production environments is controlled by the orchestrator and RBAC roles.
All API calls to Azure resources must be audited and traceable.
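
As a defense-in-depth measure for the "never logged" rule, agent logs can pass through a redaction filter. This sketch uses Python's standard `logging` module; the `SECRET_PATTERN` regex and the `RedactSecretsFilter` class are assumptions for illustration, and a pattern this simple will not catch every secret format, so it complements rather than replaces keeping secrets out of log calls.

```python
import logging
import re

# Hypothetical pattern: key=value pairs whose key name looks sensitive.
SECRET_PATTERN = re.compile(r"(?i)(password|secret|token|key)=\S+")


class RedactSecretsFilter(logging.Filter):
    """Replace the value of sensitive-looking key=value pairs before emission."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so interpolated arguments are covered.
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub(r"\1=[REDACTED]", message)
        record.args = ()
        return True  # keep the record, now redacted


logger = logging.getLogger("agents")
logger.addFilter(RedactSecretsFilter())
```

Attaching the filter to the logger means every handler downstream receives the redacted message.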

Code Construction Rules

Templates must be idempotent and support drift detection.
All changes must be versioned and described in PR messages tied to the AGENTS.md template.
Environment-specific values must be parameterized and stored in secure configurations.
Scripts must be deterministic and include validation hooks after execution.
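
Drift detection, at its simplest, is a comparison between the desired state declared in the templates and the state actually observed. The sketch below assumes both are available as flat dictionaries of property names to values; `detect_drift` and the `MISSING` sentinel are hypothetical names, and real resource state is usually nested and would need flattening first.

```python
# Sentinel for a property present on one side only (illustrative).
MISSING = "<missing>"


def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {property: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    # Sort the keys so the report is deterministic across runs.
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


desired = {"sku": "Standard", "tier": "Hot", "https_only": True}
actual = {"sku": "Premium", "tier": "Hot"}
report = detect_drift(desired, actual)
```

An empty report can serve as the post-execution validation hook: a script that leaves drift behind has not converged on the plan.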

Security and Production Rules

Use managed identities for resource access; never embed credentials.
Enforce network security groups, firewall rules, and private endpoints where applicable.
Implement resource tagging for governance and cost tracking.
Establish access review cadence and automated alerting for critical changes.
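
Tagging and naming rules are easy to enforce in a pre-deployment check. The following sketch is illustrative only: the required tag names and the naming pattern are assumptions, since the template says to use established conventions without specifying them, and `governance_violations` is a hypothetical helper.

```python
import re

# Assumed conventions; substitute your organization's actual policy.
REQUIRED_TAGS = ("environment", "owner", "cost-center")
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*-(dev|test|prod)$")


def governance_violations(resource: dict) -> list:
    """Return human-readable problems for one resource description."""
    problems = []
    name = resource.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"name '{name}' does not match the naming convention")
    tags = resource.get("tags", {})
    for tag in REQUIRED_TAGS:
        if not tags.get(tag):  # absent or empty both count as missing
            problems.append(f"missing tag '{tag}'")
    return problems
```

A Reviewer agent could run this over every resource in the plan and block the handoff when any list is non-empty.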

Testing Checklist

Unit tests for each script/module; mock Azure calls where possible.
Integration tests for ARM/Bicep templates and resource provisioning sequences.
End-to-end tests for deployment, rollback, and validation gates.
Security tests for secrets handling and IAM permissions.
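
To illustrate the first checklist item, here is a sketch of unit-testing an idempotent provisioning helper with `unittest.mock`, so no Azure call is made. The `client` object and its `exists` and `create` methods are a hypothetical wrapper interface, not the Azure SDK's API; in practice the wrapper would delegate to whichever SDK or CLI the project approves.

```python
from unittest.mock import Mock


def ensure_resource_group(client, name: str, location: str) -> bool:
    """Create the group only if absent; return True when a create call was made."""
    if client.exists(name):
        return False
    client.create(name, location)
    return True


def test_skips_existing_group():
    client = Mock()
    client.exists.return_value = True
    assert ensure_resource_group(client, "rg-app-prod", "westeurope") is False
    client.create.assert_not_called()


def test_creates_missing_group():
    client = Mock()
    client.exists.return_value = False
    assert ensure_resource_group(client, "rg-app-prod", "westeurope") is True
    client.create.assert_called_once_with("rg-app-prod", "westeurope")
```

Both functions follow the `test_` naming convention, so a runner such as pytest would discover them without extra configuration.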

Common Mistakes to Avoid

Skipping validation gates or human review for production changes.
Hard-coding secrets or credentials in scripts or templates.
Unclear handoff boundaries leading to duplicated work or drift.
Ignoring cost implications or Azure policy constraints in design decisions.

FAQ

What is the purpose of this AGENTS.md Template for Azure?

It provides a repeatable, auditable operating manual for Azure production system design using AI coding agents, enabling multi-agent orchestration and clear handoffs.

How does multi-agent orchestration work in this template?

Agents collaborate via a planner-driven plan, with explicit handoffs and validation gates enforced by the orchestrator to prevent drift and ensure compliance.

Where should this template be stored?

Store it in the repository under infrastructure/design/ and reference it from AGENTS.md as the project operating manual; all artifacts contribute to the single source of truth.

What if a deployment fails?

Follow the rollback procedure outlined in AGENTS.md: revert to the last known good configuration, re-run validations, and only then attempt a retry.
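
The failure path can be sketched as a small wrapper around the deployment step. Everything here is an assumption for illustration: `deploy`, `validate`, and `rollback` are caller-supplied callables standing in for the project's real pipeline steps, and configurations are treated as opaque values.

```python
def deploy_with_rollback(deploy, validate, rollback, candidate, last_known_good):
    """Apply candidate; on any failure restore last_known_good.

    Returns (active_config, failure_reason); failure_reason is None on success.
    """
    try:
        deploy(candidate)
        if validate(candidate):
            return candidate, None
        reason = "validation failed"
    except Exception as exc:
        # Keep the error text so the failure can be logged for the post-mortem.
        reason = f"deployment error: {exc}"
    rollback(last_known_good)
    return last_known_good, reason
```

Returning the failure reason alongside the active configuration gives the orchestrator what it needs to log the failure and decide whether to escalate to human review before any retry.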

How are secrets handled?

Secrets are retrieved from Azure Key Vault at runtime; do not log or store secrets in code or output logs.

When is human review required?

For high-risk changes, security policy exceptions, or any failure in automated validation gates, trigger human review and pause production changes.

Target User

Use Cases