AGENTS.md Template for Google Cloud Production System Design

A copyable AGENTS.md operating manual for Google Cloud production system design that governs single-agent and multi-agent workflows, tool governance, and handoffs.

AGENTS.md template, Google Cloud, GCP, multi-agent orchestration, tool governance, handoff rules, Terraform, Deployment Manager, SRE

Target User

Developers, cloud architects, SREs, product teams

Use Cases

  • Design and operate production systems on Google Cloud with regulated agent workflows
  • Coordinate single-agent and multi-agent orchestration across GCP services

Markdown Template

# AGENTS.md
Project role: Cloud Infra Architect, DevOps Engineer, Cloud Security Lead, SRE, Data Engineer
Agent roster and responsibilities:
- Planner Agent: defines the design and orchestration plan and aligns with Google Cloud security baselines
- Implementer Agent: codifies infrastructure as code using Terraform and configures Google Cloud resources in prod-safe patterns
- Reviewer Agent: reviews IaC for correctness, security baselines, and policy conformance
- Tester Agent: runs unit and integration tests against the design and deployment pipeline
- Researcher Agent: gathers official Google Cloud docs and reference architectures
- Domain Specialist Agent: ensures data governance, privacy, and regulatory compliance
- Operator Agent: monitors prod deployments and triggers runbooks

Supervisor or orchestrator behavior:
- The Orchestrator Agent coordinates tasks, enforces memory rules, and maintains the source of truth

Handoff rules between agents:
- Planner to Implementer when design is approved
- Implementer to Reviewer after IaC draft
- Reviewer to Tester for integration validation
- Domain Specialist to Reviewer for security and compliance review
- Researcher to Planner for updated guidance
- Operator coordinates runbooks and production readiness with all agents

Context, memory, and source-of-truth rules:
- Use a single source of truth in a remote Git repository with remote Terraform state
- Memory is scoped to an orchestration run with explicit decision logs stored in a central logs bucket

Tool access and permission rules:
- Least-privilege service accounts for gcloud calls, Secret Manager access, and Cloud KMS key use
- No hardcoded credentials; all secrets are retrieved at runtime
- Deployments require policy checks and approvals

Architecture rules:
- Google Cloud managed services with clean separation of concerns
- Use Pub/Sub for coordination, Cloud Run or GKE for workloads, and Cloud Build for CI/CD
- Centralized monitoring with Cloud Monitoring and logging with Cloud Logging

File structure rules:
- Infrastructure-as-code modules in infra/modules; environment-specific configs in infra/environments/prod and infra/environments/staging
- Application code under apps; pipelines under pipelines
- Documentation under docs

Data, API, or integration rules:
- Data artifacts stored in Cloud Storage with proper IAM controls
- APIs governed by IAM roles and service accounts; secrets in Secret Manager

Validation rules:
- terraform validate and terraform plan; policy checks; unit tests for modules
- integration tests in a staging environment before prod

Security rules:
- VPC Service Controls; all secrets encrypted with KMS
- production deployments gated by approvals and audit trails

Testing rules:
- unit, integration, end-to-end tests; canary deployments; rollback tests

Deployment rules:
- CI/CD pipelines in Cloud Build; canary first; promote with approvals
- rollback plan and timeboxed rollbacks

Human review and escalation rules:
- Human review required for prod deployments; escalation to SRE and Security

Failure handling and rollback rules:
- If failures are detected, roll back resources and revert code; notify owners; switch to safe mode

Things Agents must not do:
- Do not bypass approvals; do not store secrets in code; do not drift away from the source of truth

Overview

This AGENTS.md template codifies a Google Cloud production system design workflow, enabling single-agent and multi-agent orchestration with clear roles, handoffs, and governance.

This page provides a copyable AGENTS.md template that you can paste into a project so teams can operate a design, implementation, review, testing, and deployment loop for AI coding agents on Google Cloud.

When to Use This AGENTS.md Template

  • Standardize production system design for AI coding agents on Google Cloud across teams.
  • Document orchestration patterns for multi-agent coordination using GCP services such as Cloud Run, GKE, Cloud Functions, Pub/Sub, and Cloud Build.
  • Serve as an auditable playbook for architecture decisions, tool governance, and handoffs.
  • Provide a canonical context for humans and agents to operate within defined boundaries and safety rules.

Recommended Agent Operating Model

Roles and decision boundaries: the Planner defines the architecture; the Implementer creates resources; the Reviewer validates the IaC; the Tester confirms behavior; the Domain Specialist ensures security and compliance; the Operator monitors production; the Researcher gathers reference material. Escalation paths: blocking issues go to the Orchestrator Agent; production incidents go to SRE.

Recommended Project Structure

The directory tree follows a modular, Google Cloud-native IaC layout focused on production readiness:

gcp-prod-agents-md-template/
  infra/
    modules/
      vpc/
      network-security/
      compute/
    environments/
      prod/
        main.tf
        variables.tf
        outputs.tf
        backend.tf
  apps/
    ingest-service/
      main.tf
      variables.tf
  pipelines/
    ci-cd/
      cloudbuild.yaml
  docs/
    ops-notes.md
  scripts/
    bootstrap.sh
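
A layout like the one above can be checked mechanically before an agent starts work. The following is a minimal sketch, assuming the repository root is passed in explicitly; `EXPECTED_PATHS` and `missing_paths` are hypothetical names introduced here for illustration, and the path list is transcribed from the tree.

```python
from pathlib import Path

# Paths transcribed from the directory tree above, relative to the repo root.
EXPECTED_PATHS = [
    "infra/modules/vpc",
    "infra/modules/network-security",
    "infra/modules/compute",
    "infra/environments/prod/main.tf",
    "infra/environments/prod/variables.tf",
    "infra/environments/prod/outputs.tf",
    "infra/environments/prod/backend.tf",
    "apps/ingest-service",
    "pipelines/ci-cd/cloudbuild.yaml",
    "docs/ops-notes.md",
    "scripts/bootstrap.sh",
]


def missing_paths(repo_root: Path) -> list[str]:
    """Return the expected paths that do not exist under repo_root."""
    return [p for p in EXPECTED_PATHS if not (repo_root / p).exists()]
```

An empty result means the checkout matches the documented layout; any returned path is a candidate for escalation rather than silent repair.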

Core Operating Principles

  • Single source of truth for design decisions and IaC state
  • Idempotent, auditable actions with clear versioning
  • Least privilege and secret management enforced by design
  • Explicit memory of decisions with time-stamped logs
  • Human-in-the-loop for prod changes and incident response
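
The "explicit memory of decisions with time-stamped logs" principle could be realized as an append-only log of JSON lines. This is only a sketch: the `Decision` fields and the `to_log_line` helper are assumptions made for illustration, and writing the line to the central logs bucket is left out.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Decision:
    """One decision taken during an orchestration run (hypothetical schema)."""

    run_id: str   # orchestration run the decision is scoped to
    agent: str    # agent that made the decision, e.g. "planner"
    summary: str  # short human-readable description
    # UTC timestamp in ISO 8601 format, filled in at creation time.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(decision: Decision) -> str:
    """Serialize a decision as one JSON line for an append-only log."""
    return json.dumps(asdict(decision), sort_keys=True)
```

Keeping each record on one line makes the log easy to append to and to audit with standard text tools.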

Agent Handoff and Collaboration Rules

  • Planner to Implementer for design realization
  • Implementer to Reviewer for IaC validation
  • Reviewer to Tester for integration checks
  • Domain Specialist to Reviewer for security and compliance
  • Researcher to Planner for updates
  • Operator to all for runbooks and prod readiness
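
The handoff rules above form a small directed graph, which an orchestrator could enforce before routing work. A minimal sketch follows; the agent identifiers and the `is_allowed_handoff` function are hypothetical, and treating the Operator as allowed to hand off to every other agent is one reading of "Operator to all".

```python
# Allowed handoffs, transcribed from the list above as (sender, receiver) pairs.
ALLOWED_HANDOFFS = {
    ("planner", "implementer"),
    ("implementer", "reviewer"),
    ("reviewer", "tester"),
    ("domain_specialist", "reviewer"),
    ("researcher", "planner"),
}

AGENTS = {
    "planner",
    "implementer",
    "reviewer",
    "tester",
    "researcher",
    "domain_specialist",
    "operator",
}


def is_allowed_handoff(sender: str, receiver: str) -> bool:
    """Return True if the handoff matches one of the documented rules."""
    if sender not in AGENTS or receiver not in AGENTS:
        return False
    # Assumption: the Operator may coordinate with every other agent.
    if sender == "operator":
        return receiver != "operator"
    return (sender, receiver) in ALLOWED_HANDOFFS
```

Rejecting undocumented handoffs, such as Implementer directly to Tester, keeps the review step from being skipped.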

Tool Governance and Permission Rules

  • CLI tools using service accounts with scoped roles
  • Secrets accessed via Secret Manager; never embedded in code
  • APIs accessed through restricted IAM policies and audit logging
  • Production changes gated by approvals and automated policy checks
  • Rollback and kill-switch procedures in place for outages
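
The rule that production changes are gated by approvals and automated policy checks can be expressed as a pure function that a pipeline step calls before deploying. This is a sketch under stated assumptions: `ChangeRequest` and `may_deploy` are hypothetical names, and allowing non-production environments to deploy on policy checks alone is an assumption, not something the rules above specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    """A proposed deployment (hypothetical schema)."""

    environment: str                       # e.g. "prod" or "staging"
    policy_checks_passed: bool             # result of automated policy checks
    human_approvers: tuple[str, ...] = ()  # people who approved the change


def may_deploy(change: ChangeRequest) -> bool:
    """Return True only if the change clears the documented gates."""
    # Automated policy checks are required for every environment.
    if not change.policy_checks_passed:
        return False
    # Production additionally requires at least one human approval.
    if change.environment == "prod":
        return len(change.human_approvers) >= 1
    # Assumption: non-production environments need policy checks only.
    return True
```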

Code Construction Rules

  • Modular IaC using Terraform; modules in infra/modules
  • Outputs define downstream dependencies; inputs parameterized
  • Parallelizable tasks where possible; avoid race conditions
  • Idempotent apply operations; drift detection enabled
  • Documentation derived from the code; changelogs maintained
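
One way to approach the drift detection mentioned above is to run `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The sketch below assumes Terraform is on the PATH and the working directory is already initialized; the function names and the result labels are hypothetical.

```python
import subprocess


def classify_plan_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a label."""
    if code == 0:
        return "in_sync"          # plan succeeded with an empty diff
    if code == 2:
        return "changes_present"  # plan succeeded with a non-empty diff
    return "error"                # exit code 1, or anything unexpected


def check_drift(workdir: str) -> str:
    """Run a plan in workdir and classify the outcome.

    Assumes `terraform init` has already been run in workdir.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```

A "changes_present" result on an environment with no pending commits suggests the deployed resources have drifted from the source of truth.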

Security and Production Rules

  • VPC Service Controls and private endpoints for prod networks
  • Data encrypted at rest with Cloud KMS; encryption in transit enforced
  • Audit trails for every deploy; incident response runbooks
  • Access controls based on least privilege; mandatory MFA for sensitive actions
  • Regular rotation of credentials and secrets
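
Regular rotation is easier to enforce when the age check is explicit. The following sketch flags a credential once it reaches a maximum age; the 90-day default and the `needs_rotation` name are assumptions for illustration, not values taken from this template.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(
    created_at: datetime,
    now: datetime,
    max_age: timedelta = timedelta(days=90),  # assumed policy value
) -> bool:
    """Return True if a credential created at created_at is due for rotation."""
    return now - created_at >= max_age
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.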

Testing Checklist

  • terraform validate and terraform plan; static checks
  • Unit tests for modules; integration tests in staging
  • Canary deployments and feature flag tests
  • End-to-end tests for data flows and APIs
  • Security and compliance scans; dependency checks
  • Disaster recovery drills and rollback verification
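
For the canary item in the checklist, the promote-or-rollback decision can be reduced to a comparison of error rates. This is a sketch only: the thresholds (`max_ratio`, `min_requests`, and the 0.1% floor) are illustrative assumptions, `canary_decision` is a hypothetical helper, and a "promote" result still needs the human approval required for prod.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 1.5,   # assumed tolerance relative to baseline
    min_requests: int = 100,  # assumed minimum sample size
) -> str:
    """Return "wait", "promote", or "rollback" for a canary deployment."""
    # Too little canary traffic to judge: keep observing.
    if canary_requests < min_requests:
        return "wait"
    baseline_rate = (
        baseline_errors / baseline_requests if baseline_requests else 0.0
    )
    canary_rate = canary_errors / canary_requests
    # A small absolute floor keeps a zero-error baseline from forcing a
    # rollback on a single canary error.
    threshold = max(baseline_rate * max_ratio, 0.001)
    return "promote" if canary_rate <= threshold else "rollback"
```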

Common Mistakes to Avoid

  • Skipping approvals or bypassing policy checks
  • Hard coding secrets or credentials in code
  • Architectural drift between design document and prod
  • Inadequate monitoring or insufficient logging
  • Untracked changes to infrastructure state
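
Hardcoded secrets are the easiest of these mistakes to catch automatically. Below is a deliberately simple scanner sketch; the two regular expressions and the `find_suspect_lines` helper are illustrative assumptions, and a real pipeline would more likely rely on a dedicated secret-scanning tool.

```python
import re

# Illustrative patterns only; they will miss many real secrets.
SECRET_PATTERNS = [
    # PEM private key headers.
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    # Quoted literals assigned to suspicious names, e.g. password = "...".
    re.compile(
        r"(?i)\b(?:password|secret|api_key|token)\b\s*[:=]\s*[\"'][^\"']{8,}[\"']"
    ),
]


def find_suspect_lines(text: str) -> list[int]:
    """Return 1-based line numbers that match any secret pattern."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(pattern.search(line) for pattern in SECRET_PATTERNS)
    ]
```

References to secrets by identifier, such as a Secret Manager secret ID held in a variable, do not match these patterns, which is the intended behavior.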

FAQ

What is this AGENTS.md Template for Google Cloud Production System Design?

It provides a formal operating manual to govern single-agent and multi-agent workflows on Google Cloud, including roles, handoffs, tool governance, and escalation paths.

How does multi-agent orchestration work in this template?

It defines an orchestrator and an agent roster with explicit handoff points, memory rules, and a single source of truth to coordinate tasks across GCP services.

What tools and services are governed by this template?

Cloud IAM, Secret Manager, Cloud Storage, Terraform or Deployment Manager, Cloud Build, Cloud Run, GKE, Pub/Sub, and monitoring/logging services.

How are security and production rules enforced?

By least privilege access, secret management, encryption, policy checks, approval gates, and controlled deployment workflows with audit trails.

What are the escalation paths if something goes wrong?

Escalate to the Orchestrator Agent, trigger a rollback, halt non-critical services, and notify the SRE and Security teams for remediation.