AGENTS.md Template for Production Debugging Workflows | Suhas Bhairav

Overview

The AGENTS.md Template for Production Debugging Workflows provides a formal operating manual for AI coding agents tasked with diagnosing and resolving issues in live systems. It covers both single agent operation and multi agent orchestration, and it defines roles, handoffs, memory, and governance to maintain reliability and safety during debugging.

When to Use This AGENTS.md Template

During active production incidents requiring rapid triage, investigation, and remediation
To standardize debugging processes across teams and services
When introducing multi agent orchestration for incident handling
To ensure tool governance and human review for high risk changes
When documenting project level operating context for AI coding agents

Copyable AGENTS.md Template

The following is a ready to paste AGENTS.md template block that governs the production debugging workflow:

# AGENTS.md
Project role
  Incident Debugging Operator responsible for triage, reproduction, and remediation in production
Agent roster and responsibilities
  Planner: defines tasks, coordinates sub agents, tracks progress
  Implementer: develops fixes and changes to services in a controlled environment
  Tester: validates fixes against scenarios including synthetic events
  Reviewer: approves changes before deployment
  Researcher: gathers logs, traces, metrics, and domain context
  Domain Specialist: provides domain specific insights for accurate remediation
  Orchestrator: supervises the workflow, enforces handoffs and memory updates
Supervisor or orchestrator behavior
  The orchestrator maintains a shared memory of issue state, assigns tasks, triggers handoffs, and enforces escalation gates
  It chooses safe execution paths and halts on prohibited actions
Handoff rules between agents
  When a task completes, hand off artifacts including context, logs, and test results to the next agent
  If a task fails, escalate to the orchestrator and apply rollback if necessary
Context memory and source of truth rules
  Store issue context, logs, traces, and remediation notes in a persistent memory
  Use a single source of truth including dashboard metrics and incident tickets
Tool access and permission rules
  Agents may read logs and metrics, execute limited read write actions in staging, and request production changes via the orchestrator
  Secrets must be retrieved from secure vaults and never logged
Architecture rules
  Node based microservice architecture with clear service boundaries and observability
  Agents communicate via defined interfaces and do not perform unapproved global config changes
File structure rules
  Keep debugging artifacts under a dedicated production_debugging folder
  Do not mix with application source; separate concerns
Data API or integration rules
  Use production data only via controlled simulators or synthetic events when possible
  Do not write to production data or settings without approval
Validation rules
  All fixes must pass automated tests in CI and be validated in staging before canary deployment
Security rules
  Keep secrets out of messages, fetch all credentials from vaults, and rotate credentials after remediation
Testing rules
  Include unit tests, integration tests, and end to end tests in staging where applicable
Deployment rules
  Deploy changes through canary releases with monitoring and alerting
Human review and escalation rules
  High risk changes require human sign off
Failure handling and rollback rules
  If remediation fails, revert code and configuration to known good state; preserve logs and memory
Things Agents must not do
  Do not modify production system configurations without approval
  Do not bypass governance or run unsupported tooling in production
  Do not disclose sensitive data in messages

Recommended Agent Operating Model

Roles and decision boundaries for effective production debugging

Planner determines scope, assigns tasks, coordinates handoffs
Implementer makes targeted changes in staging first, then in production once approvals are in place
Tester validates changes against scenarios and live event emulations
Reviewer approves before deployment to production
Researcher collects logs and traces to inform decisions
Domain Specialist provides context on domain specific constraints
Orchestrator enforces governance and escalations
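
The decision boundaries above can be sketched as a small promotion gate. This is a minimal illustration, not a prescribed implementation: the `ChangeRequest` fields and the `can_deploy_to_production` function are hypothetical names, and the check order (Tester validation, then Reviewer approval, then human sign off for high risk changes) is an assumption drawn from the roles listed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ChangeRequest:
    """Hypothetical record of a fix moving from staging toward production."""
    summary: str
    high_risk: bool
    staging_validated: bool = False   # set by the Tester
    reviewer_approved: bool = False   # set by the Reviewer
    human_signed_off: bool = False    # set by a human for high risk changes


def can_deploy_to_production(change: ChangeRequest) -> Tuple[bool, str]:
    """Return (allowed, reason) for promoting a change to production."""
    if not change.staging_validated:
        return False, "change has not been validated in staging"
    if not change.reviewer_approved:
        return False, "reviewer approval is missing"
    if change.high_risk and not change.human_signed_off:
        return False, "high risk change requires human sign off"
    return True, "ok"
```

An orchestrator could call this gate before any production deployment step and halt when it returns `False`.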

Recommended Project Structure

production_debugging/
├─ agents/
│  ├─ planner/
│  ├─ implementer/
│  ├─ tester/
│  ├─ reviewer/
│  ├─ researcher/
│  └─ domain_specialist/
├─ data/
│  ├─ logs/
│  ├─ traces/
│  └─ memory/
├─ integrations/
├─ tests/
└─ docs/

Core Operating Principles

Explicit handoffs with complete context
Single source of truth for issue state
Limit tool access to minimize blast radius
Human review for high risk actions
Traceable decisions and auditable changes

Agent Handoff and Collaboration Rules

Rules for handoffs among the planner, implementer, tester, reviewer, researcher, and domain specialist
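
One way to make a handoff explicit is to model it as a record that lists its artifacts and to route it based on completeness. The sketch below is hypothetical: the `Handoff` class, the `REQUIRED_ARTIFACTS` tuple, and the choice to send incomplete handoffs to the orchestrator are assumptions. The template itself only states that context, logs, and test results are handed off and that failures escalate to the orchestrator.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Artifacts the template says should accompany a completed task.
REQUIRED_ARTIFACTS = ("context", "logs", "test_results")


@dataclass
class Handoff:
    """Hypothetical handoff record passed between agents."""
    from_agent: str
    to_agent: str
    artifacts: Dict[str, str] = field(default_factory=dict)

    def missing_artifacts(self) -> List[str]:
        """List required artifacts that are absent or empty."""
        return [name for name in REQUIRED_ARTIFACTS if not self.artifacts.get(name)]


def route_handoff(handoff: Handoff, task_succeeded: bool) -> str:
    """Return the agent that should receive the work next."""
    # Failed tasks escalate to the orchestrator, as the template requires.
    # Treating an incomplete handoff the same way is an assumption.
    if not task_succeeded or handoff.missing_artifacts():
        return "orchestrator"
    return handoff.to_agent
```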

Tool Governance and Permission Rules

Rules for command execution, file edits, API calls, secrets, and production systems
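
A permission check along these lines could encode the access rules from the template: reads are allowed, writes are allowed in staging, and production writes must go through the orchestrator for approval. The environment names, action names, and the `check_action` function are illustrative assumptions rather than part of any real tool.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"  # route the request via the orchestrator
    DENY = "deny"


# Hypothetical environment names; adjust to the project's own.
KNOWN_ENVIRONMENTS = {"staging", "production"}


def check_action(environment: str, action: str) -> Decision:
    """Decide whether an agent may perform an action in an environment."""
    if environment not in KNOWN_ENVIRONMENTS:
        return Decision.DENY
    if action == "read":
        return Decision.ALLOW
    if action == "write":
        if environment == "staging":
            return Decision.ALLOW
        return Decision.NEEDS_APPROVAL
    # Anything not explicitly listed is denied by default.
    return Decision.DENY
```

Denying unknown environments and actions by default keeps the blast radius small when a new tool or target is introduced without a matching rule.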

Code Construction Rules

Specific implementation constraints for this workflow

Security and Production Rules

Security policy for debugging in production including access controls and data handling
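
As one illustration of the data handling side of this policy, log lines can be scrubbed before an agent stores or forwards them. The sketch assumes secrets appear as `key=value` or `key: value` pairs with a small set of key names. The pattern and the `scrub` helper are hypothetical and would need extending for real log formats.

```python
import re

# Assumed key names that mark a secret; real systems will need a longer list.
SECRET_PATTERN = re.compile(
    r"\b(password|token|api[_-]?key|secret)\b(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def scrub(line: str) -> str:
    """Replace the value of any recognized secret key with a placeholder."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", line)
```

For example, `scrub("db password=hunter2 host=db1")` keeps the key and the host but replaces the password value with `[REDACTED]`.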

Testing Checklist

Unit tests for each component
Integration tests across services
Staging to production canary checks
Rollout monitoring and automatic rollback on anomalies
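
The last item, automatic rollback on anomalies, can be sketched as a comparison between baseline and canary error rates. The thresholds below (a 50% relative increase and a 1% absolute floor) and the `should_roll_back` function are assumptions chosen for illustration. Appropriate values depend on the service's traffic and error budget.

```python
def should_roll_back(
    baseline_error_rate: float,
    canary_error_rate: float,
    max_relative_increase: float = 0.5,   # assumed tolerance: 50% above baseline
    min_absolute_rate: float = 0.01,      # assumed floor: ignore rates under 1%
) -> bool:
    """Return True when the canary's error rate looks anomalous."""
    # Ignore very small error rates so that noise does not trigger a rollback.
    if canary_error_rate < min_absolute_rate:
        return False
    return canary_error_rate > baseline_error_rate * (1 + max_relative_increase)
```

With these defaults, a canary at 5% errors against a 2% baseline triggers a rollback, while a canary at 2.5% does not.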

Common Mistakes to Avoid

Skipping human review for high risk changes
Overwriting production data or configurations without sign off
Ignoring memory and source of truth drift

FAQ

How does this AGENTS.md template define a production debugging workflow?

This template defines roles, handoffs, and governance for debugging incidents in production, enabling single or multi agent coordination.

Who should be on the agent roster for production debugging?

The roster includes Planner, Implementer, Tester, Reviewer, Researcher, Domain Specialist, and Orchestrator for coordination and control.

How are handoffs between agents handled?

Handoffs include complete context, artifacts, and test results; failures escalate to the orchestrator with a rollback plan if needed.

What are the tool governance requirements?

Access is limited, secrets are retrieved from vaults, and production changes require orchestrator approval and audit trails.

How is security ensured during debugging?

Secrets are fetched from vaults and never logged, credentials are rotated after remediation, and logs are scrubbed to avoid exposing sensitive data.
