Incident Response AGENTS.md Template for Architecture and Orchestration
AGENTS.md Template for incident response architecture, detailing an end-to-end multi-agent workflow for detection, triage, remediation, and post-incident review.
Target User
Developers, security engineers, IR teams, SREs
Use Cases
- Incident detection and triage
- Automated remediation via agent playbooks
- Post-incident analysis and reporting
Markdown Template
# AGENTS.md
Project Role
- Incident Response Architecture for multi-agent orchestration of detection, triage, remediation, and post-incident analysis.
- This AGENTS.md template sets the operating context, decision boundaries, and escalation paths for all responding agents.
Agent roster and responsibilities
- IncidentCommander (IC): coordinate containment, communicate with stakeholders, approve high-risk actions.
- DetectionAgent: gather telemetry from SIEM/EDR/logs, raise alerts with context.
- TriageAgent: assess severity, collect artifacts, determine required runbooks.
- PlaybookExecutorAgent: execute remediation steps via automation tools and runbooks.
- ForensicAgent: preserve evidence, collect artifacts with chain of custody.
- CommunicatorAgent: provide status updates to on-call engineers and stakeholders.
- AuditorAgent: validate actions, ensure compliance, and document evidence integrity.
- KnowledgeAgent: update incident knowledge base and post-incident lessons.
Supervisor or orchestrator behavior
- Orchestrator (IncidentController): coordinates tasking, enforces timeouts, validates outputs, and escalates when needed.
- The orchestrator maintains the single source of truth (the memory store) and the versioned runbooks.
Handoff rules between agents
- IC -> TriageAgent: when DetectionAgent raises an alert with actionable severity.
- TriageAgent -> PlaybookExecutorAgent: when remediation is automated or semi-automated.
- PlaybookExecutorAgent -> ForensicAgent: when remediation creates artifacts or evidence must be collected.
- ForensicAgent -> AuditorAgent: for validation before closure.
- CommunicatorAgent: broadcasts status at each handoff and during escalation.
Context, memory, and source-of-truth rules
- Memory keys: incident_id, severity, state, artifacts, runbooks_version.
- Systems of record: SIEM, EDR, and the ticketing system; every action references its incident_id.
- Memory persists across steps for auditability and rollback.
Tool access and permission rules
- Agents receive scoped credentials; access is least-privilege and ephemeral where possible.
- Secrets are stored in a vault; plain-text credentials must never appear in logs or memory.
- Production tool actions require explicit approvals or runbooks with rollback hooks.
Architecture rules
- Event-driven architecture with a central event bus and memory store.
- Agents are stateless where possible; state is persisted in the memory store and artifacts.
- Runbooks are versioned and auditable.
File structure rules
- incident-response/
  - runbooks/
  - artifacts/
  - memory/
  - tools/
  - templates/
  - tests/
Data, API, or integration rules when relevant
- Data models: Incident, Artifact, Action, StakeholderUpdate
- API endpoints: /incidents, /incidents/{id}/artifacts, /incidents/{id}/actions
- Integrations: SIEM, Ticketing, CMDB
Validation rules
- Verify data integrity and confirm that evidence handling preserves chain of custody.
- Validate action outputs against runbooks and expected outcomes.
Security rules
- Least privilege, encryption at rest, access audits, and secrets management.
- No production changes without approval from the IC and, when required, the on-call IR lead.
Testing rules
- Unit tests for agents, integration tests with mock services, end-to-end IR simulations.
Deployment rules
- Deploy runbooks and agent configurations through controlled release with approvals.
- Maintain rollback plans for any automation changes.
Human review and escalation rules
- Severe incidents require human-in-the-loop review by the on-call IR lead.
- All critical decisions are logged and reviewed post-incident.
Failure handling and rollback rules
- If an action fails, revert to the last known good state and resume from the state recorded in the memory store.
- Alert and escalate if rollback cannot be completed within the SLA.
Things Agents must not do
- Do not exfiltrate data or bypass security controls.
- Do not modify production state without explicit approval.
- Do not bypass runbooks or perform unsanctioned changes.
Overview
This AGENTS.md template defines a full incident response architecture built on multi-agent orchestration, covering detection, triage, remediation, evidence handling, and escalation. It establishes a single source of truth and clear handoffs so that responses are safe and auditable.
This page documents an AGENTS.md template for incident response architecture that governs how AI coding agents collaborate to detect, triage, contain, eradicate, and recover from security incidents, and to learn from them afterward. It covers both single-agent responsibilities and multi-agent orchestration patterns, ensuring consistent runbooks, traceability, and governance across tools and human review.
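To make the event-driven orchestration concrete, here is a minimal Python sketch of a central event bus that agents publish to and subscribe from. It is an illustration under stated assumptions. The `Event` and `EventBus` classes, the topic names, and the score threshold are all hypothetical and are not part of the template. A production deployment would use a durable message broker rather than an in-process dispatcher.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class Event:
    topic: str        # hypothetical topic names, e.g. "alert.raised"
    incident_id: str  # every event is tied to exactly one incident
    payload: Dict[str, Any] = field(default_factory=dict)


Handler = Callable[[Event], None]


class EventBus:
    """In-process stand-in for the central event bus; a real system would use a broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[Event] = []  # ordered record of every published event

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, event: Event) -> None:
        self.history.append(event)
        # Iterate over a copy so handlers can subscribe others while we dispatch.
        for handler in list(self._subscribers[event.topic]):
            handler(event)


bus = EventBus()


def triage_agent(event: Event) -> None:
    # Hypothetical rule: alerts scoring 80 or more are treated as high severity.
    severity = "high" if event.payload.get("score", 0) >= 80 else "low"
    bus.publish(Event("triage.completed", event.incident_id, {"severity": severity}))


bus.subscribe("alert.raised", triage_agent)
bus.publish(Event("alert.raised", "INC-001", {"score": 91}))
```

Because agents only exchange events that carry an `incident_id`, each agent can remain stateless, and `history` provides an ordered trail for later review.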
When to Use This AGENTS.md Template
- Designing a scalable incident response program with clearly defined agent roles and handoffs.
- Defining runbooks that enable reliable automation while preserving evidence and chain of custody.
- Establishing tool governance, secrets management, and access controls for IR automation.
- Ensuring auditable decision-making with human review and escalation paths for high-severity events.
- Providing a copyable template that teams can paste into an AGENTS.md file to bootstrap IR workflows.
Recommended Agent Operating Model
Roles, responsibilities, decision boundaries, and escalation paths for incident response agents.
- IncidentCommander has final authority on containment and risk acceptance actions; can authorize or veto mitigations.
- DetectionAgent must provide precise context and evidence to support triage decisions.
- TriageAgent decides required runbooks, collects artifacts, and assigns severity with rationale.
- PlaybookExecutorAgent implements remediation steps with traceability and rollback hooks.
- ForensicAgent ensures evidence integrity and chain-of-custody for post-incident analysis.
- CommunicatorAgent maintains stakeholder visibility and documents decisions and outcomes.
- AuditorAgent validates actions for compliance and produces post-incident reports.
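The decision boundaries above, along with the template's rule that production changes need IC approval (and, for severe incidents, approval from the on-call IR lead), can be expressed as an approval gate. The sketch below is a minimal illustration under stated assumptions. The template defines no severity scale, so the severity values, the role strings `incident_commander` and `ir_lead`, and the `ProposedAction` type are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set

# Assumption: the template defines no severity scale; "high" and "critical"
# stand in for its "severe incidents".
SEVERE = {"high", "critical"}


@dataclass
class ProposedAction:
    incident_id: str
    description: str
    severity: str
    production: bool
    approvals: Set[str] = field(default_factory=set)  # roles that have approved
    vetoes: Set[str] = field(default_factory=set)     # roles that have vetoed


def required_approvers(action: ProposedAction) -> Set[str]:
    """Roles whose sign-off is needed before the action may run."""
    if not action.production:
        return set()  # non-production steps still follow runbooks but skip this gate
    roles = {"incident_commander"}
    if action.severity in SEVERE:
        roles.add("ir_lead")  # human-in-the-loop review for severe incidents
    return roles


def may_execute(action: ProposedAction) -> bool:
    if "incident_commander" in action.vetoes:
        return False  # an IC veto overrides any approvals
    return required_approvers(action) <= action.approvals


isolate = ProposedAction("INC-7", "isolate host", "critical", production=True,
                         approvals={"incident_commander"})
```

In this example, `isolate` stays blocked until the IR lead also approves, and an IC veto blocks it regardless of any approvals it has collected.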
Recommended Project Structure
incident-response/
  runbooks/
  artifacts/
  memory/
  tools/
  templates/
  tests/
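One way to bootstrap this layout is with a short Python script. It assumes that each listed directory sits directly under `incident-response/`, and the `scaffold` helper is hypothetical. The demonstration runs in a temporary directory so it leaves the working tree untouched.

```python
import tempfile
from pathlib import Path

# Assumption: every listed directory sits directly under incident-response/.
SUBDIRS = ("runbooks", "artifacts", "memory", "tools", "templates", "tests")


def scaffold(base: Path) -> Path:
    """Create the recommended layout under `base` and return its root."""
    root = base / "incident-response"
    for name in SUBDIRS:
        # exist_ok=True makes the scaffold safe to re-run on an existing repo
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


# Demonstrate in a throwaway directory rather than the working tree.
with tempfile.TemporaryDirectory() as tmp:
    root = scaffold(Path(tmp))
    scaffold(Path(tmp))  # a second run is a no-op
    created = sorted(p.name for p in root.iterdir())
```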
Core Operating Principles
- Operate with a single source of truth and versioned runbooks.
- Enforce least privilege and secrets hygiene for all agents.
- Ensure auditable, time-bounded decision-making with clear handoffs.
- Preserve evidence and maintain chain-of-custody throughout the incident lifecycle.
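A single source of truth that also supports "revert to the last known good state" can be sketched as an append-only, versioned memory store. The minimal Python illustration below uses the template's memory keys (incident_id, severity, state, artifacts, runbooks_version). The `MemoryStore` class and the example values, such as runbooks version `1.4.0`, are hypothetical. A real store would be durable and access-controlled rather than held in memory.

```python
import copy
from typing import Any, Dict, List

# Memory keys listed in the template.
MEMORY_KEYS = ("incident_id", "severity", "state", "artifacts", "runbooks_version")


class MemoryStore:
    """Append-only, versioned incident records (in-memory sketch)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Dict[str, Any]]] = {}

    def write(self, record: Dict[str, Any]) -> int:
        missing = [k for k in MEMORY_KEYS if k not in record]
        if missing:
            raise ValueError(f"missing memory keys: {missing}")
        history = self._versions.setdefault(record["incident_id"], [])
        # Deep-copy so later mutation by the caller cannot rewrite history.
        history.append(copy.deepcopy(record))
        return len(history) - 1  # version number of this snapshot

    def read(self, incident_id: str, version: int = -1) -> Dict[str, Any]:
        return copy.deepcopy(self._versions[incident_id][version])

    def rollback(self, incident_id: str, good_version: int) -> int:
        # Restore by appending the known-good snapshot. Nothing is deleted,
        # so the failed state stays available for the audit trail.
        return self.write(self.read(incident_id, good_version))


store = MemoryStore()
v0 = store.write({"incident_id": "INC-1", "severity": "high", "state": "contained",
                  "artifacts": [], "runbooks_version": "1.4.0"})
v1 = store.write({**store.read("INC-1"), "state": "remediation_failed"})
v2 = store.rollback("INC-1", v0)
```

Rolling back by appending a new snapshot, rather than truncating the history, keeps the failed attempt visible to the AuditorAgent.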
Agent Handoff and Collaboration Rules
- Planner decisions must be validated by the TriageAgent before execution.
- Implementer (PlaybookExecutorAgent) must log each remediation step with artifacts produced.
- Reviewer (AuditorAgent) must recheck actions against runbooks and compliance requirements.
- Tester validates that changes do not violate security constraints or tool governance.
- Researcher updates runbooks with new indicators and lessons learned for future incidents.
- Domain Specialist provides context for specialized environments (cloud, on-prem, OT).
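The template's handoff chain (IC to TriageAgent, then PlaybookExecutorAgent, ForensicAgent, and AuditorAgent, with the CommunicatorAgent informed at every step) can be enforced as an explicit transition table. This sketch takes a strict reading of that chain. The `hand_off` function, the `HandoffError` type, and the callback standing in for the CommunicatorAgent are hypothetical, and conditions such as "actionable severity" are left to the caller.

```python
from typing import Callable, Dict, List, Set, Tuple

# Allowed handoffs, strictly read from the template; conditions such as
# "actionable severity" are left to the caller.
ALLOWED: Dict[str, Set[str]] = {
    "IncidentCommander": {"TriageAgent"},
    "TriageAgent": {"PlaybookExecutorAgent"},
    "PlaybookExecutorAgent": {"ForensicAgent"},
    "ForensicAgent": {"AuditorAgent"},
    "AuditorAgent": set(),  # validation before closure ends the chain
}


class HandoffError(RuntimeError):
    pass


def hand_off(incident_id: str, source: str, target: str,
             log: List[Tuple[str, str, str]], notify: Callable[[str], None]) -> None:
    if target not in ALLOWED.get(source, set()):
        raise HandoffError(f"{source} -> {target} is not an allowed handoff")
    log.append((incident_id, source, target))  # recorded for the audit trail
    # Stands in for the CommunicatorAgent status broadcast at each handoff.
    notify(f"[{incident_id}] {source} handed off to {target}")


log: List[Tuple[str, str, str]] = []
messages: List[str] = []
hand_off("INC-2", "IncidentCommander", "TriageAgent", log, messages.append)

rejected = False
try:
    # Under this strict table, triage cannot skip straight to audit.
    hand_off("INC-2", "TriageAgent", "AuditorAgent", log, messages.append)
except HandoffError:
    rejected = True
```

Rejected handoffs raise instead of failing silently, which gives the orchestrator a clear point to escalate.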
Tool Governance and Permission Rules
- Execute commands with enforceable timeouts and rollback hooks.
- Edits to files must be audited and tracked in memory.
- APIs must use ephemeral credentials and be scoped by incident_id.
- Secrets must reside in a centralized vault with strict access controls.
- All production actions require approval gates; automatic actions require runbooks.
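The first rule above, which requires enforceable timeouts and rollback hooks, can be sketched with the standard library's `concurrent.futures`. The `run_with_rollback` helper and the example actions are hypothetical. One important caveat: a Python thread cannot be forcibly stopped, so a timed-out action may keep running. A real executor should isolate actions in a separate process or rely on the tool's own timeout.

```python
import concurrent.futures as cf
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_rollback(action: Callable[[], T], rollback: Callable[[], None],
                      timeout_s: float) -> T:
    """Run `action` with a deadline; on timeout or error, call `rollback` and re-raise."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(action)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Caveat: the worker thread is not killed on timeout, so a real
        # executor should isolate the action before trusting this rollback.
        rollback()
        raise  # re-raise so the orchestrator can escalate
    finally:
        pool.shutdown(wait=False)


undo_log: list = []
ok = run_with_rollback(lambda: "blocked IP", lambda: undo_log.append("unblock IP"), timeout_s=1.0)

timed_out = False
try:
    run_with_rollback(lambda: time.sleep(0.5), lambda: undo_log.append("restore rule"),
                      timeout_s=0.05)
except cf.TimeoutError:
    timed_out = True
```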
Code Construction Rules
- Avoid hard-coding secrets; fetch from vault or secret management service.
- Ensure idempotence; repeating an action must not produce side effects beyond those of its first execution.
- Validate inputs and outputs at every step; log validation results.
- Do not bypass runbooks or perform unapproved changes.
- Maintain backward compatibility with previous incident records.
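A common way to meet the idempotence rule is to derive a key from the incident, the action name, and its parameters, and then to replay the stored result instead of re-executing. The sketch below is minimal and hypothetical. The `IdempotentExecutor` class and the `block_ip` action are illustrative only, and in practice the keys would live in the memory store rather than in a Python dict. The example IP comes from a documentation-only range.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentExecutor:
    """Runs each (incident_id, action, params) at most once and replays the stored result."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    @staticmethod
    def key(incident_id: str, action: str, params: Dict[str, Any]) -> str:
        # sort_keys makes the key independent of parameter ordering
        raw = json.dumps({"incident_id": incident_id, "action": action, "params": params},
                         sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def run(self, incident_id: str, action: str, params: Dict[str, Any],
            fn: Callable[..., Any]) -> Any:
        k = self.key(incident_id, action, params)
        if k not in self._results:
            # Only successful runs are recorded, so a failed attempt can be retried.
            self._results[k] = fn(**params)
        return self._results[k]


calls: list = []


def block_ip(ip: str) -> str:
    calls.append(ip)  # hypothetical side effect, e.g. a firewall API call
    return f"blocked {ip}"


executor = IdempotentExecutor()
first = executor.run("INC-3", "block_ip", {"ip": "203.0.113.7"}, block_ip)
second = executor.run("INC-3", "block_ip", {"ip": "203.0.113.7"}, block_ip)
```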
Security and Production Rules
- Enforce least privilege and encryption in transit and at rest.
- Make all actions auditable, with tamper-evident logs and immutable artifacts.
- Use audit-ready channels for communication with stakeholders.
- Segregate duties to prevent conflict of interest in incident handling.
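Tamper-evident logs are often built as a hash chain, where each entry's hash covers the previous hash, so editing any earlier entry breaks verification. The Python sketch below is minimal, and the `AuditLog` class and entry fields are hypothetical. Note that a hash chain makes tampering detectable but does not prevent it. An attacker who can rewrite the entire chain can recompute every hash, so the latest hash should also be anchored somewhere the agents cannot write to.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, entry: Dict[str, Any]) -> str:
    body = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, entry: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"entry": dict(entry), "prev": prev, "hash": _digest(prev, entry)}
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        prev = GENESIS
        for record in self.entries:
            # Each record must point at its predecessor and hash to its stored value.
            if record["prev"] != prev or record["hash"] != _digest(prev, record["entry"]):
                return False
            prev = record["hash"]
        return True


audit = AuditLog()
audit.append({"incident_id": "INC-4", "actor": "PlaybookExecutorAgent", "action": "disable_account"})
audit.append({"incident_id": "INC-4", "actor": "AuditorAgent", "action": "validated"})
intact = audit.verify()
audit.entries[0]["entry"]["action"] = "no-op"  # simulate tampering with an old entry
tampered_detected = not audit.verify()
```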
Testing Checklist
- Unit tests for agents and memory store integrity.
- Integration tests with mock SIEM, EDR, and ticketing systems.
- End-to-end IR simulations to verify runbooks and handoffs.
- Security testing including access controls and vault integration.
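As an example of an integration test against a mocked service, the sketch below uses `unittest.mock.Mock` to stand in for a ticketing client. It checks that a triage step opens a ticket referencing the incident_id. The `open_ticket_for_alert` function, the `create_ticket` method, and the `score` field are hypothetical. The same pattern applies to mock SIEM and EDR clients.

```python
import unittest
from unittest.mock import Mock


def open_ticket_for_alert(alert: dict, ticketing) -> str:
    """Hypothetical triage step: open a ticket that references the incident_id."""
    severity = "high" if alert["score"] >= 80 else "low"
    return ticketing.create_ticket(incident_id=alert["incident_id"], severity=severity)


class TriageIntegrationTest(unittest.TestCase):
    def test_ticket_references_incident_id(self) -> None:
        ticketing = Mock()  # stands in for the real ticketing system
        ticketing.create_ticket.return_value = "TICKET-1"
        result = open_ticket_for_alert({"incident_id": "INC-5", "score": 92}, ticketing)
        self.assertEqual(result, "TICKET-1")
        ticketing.create_ticket.assert_called_once_with(incident_id="INC-5", severity="high")
```

Run it with `python -m unittest` from the tests/ directory. Because no real ticketing API is called, the test is safe to run in CI.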
Common Mistakes to Avoid
- Ambiguity in handoff rules or decision boundaries.
- Overly broad permissions or non-versioned runbooks.
- Ignoring evidence integrity and chain-of-custody requirements.
- Unclear escalation paths leading to delayed containment.
FAQ
What is the purpose of this AGENTS.md Template for incident response architecture?
This template defines a complete incident response operating model with multi-agent coordination, runbooks, and governance for auditable responses.
How does this template support multi-agent orchestration in incident response?
It specifies agent roles, handoffs, memory rules, and a central orchestrator to coordinate actions across detection, triage, remediation, and post-incident review.
What are the key handoff rules between agents?
IC to Triage, Triage to PlaybookExecutor, PlaybookExecutor to ForensicAgent, ForensicAgent to AuditorAgent, with CommunicatorAgent updating stakeholders throughout.
How are tools and secrets managed within the template?
Credentials are scoped and ephemeral; secrets are stored in a vault; production actions require approvals and runbooks with rollback hooks.
How is compliance and post-incident review handled?
AuditorAgent validates actions and evidence integrity and produces post-incident reports; ForensicAgent preserves evidence with chain of custody; KnowledgeAgent records lessons learned.