Active-Passive Failover AGENTS.md Template

Overview

This AGENTS.md template defines the operating model for an active-passive failover pattern in which AI coding agents monitor health, decide when to fail over, promote the standby, validate the result, and hand off work under governance and escalation gates. It supports both single-agent and multi-agent orchestration patterns.

The template codifies roles, responsibilities, sources of truth, memory, and escalation, delivering a repeatable, auditable workflow for high availability scenarios.

When to Use This AGENTS.md Template

When you need explicit, codified failover procedures for critical services
When multiple agents must coordinate across monitoring, decision, execution, and validation
When tool governance and safe, auditable handoffs are required
When you want a reproducible template for post-failover recovery and rollback

Copyable AGENTS.md Template

# AGENTS.md

Project Role: Active-Passive Failover Designer

Agent roster:
- PlannerAgent: orchestrator for failover decisions; monitors primary health, evaluates thresholds, triggers failover, coordinates with ExecutorAgent, and maintains the decision log
- ExecutorAgent: promotes standby to active, updates load balancer, reconfigures DNS, synchronizes state
- MonitorAgent: collects health metrics from primary and standby
- ValidationAgent: validates post-failover health and redundancy
- StateSyncAgent: ensures replication and state consistency
- AuditorAgent: records events and metrics for audit

Supervisor / Orchestrator: PlannerAgent coordinates failover

Handoff Rules:
- PlannerAgent informs ExecutorAgent of the failover decision
- StateSyncAgent provides the latest state to ExecutorAgent

Context memory and truth sources:
- decisions stored in memory with unique decision_id
- sources: health metrics DB, event log, config registry

Tool access and permission rules:
- agents may call only update_lb, update_dns, read_metrics, and read_config
- secrets retrieved via vault; never expose plaintext

Architecture Rules:
- microservice oriented; stateless orchestration; shared state in Redis

File Structure Rules:
- docs
- src/planner
- src/executor
- src/monitor
- src/validation
- tests

Data/API/Integration Rules:
- data from metrics_api, events_api; ensure data persisted

Validation Rules:
- test failover path with simulated failure
- confirm RTO and RPO

Security Rules:
- actions require permission gates
- secrets rotated

Testing Rules:
- unit and integration tests
- can run in staging

Deployment Rules:
- deploy to staging then production
- blue-green switching

Human Review and Escalation Rules:
- escalation path to on-call SRE

Failure Handling and Rollback Rules:
- if failover fails, revert to primary with rollback steps

Things Agents Must Not Do:
- do not modify primary without validation
- do not proceed without consensus
- no silent data loss

Recommended Agent Operating Model

PlannerAgent acts as the central orchestrator with clear decision boundaries and fallback criteria. ExecutorAgent implements promotions and reconfigurations with strict prechecks. MonitorAgent provides timely health signals. ValidationAgent guards post-failover integrity. StateSyncAgent maintains data replication and state coherence. AuditorAgent ensures every decision and outcome is auditable. Escalation paths are defined for anomalies that exceed thresholds, ensuring human review when needed.
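
The "clear decision boundaries" described above can be made concrete as a small, deterministic rule. The following is a minimal sketch, not part of the template: the `HealthSample` type, the `should_fail_over` function, and the default thresholds (three consecutive bad probes, a 500 ms latency limit) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HealthSample:
    """One probe result from MonitorAgent (hypothetical shape)."""
    healthy: bool       # did the primary answer the probe successfully?
    latency_ms: float   # observed probe latency in milliseconds


def should_fail_over(
    samples: Sequence[HealthSample],
    max_consecutive_failures: int = 3,   # assumed threshold
    latency_limit_ms: float = 500.0,     # assumed threshold
) -> bool:
    """Return True only when every one of the most recent probes is bad.

    A probe counts as bad when it failed outright or exceeded the latency
    limit. With too few samples the function returns False, so a single
    blip never triggers a promotion.
    """
    if len(samples) < max_consecutive_failures:
        return False
    recent = samples[-max_consecutive_failures:]
    return all((not s.healthy) or s.latency_ms > latency_limit_ms for s in recent)
```

Because the rule depends only on its inputs, the same samples always yield the same decision, which makes the decision easy to replay during an audit.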

Recommended Project Structure

ai-failover/
  planner/
  executor/
  monitor/
  validation/
  state/
  shared/
  tests/
  docs/

Core Operating Principles

Deterministic, rule-based decisions that leave a traceable record
Clear separation of concerns between planning, execution, and validation
All actions are auditable with identity and timestamps
Use of explicit memory and sources of truth for all decisions
Do not perform unchecked production changes
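
One way to satisfy the identity, timestamp, and traceability principles is to write every decision as an immutable record. This sketch assumes a `DecisionRecord` shape that the template does not define; only the `decision_id` field name comes from the template itself.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class DecisionRecord:
    """Immutable record of one agent decision (illustrative field set)."""
    agent: str                   # identity of the deciding agent
    action: str                  # e.g. "failover" or "rollback"
    evidence: Tuple[str, ...]    # references into the sources of truth
    decision_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


record = DecisionRecord(
    agent="PlannerAgent",
    action="failover",
    evidence=("health metrics DB: primary probe failures",),
)
```

Freezing the dataclass means a record cannot be altered after it is created, so a correction has to be written as a new record rather than an in-place edit.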

Agent Handoff and Collaboration Rules

PlannerAgent signals failover to ExecutorAgent; ExecutorAgent runs its prechecks, then executes
StateSyncAgent transfers latest state and ensures consistency
ValidationAgent confirms post-failover health before success is declared
AuditorAgent logs decisions and outcomes for compliance
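
The handoff order above can be enforced with a small state machine that rejects out-of-sequence transitions. This is a sketch under assumptions: the stage names, the `FailoverRun` class, and the `HandoffError` exception are hypothetical and not defined by the template.

```python
# Allowed next stages for each stage of a failover run (assumed names).
ALLOWED = {
    "monitoring": {"failover_decided"},        # PlannerAgent decides
    "failover_decided": {"state_synced"},      # StateSyncAgent hands over state
    "state_synced": {"promoted"},              # ExecutorAgent promotes standby
    "promoted": {"validated", "rolled_back"},  # ValidationAgent passes or fails
    "validated": set(),
    "rolled_back": set(),
}


class HandoffError(Exception):
    """Raised when an agent attempts a handoff out of sequence."""


class FailoverRun:
    def __init__(self) -> None:
        self.stage = "monitoring"
        self.history = ["monitoring"]  # kept so AuditorAgent can log the path

    def advance(self, next_stage: str) -> None:
        """Move to next_stage, or raise without changing state."""
        if next_stage not in ALLOWED[self.stage]:
            raise HandoffError(f"{self.stage} -> {next_stage} is not permitted")
        self.stage = next_stage
        self.history.append(next_stage)
```

With this shape, an ExecutorAgent that tries to promote before state has been synced fails loudly instead of proceeding on stale data.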

Tool Governance and Permission Rules

Only permitted tools: load balancer config, DNS updates, metrics APIs, config store
Secrets accessed via vault; never stored in code or logs
Any tool action that exceeds safe thresholds requires an approval gate
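
These rules can be expressed as an allowlist check combined with an approval gate. The sketch below reuses the tool names from the template (`update_lb`, `update_dns`, `read_metrics`, `read_config`). Treating the two mutating tools as the ones that always need approval is an assumption made for illustration, as are the `authorize` function and the `ToolDenied` exception.

```python
from typing import Optional

PERMITTED_TOOLS = {"update_lb", "update_dns", "read_metrics", "read_config"}

# Assumption: mutating tools are the ones considered beyond safe thresholds.
NEEDS_APPROVAL = {"update_lb", "update_dns"}


class ToolDenied(Exception):
    """Raised when a tool call is not permitted or lacks approval."""


def authorize(tool: str, approved_by: Optional[str] = None) -> bool:
    """Return True if the call may proceed; raise ToolDenied otherwise."""
    if tool not in PERMITTED_TOOLS:
        raise ToolDenied(f"{tool} is not on the allowlist")
    if tool in NEEDS_APPROVAL and not approved_by:
        raise ToolDenied(f"{tool} requires an approver")
    return True
```

For example, `authorize("read_metrics")` passes without an approver, while `authorize("update_dns")` raises until an approver identity is supplied.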

Code Construction Rules

Idempotent actions with clear reconciliation checks
All changes are versioned and reversible
No hard-coded secrets; fetch from secure vault
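
Idempotency and reversibility can be illustrated with a promotion step that checks the current state before acting and records what it changed. This is a minimal sketch over an in-memory dictionary; the `cluster` layout and the `promote_standby` and `revert_promotion` functions are hypothetical stand-ins for real load balancer and DNS calls.

```python
def promote_standby(cluster: dict, standby: str) -> bool:
    """Make `standby` the active node. Return True only if a change was made."""
    if standby not in cluster["nodes"]:
        raise ValueError(f"unknown node: {standby}")
    if cluster["active"] == standby:
        return False  # reconciliation check: already in the desired state
    cluster["previous_active"] = cluster["active"]  # remember for rollback
    cluster["active"] = standby
    cluster["version"] += 1  # every change is versioned
    return True


def revert_promotion(cluster: dict) -> bool:
    """Undo the last promotion if one is recorded. Return True if reverted."""
    previous = cluster.get("previous_active")
    if previous is None:
        return False
    cluster["active"] = previous
    cluster["previous_active"] = None
    cluster["version"] += 1
    return True
```

Calling `promote_standby` twice with the same target changes the state once, so a retried handoff does not bump the version or flip traffic again.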

Security and Production Rules

Role-based access control for all agents
Encrypted data in transit and at rest
Audit trails and tamper-evident logs
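
A common way to make a log tamper-evident is to chain entries by hash, so that altering any earlier entry invalidates every later one. The template does not prescribe a mechanism; the following sketch is one possible approach using SHA-256 from the Python standard library, with hypothetical `append_entry` and `verify_log` helpers.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # sort_keys gives a stable serialization so the hash is reproducible
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_log(log: list) -> bool:
    """Recompute the chain and report whether every link still matches."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects edits but does not prevent them, so a production setup would still need to store the log, or at least its latest hash, somewhere the agents cannot rewrite.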

Testing Checklist

Unit tests for each agent and its boundaries
Integration tests for end-to-end failover path
Failover in staging with simulated outages
Rollback verification and revert path tests
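
A staged failover drill is easier to judge when the recovery time is computed from recorded events rather than estimated. This sketch assumes the drill emits timestamped events named `primary_failed` and `standby_validated`; those names, the helper functions, and the 60-second target are illustrative only.

```python
from typing import Iterable, Tuple


def recovery_time_seconds(events: Iterable[Tuple[float, str]]) -> float:
    """Seconds between the primary failing and the standby passing validation.

    `events` is an iterable of (timestamp_seconds, event_name) pairs.
    """
    times = {name: ts for ts, name in events}
    return times["standby_validated"] - times["primary_failed"]


def meets_rto(events: Iterable[Tuple[float, str]], rto_target_seconds: float) -> bool:
    """True when the measured recovery time is within the RTO target."""
    return recovery_time_seconds(events) <= rto_target_seconds


# Example drill timeline (made-up timestamps for illustration).
drill = [
    (100.0, "primary_failed"),
    (112.5, "standby_promoted"),
    (130.0, "standby_validated"),
]
assert meets_rto(drill, rto_target_seconds=60.0)
```

The same pattern extends to RPO by comparing the last replicated write timestamp with the failure timestamp, provided the staging environment records both.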

Common Mistakes to Avoid

Skipping validation before promotion
Unclear memory sources or stale state
Uncontrolled production changes without gates

FAQ

What is the purpose of this AGENTS.md Template for active-passive failover?

It codifies the operating model for an AI coding agent workflow that detects primary failure, promotes a standby, validates health, and handles handoffs with clear memory, sources of truth, and escalation gates.

Who are the primary agents in this template and what are their roles?

PlannerAgent orchestrates; ExecutorAgent performs promotion and reconfiguration; MonitorAgent checks health; ValidationAgent validates the failover; StateSyncAgent ensures data replication; and AuditorAgent records events.

How are handoffs between agents governed?

Handoffs follow a planned sequence with explicit signals from Planner to Executor, state transfer by StateSyncAgent, and validation by ValidationAgent before declaring success.

What are the security and governance rules for tool access?

Tool access is gate-checked, secrets are retrieved from a secure vault, and all changes go through approval gates and audit trails.

What tests are required before production failover?

Each agent needs unit and integration tests, and the end-to-end failover simulation, validation checks, and rollback path must all be verified in staging before production.

Where can I find the project structure and template block?

The project structure is defined in the Recommended Project Structure section, and the template block appears in the Copyable AGENTS.md Template section.

Target User

Use Cases