Backup and Restore AGENTS.md Template for AI Coding Agents

Overview

Direct answer: This AGENTS.md template is a project-level operating manual for backing up and restoring AI coding agents. It supports safe single-agent and multi-agent orchestration with versioned snapshots and clear handoffs.

Purpose: govern how agent state, memory, data sources, and tool access are backed up, restored, and validated across an orchestration pattern that may involve planner, implementer, tester, reviewer, and domain specialists. It establishes a repeatable, auditable flow that reduces context drift and supports rapid recovery after failures.

When to Use This AGENTS.md Template

When you design a backup and restore workflow for AI coding agents that must recover from memory loss, data corruption, or tool outages.
When coordinating single-agent and multi-agent orchestrations that require a known good state and validated rollbacks.
When you need versioned backups, explicit rollback points, and auditable recovery traces.
When enforcing tool governance, secrets handling, and secure memory stores.
When you require a clear, copyable instruction set for new team members or contractors.

Copyable AGENTS.md Template

# AGENTS.md

Project role
- Backup Architect
- Restore Engineer
- Recovery Supervisor
- Auditor
- Domain Specialist

Agent roster and responsibilities
- Backup Architect: defines backup cadence, memory boundaries, and data sources to snapshot
- Restore Engineer: performs restore steps in a controlled, idempotent manner
- Recovery Supervisor: approves and validates restoration plans and handoffs
- Auditor: verifies integrity of backups and rollback points
- Domain Specialist: ensures domain specific constraints are honored during recoveries

Supervisor or orchestrator behavior
- The Recovery Supervisor coordinates plan approval, data source synchronization, and cross agent handoffs
- All restore actions require supervisor sign-off before execution

Handoff rules between agents
- Backup completion triggers Restore Engineer plan
- Restore Engineer must hand off to Auditor for integrity check
- Auditor hands back to Recovery Supervisor for final approval
- After approval, implement and validate in staging before production

Context, memory, and source-of-truth rules
- All agent memory and state snapshots are stored in a versioned state store
- Source of truth (SoT) is the canonical memory store and configuration store
- Agents must read from SoT and write to versioned backups only

Tool access and permission rules
- Access to memory store, backups, and versioning tools is role scoped
- Secrets must be retrieved from a vault and never hard coded
- No agent may perform production changes without supervisor approval

Architecture rules
- Use event driven triggers for backup and restore tasks
- All steps are idempotent and replayable
- Logs and traces maintained for every action

File structure rules
- backups/: immutable snapshots with metadata.json
- restores/: restore plans and scripts per agent
- logs/: operation logs
- docs/: operation guidance and runbooks

Data, API, or integration rules when relevant
- Backups include memory, configs, and critical state data
- Use versioned snapshots with timestamped metadata
- Restore flows interact with external services only through approved APIs

Validation rules
- Post restore validation includes integrity checks, end-to-end tests, and domain rule checks
- All validations must pass before production switch

Security rules
- Encrypt backups at rest and in transit
- Rotate credentials and limit access per role
- Audit all access to backups and reduce data exposure

Testing rules
- Unit tests for each backup/restore script
- Integration tests across backup/restore workflow
- Smoke tests after deployment to staging

Deployment rules
- Roll out in small batches with feature flags and monitoring
- Require test pass before prod

Human review and escalation rules
- Any non-trivial restore with production impact must be escalated to a human reviewer
- If backups fail, revert to last good snapshot and notify team

Failure handling and rollback rules
- On failure, rollback to previous known good snapshot
- All operations are logged and auditable

Things Agents must not do
- Do not bypass SoT or write backups to non versioned stores
- Do not modify production data without approval
- Do not run destructive actions without a rollback point

Recommended Agent Operating Model

The model defines the responsibilities, decision boundaries, and escalation paths for backup and restore of AI coding agents. Roles include Backup Architect, Restore Engineer, Recovery Supervisor, Auditor, and Domain Specialist. Each role has authority boundaries, defined inputs, and expected outputs. Handoffs are explicit events with validation gates. Escalations route to the Recovery Supervisor and, if necessary, to production SRE or security leads.

Recommended Project Structure

Workflow specific directory tree

workflows/backup_restore/
  policies/              # governance and retention rules
  backups/               # immutable data snapshots and metadata
  restores/              # restore plans and scripts
  agents/                # agent roles and responsibilities
    planner/             # backup planner scripts and prompts
    implementer/         # restore implementation scripts
    tester/              # tests and validation harnesses
    reviewer/            # validation and approval prompts
    auditor/             # integrity and compliance checks
  tests/                 # unit/integration tests
  docs/                  # runbooks and docs

Core Operating Principles

Single source of truth for memory and config
Idempotent backup and restore steps
Explicit handoffs with validation gates
Role based access and secrets management
Auditable change history and rollback capability
Separation of concerns between backup and restore tasks
Secure by default with encryption and least privilege

Agent Handoff and Collaboration Rules

Planner to Implementer: ensure snapshot scope and SoT alignment; the handoff requires metadata validation. Implementer to Auditor: provide backup/restore logs and checkpoints. Auditor to Supervisor: request sign-off after validation. Involve the Domain Specialist whenever rules or constraints are domain specific.
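
The handoff chain above can be modeled as a small state machine with a validation gate on each transition. The following is a minimal sketch, not a prescribed implementation: the `Stage` names, the `RestoreTicket` type, and the evidence field names are all hypothetical and chosen only to mirror the rules in this section.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    PLANNED = "planned"
    IMPLEMENTED = "implemented"
    AUDITED = "audited"
    APPROVED = "approved"


# Allowed handoffs: each stage may only advance to the next one.
NEXT_STAGE = {
    Stage.PLANNED: Stage.IMPLEMENTED,   # Planner -> Implementer
    Stage.IMPLEMENTED: Stage.AUDITED,   # Implementer -> Auditor
    Stage.AUDITED: Stage.APPROVED,      # Auditor -> Supervisor sign-off
}

# Evidence that must accompany each handoff (hypothetical field names).
REQUIRED_EVIDENCE = {
    Stage.IMPLEMENTED: {"snapshot_metadata"},
    Stage.AUDITED: {"logs", "checkpoints"},
    Stage.APPROVED: {"validation_report"},
}


@dataclass
class RestoreTicket:
    snapshot_id: str
    stage: Stage = Stage.PLANNED
    evidence: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


def hand_off(ticket: RestoreTicket, evidence: dict) -> RestoreTicket:
    """Advance the ticket one stage if the validation gate passes."""
    target = NEXT_STAGE.get(ticket.stage)
    if target is None:
        raise ValueError(f"no handoff defined from {ticket.stage.value}")
    missing = REQUIRED_EVIDENCE[target] - set(evidence)
    if missing:
        raise ValueError(f"gate to {target.value} failed, missing: {sorted(missing)}")
    ticket.evidence.update(evidence)
    ticket.history.append((ticket.stage.value, target.value))
    ticket.stage = target
    return ticket
```

A ticket that is handed off without the required evidence stays at its current stage, so a skipped audit cannot silently reach sign-off.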

Tool Governance and Permission Rules

Commands and edits are restricted to approved tools. Secrets retrieval uses vault tokens with short lifetimes. Production changes require Recovery Supervisor approval. All tool usage is logged and replayable for audit.
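
One way to express these rules is a role-scoped allow-list whose every decision is written to an audit log. This is a sketch under stated assumptions: the role keys, the tool names such as `restore.production`, and the in-memory `AUDIT_LOG` are hypothetical stand-ins, and a real deployment would use its own identity and logging systems.

```python
import json
import time

# Hypothetical role-to-tool allow-list; the rules above only say access is restricted.
ROLE_TOOLS = {
    "backup_architect": {"snapshot.create", "snapshot.list"},
    "restore_engineer": {"snapshot.list", "restore.staging"},
    "auditor": {"snapshot.list", "snapshot.verify"},
    "recovery_supervisor": {"snapshot.list", "restore.approve"},
}

# Tools that touch production and therefore need supervisor approval.
PRODUCTION_TOOLS = {"restore.production"}

AUDIT_LOG = []  # in-memory stand-in for an append-only audit store


def authorize(role, tool, approved_by=None):
    """Return True if the call is allowed; always append an audit record."""
    if tool in PRODUCTION_TOOLS:
        allowed = role == "restore_engineer" and approved_by == "recovery_supervisor"
    else:
        allowed = tool in ROLE_TOOLS.get(role, set())
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "role": role,
        "tool": tool,
        "approved_by": approved_by,
        "allowed": allowed,
    }))
    return allowed
```

Denied calls are logged as well as allowed ones, which keeps the audit trail complete for later review.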

Code Construction Rules

All scripts must be idempotent, auditable, and testable. Use explicit rollback points and versioned backups. Do not hard code credentials. Validate input schemas before execution.
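
As an illustration of schema validation plus idempotency, the sketch below checks a restore plan before applying it and treats a repeated apply as a no-op. The plan fields (`snapshot_id`, `target_env`, `rollback_point`) and the dictionary-based `state` are assumptions made for the example, not part of the template.

```python
# Hypothetical plan schema: field name -> expected type.
REQUIRED_FIELDS = {"snapshot_id": str, "target_env": str, "rollback_point": str}
ALLOWED_ENVS = {"staging", "production"}


def validate_plan(plan):
    """Return a list of schema problems; an empty list means the plan is valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in plan:
            errors.append(f"missing field: {name}")
        elif not isinstance(plan[name], expected):
            errors.append(f"{name} must be {expected.__name__}")
    env = plan.get("target_env")
    if isinstance(env, str) and env not in ALLOWED_ENVS:
        errors.append(f"unknown target_env: {env}")
    return errors


def apply_restore(plan, state):
    """Apply a restore plan to `state`; re-running the same plan changes nothing."""
    errors = validate_plan(plan)
    if errors:
        raise ValueError("; ".join(errors))
    env = state.setdefault(plan["target_env"], {"snapshot_id": None, "applied": []})
    if env["snapshot_id"] == plan["snapshot_id"]:
        return False  # already at the requested snapshot, nothing to do
    env["applied"].append(plan["snapshot_id"])
    env["snapshot_id"] = plan["snapshot_id"]
    return True
```

Requiring `rollback_point` in the schema means a plan without an explicit rollback point is rejected before any state is touched.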

Security and Production Rules

Backups are encrypted at rest and in transit. Access is role restricted. Production restore must be approved by Recovery Supervisor and validated in staging first. Monitor for anomalous activity during restore.

Testing Checklist

Unit tests for backup/restore scripts
Integration tests across snapshot, storage, and replay
End-to-end test in staging environment
Security and access control tests
Rollback path verification
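
The rollback path item in the checklist above could be covered by a unit test along these lines. This is a sketch only: `FakeStateStore` is a hypothetical in-memory stand-in for the versioned state store, and a real test would exercise the project's actual backup and restore scripts.

```python
import copy
import unittest


class FakeStateStore:
    """In-memory stand-in for the versioned state store (hypothetical)."""

    def __init__(self):
        self.snapshots = {}
        self.live = {}

    def snapshot(self, snapshot_id):
        self.snapshots[snapshot_id] = copy.deepcopy(self.live)

    def restore(self, snapshot_id):
        self.live = copy.deepcopy(self.snapshots[snapshot_id])


class RollbackPathTest(unittest.TestCase):
    def setUp(self):
        self.store = FakeStateStore()
        self.store.live = {"memory": ["task-1"], "config": {"model": "a"}}
        self.store.snapshot("snap-001")

    def test_rollback_restores_previous_state(self):
        self.store.live["config"]["model"] = "b"  # simulate a bad change
        self.store.restore("snap-001")
        self.assertEqual(self.store.live["config"]["model"], "a")

    def test_restore_is_idempotent(self):
        self.store.restore("snap-001")
        first = copy.deepcopy(self.store.live)
        self.store.restore("snap-001")
        self.assertEqual(self.store.live, first)

    def test_snapshot_is_isolated_from_live_state(self):
        self.store.live["memory"].append("task-2")
        self.assertEqual(self.store.snapshots["snap-001"]["memory"], ["task-1"])
```

The tests can be run with `python -m unittest` from the directory that contains the file.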

Common Mistakes to Avoid

Skipping versioned backups
Undocumented rollback points
Bypassing SoT during restore
Overly broad access to backups or secrets
Rolling out changes without staging validation

FAQ

How does this AGENTS.md Template enforce versioned backups?

It specifies immutable snapshots with timestamps and a rollback point for each agent, ensuring deterministic recovery.
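
A minimal sketch of what such a snapshot might look like on disk, assuming a `backups/<agent>/vNNNN/` layout with a `state.json` payload next to `metadata.json`. The layout, the function names, and the SHA-256 checksum field are illustrative assumptions; immutability here is only by convention (an existing version directory is never overwritten), not enforced by the file system.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def create_snapshot(backups_dir, agent, state):
    """Write state.json plus metadata.json into a new versioned directory."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    agent_dir = Path(backups_dir) / agent
    agent_dir.mkdir(parents=True, exist_ok=True)
    version = len([p for p in agent_dir.iterdir() if p.is_dir()]) + 1
    snap_dir = agent_dir / f"v{version:04d}"
    snap_dir.mkdir(exist_ok=False)  # never overwrite an existing snapshot
    (snap_dir / "state.json").write_bytes(payload)
    metadata = {
        "agent": agent,
        "version": version,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    (snap_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return snap_dir


def verify_snapshot(snap_dir):
    """Recompute the checksum and compare it with metadata.json."""
    snap_dir = Path(snap_dir)
    metadata = json.loads((snap_dir / "metadata.json").read_text())
    actual = hashlib.sha256((snap_dir / "state.json").read_bytes()).hexdigest()
    return actual == metadata["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snap = create_snapshot(tmp, "planner", {"memory": ["task-1"]})
        print(snap.name, verify_snapshot(snap))
```

An Auditor role could run `verify_snapshot` on a candidate rollback point before any restore plan that references it is approved.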

Who must approve production restore changes?

The Recovery Supervisor must approve, and staging validation is required before prod execution.

How are secrets managed during restore?

Secrets are retrieved from a vault using ephemeral tokens; do not store secrets in backups or logs.
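
To keep secrets out of backups and logs, agent state can be passed through a redaction step before it is written. The sketch below assumes secrets can be recognized by key name; the `SENSITIVE_KEYS` list and the `redact` helper are hypothetical, and key-name matching alone will not catch secrets embedded in free text.

```python
# Hypothetical markers; extend to match the project's own naming.
SENSITIVE_KEYS = ("token", "secret", "password", "api_key")


def _is_sensitive(key):
    """True if the key name contains any sensitive marker."""
    lowered = str(key).lower()
    return any(marker in lowered for marker in SENSITIVE_KEYS)


def redact(value, placeholder="[REDACTED]"):
    """Return a copy of `value` with sensitive-looking keys masked."""
    if isinstance(value, dict):
        return {
            key: placeholder if _is_sensitive(key) else redact(item, placeholder)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, placeholder) for item in value]
    return value
```

Because `redact` returns a new structure, the live state keeps its credentials while the copy destined for `backups/` or `logs/` does not.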

What happens on restore failure?

Failures trigger a rollback to the last good snapshot, with an escalation path to human review and notification of stakeholders.
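
The failure path might be sketched as follows, assuming snapshots are held in a dictionary and that the caller supplies `validate` and `notify` callables. All names here, including `RestoreFailed` and `restore_with_rollback`, are hypothetical; the point is only that a failed restore leaves the live state at the last good snapshot and raises so a human can review.

```python
import copy


class RestoreFailed(Exception):
    """Raised after a failed restore has been rolled back."""


def restore_with_rollback(live, snapshots, target_id, last_good_id, validate, notify):
    """Restore `target_id`; on any failure fall back to `last_good_id` and escalate."""
    try:
        candidate = copy.deepcopy(snapshots[target_id])
        if not validate(candidate):
            raise ValueError(f"validation failed for {target_id}")
        live.clear()
        live.update(candidate)
        return target_id
    except Exception as exc:
        # Roll back in place so every holder of `live` sees the last good state.
        live.clear()
        live.update(copy.deepcopy(snapshots[last_good_id]))
        notify(f"restore of {target_id} failed ({exc}); rolled back to {last_good_id}")
        raise RestoreFailed(target_id) from exc
```

Re-raising after the rollback keeps the failure visible to the orchestrator instead of reporting a silent success.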

What outputs should a restored state produce?

A validated, restored memory and config state with updated health checks and inventory, ready for a go/no-go decision.

Target User

Use Cases