AGENTS.md Template: Retry and Backoff Strategy
An AGENTS.md template that defines a retry and backoff strategy for AI coding agents in single-agent and multi-agent orchestration, with explicit rules for memory and tool governance.
Target User
Developers, engineering leaders, product teams
Use Cases
- Retry and backoff orchestration for AI coding agents
- Multi-agent planning and execution with backoff
- Handoff between planner, implementer, reviewer in retry scenarios
Markdown Template
# AGENTS.md
Project role: Retry and Backoff Orchestrator
Agent roster and responsibilities:
- Planner: designs retry strategy, sets initial backoff, max retries, and backoff factor
- Implementer: executes tasks, applies backoff, and records outcomes
- Tester: validates retry results against success criteria
- Researcher: analyzes transient failures and provides mitigations
- Domain Specialist: assists with domain-specific retry rules and idempotency
Supervisor or orchestrator behavior:
- Monitors task progress and enforces the global retry policy
- Applies backoff with jitter to prevent thundering herd effects
- Updates shared state in a single source of truth
- Escalates to human review when max retries are exceeded or failures are non-recoverable
Handoff rules between agents:
- Planner to Implementer: pass plan details, backoff policy, and initial context
- Implementer to Planner: report failure type and suggested adjustment to backoff
- Implementer to Reviewer: hand off when a non-recoverable error is detected
- Researcher to Domain Specialist: provide domain-specific mitigations when needed
Context, memory, and source-of-truth rules:
- Maintain a central memory store for task state, retry counts, last error, and timestamps
- Use a single source of truth for task context and decisions
- Log all decisions with traceable IDs for audit
Tool access and permission rules:
- Access based on role; avoid elevated permissions without approval
- Secrets must be retrieved from a vault; never printed or echoed
- API keys rotated per run; use scoped tokens per task
Architecture rules:
- Modular components: planner, executor, monitor, and a shared state store
- Tasks are idempotent; repeated retries should not corrupt state
- Clear separation between planning, execution, and validation
File structure rules:
- AGENTS.md sits at the project root
- Directories:
  - /planning
  - /execution
  - /validation
  - /configs
  - /logs
  - /tests
Data, API, or integration rules:
- Use idempotent endpoints; respect rate limits and timeouts
- Validate inputs and sanitize outputs before state updates
- Do not create or modify data outside the designated memory and store
Validation rules:
- Retry outcomes must meet defined success criteria
- Ensure each backoff delay stays within maxBackoff and the cumulative delay stays within the total retry window
- Validate that each retry path remains deterministic
Security rules:
- Do not log secrets or sensitive payloads
- Use secrets vaults and access controls
- Ensure network calls respect least privilege
Testing rules:
- Unit tests for backoff calculations and counters
- Integration tests for planner-executor handoffs
- End-to-end tests with simulated transient failures
Deployment rules:
- Deploy with a feature flag and canary run
- Keep a rollback plan in case a critical retry path regresses
Human review and escalation rules:
- Escalate after 3 consecutive failures or irrecoverable errors
- Provide concise failure summary and suggested remediation
Failure handling and rollback rules:
- On irrecoverable failure, stop retries and revert to last stable state
- If a retry path introduces new risk, roll back to the previous policy
Things Agents must not do:
- Do not bypass the global retry policy
- Do not modify shared memory without logging
- Do not execute destructive actions; avoid data loss on retries
Overview
This AGENTS.md template defines a retry and backoff workflow for AI coding agents. It supports both single-agent retry strategies and multi-agent orchestration, with clear handoffs, memory, and governance.
This template provides a concrete operating context to govern how retries are planned, executed, observed, and escalated within a broader multi-agent system. It covers roles, provenance, tool access, and governance needed to avoid context drift while ensuring measurable reliability in demanding automation tasks.
When to Use This AGENTS.md Template
- You need a repeatable retry pattern for transient failures in AI coding tasks.
- You are coordinating multiple agents where one agent's success depends on another’s retry loop or backoff schedule.
- You require explicit handoffs, source of truth, and auditability for retry decisions.
- You want clear guardrails for security, data handling, and deployment in retry scenarios.
Recommended Agent Operating Model
The operating model sets clear boundaries among the planner, implementer, tester, researcher, and domain specialist. These decision boundaries prevent drift between planning and execution, and the escalation routes ensure timely human review when it is required.
Recommended Project Structure
retry-backoff-workflow/
  agents/
    planner/
    implementer/
    tester/
    researcher/
    domain-specialist/
  configs/
  logs/
  tests/
  AGENTS.md
Core Operating Principles
- Enforce a predictable, bounded backoff policy, with jitter to avoid contention.
- Center all decisions in a single source of truth store.
- Handoffs must include complete context; do not pass partial state.
- All tool access is restricted by role and audited.
- Backoffs, retries, and results are idempotent and replay-safe.
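One way to reconcile a predictable schedule with jitter is to derive the jitter from the task ID and attempt number rather than from a live random source, so a replayed run produces the same delays. The sketch below assumes that approach; `backoff_delay` and its parameters are illustrative names, not part of the template.

```python
import hashlib


def backoff_delay(task_id: str, attempt: int, initial: float = 1.0,
                  factor: float = 2.0, max_backoff: float = 60.0) -> float:
    """Return the delay in seconds before retry number `attempt` (0-based)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    try:
        raw = initial * factor ** attempt
    except OverflowError:
        # Very large attempt numbers simply hit the cap.
        raw = max_backoff
    ceiling = min(max_backoff, raw)
    # Derive a stable fraction between 0 and 1 from the task ID and attempt,
    # so the jitter is identical every time the same attempt is replayed.
    digest = hashlib.sha256(f"{task_id}:{attempt}".encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2 ** 64
    # Jitter only the upper half of the ceiling, which keeps a minimum wait.
    return ceiling / 2 + fraction * (ceiling / 2)
```

Different task IDs hash to different fractions, so agents retrying at the same attempt number are spread out, while any single task keeps a reproducible schedule.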
Agent Handoff and Collaboration Rules
- Planner provides a complete plan with retry budget and backoff schedule to Implementer.
- Implementer reports progress and any failure type back to Planner and Monitor.
- Reviewer validates successful retries and approves improvements before production use.
- Researcher and Domain Specialist provide mitigations for domain-specific transient failures.
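The rule that a handoff must carry complete context can be enforced mechanically. The following is a minimal sketch under the assumption that a handoff is a small immutable record; the `Handoff` class and its field names are hypothetical and would need to match whatever the shared state store actually holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """Context passed from one agent to the next (field names are illustrative)."""
    task_id: str
    from_agent: str
    to_agent: str
    retry_budget: int          # retries still allowed for this task
    backoff_schedule: tuple    # planned delays in seconds, one per retry
    last_error: str = ""       # empty when nothing has failed yet

    def validate(self) -> None:
        """Raise ValueError if the handoff carries partial state."""
        missing = [name for name in ("task_id", "from_agent", "to_agent")
                   if not getattr(self, name)]
        if missing:
            raise ValueError(f"incomplete handoff, missing: {missing}")
        if self.retry_budget < 0:
            raise ValueError("retry_budget must be non-negative")
        if len(self.backoff_schedule) < self.retry_budget:
            raise ValueError("backoff_schedule is shorter than retry_budget")
```

A receiving agent would call `validate()` before acting, for example `Handoff("t1", "planner", "implementer", 3, (1.0, 2.0, 4.0)).validate()`, and reject the handoff if it raises.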
Tool Governance and Permission Rules
- Commands and API calls that fall outside the baseline plan must be approved by the orchestrator.
- All edits to files and configurations require audit trail entries.
- Secrets and credentials are never printed; they are accessed only through protected vaults.
- Production systems require approval gates and feature flags for retry logic changes.
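The approval and audit rules above could be combined into a single gate that every tool call passes through. This is a sketch only: the `BASELINE` role map, the action names, and the in-memory `audit_log` are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical baseline: actions each role may take without extra approval.
BASELINE = {
    "planner": {"read_state", "write_plan"},
    "implementer": {"read_state", "run_task", "write_state"},
    "tester": {"read_state", "run_tests"},
}

# In-memory for the sketch; a real system would use durable storage.
audit_log = []


def authorize(role: str, action: str, approved_by: Optional[str] = None) -> bool:
    """Allow baseline actions; anything else needs an explicit approver."""
    allowed = action in BASELINE.get(role, set()) or approved_by is not None
    # Every decision is recorded, whether it was allowed or denied.
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "action": action,
        "approved_by": approved_by,
        "allowed": allowed,
    })
    return allowed
```

Recording denied requests as well as granted ones keeps the audit trail useful when investigating why an agent stalled.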
Code Construction Rules
- Implement backoff calculations as pure functions with predictable outputs.
- Ensure all network calls are idempotent and retry-safe.
- Avoid race conditions by persisting retry state before performing actions.
- Document retry outcomes and edge cases in the memory store.
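To illustrate the first and third rules together, the sketch below keeps the backoff calculation as a pure function and writes the attempt record before the action runs. It assumes a JSON file stands in for the memory store; `run_with_retries`, the state fields, and the default values are illustrative, and a production store would also need atomic writes.

```python
import json
import time
from pathlib import Path


def backoff(attempt: int, initial: float, factor: float, max_backoff: float) -> float:
    """Pure function: same inputs always give the same capped delay."""
    return min(max_backoff, initial * factor ** attempt)


def run_with_retries(task_id, action, state_path, max_retries=3,
                     initial=1.0, factor=2.0, max_backoff=30.0, sleep=time.sleep):
    """Run `action`, recording each attempt on disk before it starts."""
    state_file = Path(state_path)
    last_error = None
    for attempt in range(max_retries + 1):
        # Persist intent first, so a crash mid-action still leaves a record.
        state = {"task_id": task_id, "attempt": attempt, "status": "running",
                 "last_error": last_error, "timestamp": time.time()}
        state_file.write_text(json.dumps(state))
        try:
            result = action()
        except Exception as exc:  # a fuller version would separate non-recoverable errors
            last_error = repr(exc)
            state.update(status="failed", last_error=last_error)
            state_file.write_text(json.dumps(state))
            if attempt == max_retries:
                raise
            sleep(backoff(attempt, initial, factor, max_backoff))
        else:
            state.update(status="succeeded")
            state_file.write_text(json.dumps(state))
            return result
```

Passing `sleep` as a parameter is a design choice that lets tests substitute a recorder for the real delay.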
Security and Production Rules
- Limit outbound connections to permitted services only.
- Encrypt sensitive payloads in transit and at rest where applicable.
- Monitor retries for anomalous patterns and trigger alarms if thresholds are breached.
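The monitoring rule could take the form of a sliding-window counter. The sketch below assumes "anomalous" means more than a fixed number of retries inside a time window; `RetryMonitor`, the threshold, and the window size are hypothetical and should be tuned per project.

```python
from collections import deque


class RetryMonitor:
    """Flag a task when too many retries land inside a sliding time window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()

    def record_retry(self, now: float) -> bool:
        """Record a retry at time `now` (seconds); return True if the alarm should fire."""
        self._events.append(now)
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.threshold
```

Taking `now` as an argument, instead of reading the clock inside the method, keeps the monitor easy to test with synthetic timestamps.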
Testing Checklist
- Unit tests for backoff calculations and max retry logic.
- Integration tests for planner-executor handoffs and state persistence.
- End-to-end tests simulating transient failures and escalation paths.
- Security tests ensuring secrets are never leaked and access controls work.
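As an illustration of the first and third checklist items, the sketch below simulates transient failures with a callable that fails a fixed number of times, and injects a fake `sleep` so the tests never wait. The `retry` loop here is a deliberately small stand-in for the code under test, and all names are hypothetical.

```python
import unittest


class TransientFailure(Exception):
    """Stand-in for a recoverable error such as a timeout."""


def flaky(failures: int):
    """Return a callable that raises TransientFailure `failures` times, then succeeds."""
    remaining = [failures]

    def call():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFailure("simulated timeout")
        return "ok"
    return call


def retry(action, max_retries, delays, sleep):
    """Minimal retry loop under test; `sleep` is injected so tests never wait."""
    for attempt in range(max_retries + 1):
        try:
            return action()
        except TransientFailure:
            if attempt == max_retries:
                raise
            sleep(delays[attempt])


class RetryTests(unittest.TestCase):
    def test_recovers_after_transient_failures(self):
        slept = []
        self.assertEqual(retry(flaky(2), 3, [1, 2, 4], slept.append), "ok")
        self.assertEqual(slept, [1, 2])

    def test_gives_up_when_budget_is_exhausted(self):
        slept = []
        with self.assertRaises(TransientFailure):
            retry(flaky(5), 2, [1, 2], slept.append)
        self.assertEqual(slept, [1, 2])
```

The second test doubles as a check on the escalation path: once the budget is spent, the error surfaces to the caller instead of being swallowed.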
Common Mistakes to Avoid
- Overly aggressive backoff settings that exhaust the retry budget before a transient fault can clear.
- Missing context in handoffs leading to misaligned retry decisions.
- Ignoring idempotency in retry paths causing data drift.
- Exposing secrets in logs or outputs during retries.
FAQ
What is the AGENTS.md Template?
The AGENTS.md Template provides a structured operating manual for retry and backoff workflows across AI coding agents, enabling predictable handoffs, tool governance, and human review when needed.
How do I apply backoff with jitter in multi-agent retries?
Use a backoff policy with initialBackoff, maxBackoff, and a jitter factor; coordinate with the orchestrator to prevent simultaneous retries across agents.
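A policy of this shape might look like the sketch below. It assumes the jitter factor is a symmetric fraction of the base delay, which is only one of several common jitter schemes; `BackoffPolicy` and its snake_case fields are illustrative renderings of initialBackoff and maxBackoff, and `factor` is an added assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BackoffPolicy:
    initial_backoff: float = 1.0   # seconds before the first retry
    max_backoff: float = 30.0      # hard cap on any single delay
    factor: float = 2.0            # growth per attempt (assumed, not in the template)
    jitter: float = 0.25           # delays vary by up to +/- 25% of the base

    def delay(self, attempt: int, rng: random.Random) -> float:
        """Return a jittered delay in seconds for a 0-based attempt number."""
        base = min(self.max_backoff, self.initial_backoff * self.factor ** attempt)
        spread = base * self.jitter
        # Re-apply the cap so jitter can never push past max_backoff.
        return min(self.max_backoff, rng.uniform(base - spread, base + spread))
```

Giving each agent its own `random.Random` instance means two agents that fail at the same moment are unlikely to pick the same delay, which is the point of jitter in a multi-agent setup.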
Who monitors the retry workflow?
The orchestrator monitors progress, enforces policy, handles escalations, and records outcomes in a central memory store.
What happens on non-recoverable failures?
The system escalates to human review, stops further retries for that path, and may roll back to a stable state.
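The decision between retrying and escalating can be written as a small pure function. In this sketch the set of non-recoverable error kinds is invented for illustration, and the default of three consecutive failures follows the template's escalation rule.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    ESCALATE = "escalate"   # stop retrying and hand the task to a human


# Hypothetical classification; real projects would define their own list.
NON_RECOVERABLE = {"auth_denied", "invalid_input", "schema_mismatch"}


def decide(error_kind: str, consecutive_failures: int, escalate_after: int = 3) -> Action:
    """Choose the next step after a failed attempt."""
    if error_kind in NON_RECOVERABLE:
        # Retrying cannot fix these, so skip the remaining budget.
        return Action.ESCALATE
    if consecutive_failures >= escalate_after:
        return Action.ESCALATE
    return Action.RETRY
```

For example, `decide("timeout", 1)` returns `Action.RETRY`, while `decide("auth_denied", 1)` escalates immediately regardless of the remaining budget.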
How are security and secrets handled during retries?
Secrets are retrieved from a vault, never logged or printed, and access is restricted by role-based controls.