Dead Letter Queue AGENTS.md Template for AI Coding Agents | Suhas Bhairav

Overview

This AGENTS.md template defines a complete Dead Letter Queue (DLQ) strategy for AI coding agents. It governs queueing, retry policies, multi-agent orchestration, and escalation paths, and it supports both single-agent and multi-agent workflows with clear handoffs, auditability, and safe failure handling. Copy it into a project as the primary operating context for DLQ-driven AI workflows.

Direct answer: The DLQ strategy is a structured, copyable operating manual that defines roles, handoffs, memory, and governance to keep AI coding agents aligned and auditable when errors occur.

When to Use This AGENTS.md Template

When your AI coding agents process streams and may produce unprocessable messages.
When you need explicit agent handoffs, backoff policies, and escalation rules for DLQ flows.
When you require a single source of truth for DLQ decisions and auditability for compliance.

Copyable AGENTS.md Template

# AGENTS.md

Project role: Dead Letter Queue orchestrator for AI coding agents in a DLQ strategy

Agent roster and responsibilities:
- ReceiverAgent: Ingests messages from the main stream and routes them to the DLQ on failure.
- DLQProcessor: Dequeues DLQ messages, applies backoff, reprocesses, and escalates non-recoverable items.
- EnrichmentAgent: Adds context data to messages before reprocessing.
- ValidationAgent: Validates payload against the target schema before reprocessing.
- AuditorAgent: Logs outcomes, metrics, and policy violations.
- HumanReviewAgent: Escalates items requiring manual intervention or regulatory review.

Supervisor or orchestrator behavior:
- DLQOrchestrator coordinates all agents, enforces retry backoff policies, and ensures idempotent processing.
- It enforces tool governance rules, audit trails, and escalation gates.

Handoff rules between agents:
- If ReceiverAgent fails to process, route to DLQ with reason and metadata.
- DLQProcessor attempts reprocessing; on success, it returns the message to the main stream; on non-recoverable failure, it sends the message to HumanReviewAgent.
- EnrichmentAgent and ValidationAgent run before reprocessing to guarantee context and schema conformance.
- HumanReviewAgent handles manual decisions and writes the resulting status back to the system state.

Context, memory, and source-of-truth rules:
- Memory stored in a centralized state store with per-message run-id and source-of-truth references.
- All decisions are traceable via a unique correlation-id; never mutate original payload.

Tool access and permission rules:
- Agents may read queues, write to DLQ, invoke reprocessing endpoints, and update the audit log.
- Secrets must be retrieved from a Secrets Manager; do not store in logs or code.

Architecture rules:
- Event-driven microservices; idempotent replays; deterministic outputs; strict backoff and circuit breaking.

File structure rules:
- Use a small, predictable directory tree (see Recommended Project Structure).

Data, API, or integration rules when relevant:
- Use orchestrated REST/HTTP APIs; support idempotent retries; respect API rate limits.

Validation rules:
- All reprocessed payloads must pass ValidationAgent; rejected payloads must be escalated to HumanReviewAgent.

Security rules:
- Encrypt data in transit and at rest; restrict DLQ access by role; rotate credentials.

Testing rules:
- Unit tests for each agent, integration tests for the DLQ flow, and end-to-end tests with mock streams.

Deployment rules:
- Deploy with feature flags; monitor DLQ backlog; rollback on policy violations.

Human review and escalation rules:
- Escalate to HumanReviewAgent if retries exceed threshold or regulatory flag is raised.

Failure handling and rollback rules:
- On unrecoverable failure, shift to manual review; preserve original payload in DLQ before rollback.

Things Agents must not do:
- Do not bypass validation; do not mutate the original message; do not delete messages without resolution or approval.

Recommended Agent Operating Model

The DLQ workflow relies on a clear operating model where each agent has defined responsibilities and decision boundaries. The Orchestrator enforces policy and coordinates handoffs; individual agents perform specialized tasks with deterministic outputs. Escalations are explicit and auditable, ensuring human review when needed.
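
The operating model above can be sketched as a small pipeline. This is a minimal illustration, not a reference implementation: `DLQMessage`, `orchestrate`, and the outcome strings are hypothetical names, and the three agents are passed in as plain callables so the routing logic stays deterministic and easy to test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DLQMessage:
    """Hypothetical message shape; frozen so agents cannot reassign its fields."""
    correlation_id: str
    payload: dict


def orchestrate(
    message: DLQMessage,
    enrich: Callable[[DLQMessage], DLQMessage],
    validate: Callable[[DLQMessage], bool],
    reprocess: Callable[[DLQMessage], bool],
) -> str:
    """Run enrich -> validate -> reprocess and return the routing outcome."""
    enriched = enrich(message)
    if not validate(enriched):
        # Validation failures are never reprocessed; they go to a human.
        return "human_review"
    if reprocess(enriched):
        return "main_stream"
    # Reprocessing failed; the caller schedules another attempt with backoff.
    return "retry"
```

Because every decision is returned as a value rather than performed as a side effect, the orchestrator can log each outcome before acting on it.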

Recommended Project Structure

dlq-agents/
├─ orchestrator/
├─ receiver/
├─ dlq_processor/
├─ enrichment/
├─ validation/
├─ auditor/
├─ human_review/
├─ config/
└─ tests/

Core Operating Principles

Idempotent processing in all reprocessing steps.
Strict backoff and retry limits with clear timeouts.
Full traceability via correlation IDs and centralized audit logs.
Strict tool governance and least-privilege access.
Clear, documented handoffs between agents.
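
As one way to picture idempotent reprocessing, the sketch below caches results by correlation ID so that a replayed message does not run the handler twice. `IdempotentProcessor` is a hypothetical name, and the in-memory dictionary is only a stand-in for the centralized state store the template calls for.

```python
class IdempotentProcessor:
    """Runs a handler at most once per correlation ID (illustrative only)."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: a dict stands in for the centralized state store.
        self._results = {}

    def process(self, correlation_id, payload):
        # A replay of an already-processed message returns the stored result.
        if correlation_id in self._results:
            return self._results[correlation_id]
        result = self._handler(payload)
        self._results[correlation_id] = result
        return result
```

In a real deployment the lookup and the write would need to be atomic in the shared store; this sketch ignores concurrency.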

Agent Handoff and Collaboration Rules

The orchestrator plans the sequence and backoff; each agent executes its own step.
Hand off to the next agent only after successful validation and enrichment.
Validation must run before any reprocessing attempt; failures go to HumanReviewAgent.
Researchers or domain specialists provide context when schema mismatches occur.
All handoffs are recorded with timestamps and outcomes in the audit log.
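
A handoff record might look like the following sketch. The field names and the `record_handoff` helper are assumptions made for illustration, and a plain list stands in for the audit log.

```python
import json
from datetime import datetime, timezone


def record_handoff(audit_log, correlation_id, from_agent, to_agent, outcome):
    """Append one handoff entry to the audit log and return it as a dict."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,
        "from": from_agent,
        "to": to_agent,
        "outcome": outcome,
    }
    # Serialize with sorted keys so entries are stable and easy to diff.
    audit_log.append(json.dumps(entry, sort_keys=True))
    return entry
```

Usage: `record_handoff(log, "c-1", "DLQProcessor", "ValidationAgent", "ok")` appends a single JSON line to `log`.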

Tool Governance and Permission Rules

Only DLQ, queue, and processing endpoints may be invoked by agents.
Secrets accessed through a secure vault; never hard-coded.
All production actions require approval gates for critical changes.
All API calls are rate-limited and instrumented for retries.
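
Least-privilege tool access can be approximated with a per-agent allow-list. The mapping below is hypothetical: the action names paraphrase the template's tool rules, and which agent gets which action is an assumption to adapt per project.

```python
# Hypothetical allow-list; adjust agents and actions to the actual project.
ALLOWED_ACTIONS = {
    "ReceiverAgent": frozenset({"read_queue", "write_dlq"}),
    "DLQProcessor": frozenset({"read_queue", "invoke_reprocess"}),
    "AuditorAgent": frozenset({"update_audit_log"}),
}


class ToolPermissionError(Exception):
    """Raised when an agent requests an action outside its allow-list."""


def authorize(agent, action):
    """Raise ToolPermissionError unless the agent may perform the action."""
    allowed = ALLOWED_ACTIONS.get(agent, frozenset())
    if action not in allowed:
        raise ToolPermissionError(f"{agent} may not perform {action}")
```

Unknown agents get an empty allow-list, so the default is to deny.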

Code Construction Rules

Use deterministic reprocessing functions with clean inputs.
Attach correlation IDs to all messages and log all state transitions.
Do not mutate the original payload; create immutable derived records for processing.
Use explicit backoff policies and cap maximum retries.
Validate at every boundary (ingest, enrich, pre-reprocess).
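
Two of these rules lend themselves to a short sketch: a capped exponential backoff and a derived record that leaves the original payload untouched. The function names and default values (base of 1 second, factor of 2, cap of 60 seconds) are illustrative assumptions, not values prescribed by the template.

```python
import copy


def backoff_delay(attempt, base=1.0, factor=2.0, cap=60.0):
    """Delay in seconds before retry number `attempt` (0-based), capped."""
    return min(cap, base * factor ** attempt)


def derive_record(original, **updates):
    """Return a deep copy of `original` with updates applied.

    The original payload is never modified, so it can stay in the DLQ
    as the source of truth.
    """
    derived = copy.deepcopy(original)
    derived.update(updates)
    return derived
```

With these defaults, `backoff_delay(3)` is 8.0 seconds and any attempt from 6 onward is held at the 60-second cap.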

Security and Production Rules

DLQ access restricted by role; all actions auditable.
Data encryption at rest and in transit; credentials rotated regularly.
Canary rollouts for changes to the DLQ workflow.
Incident response playbooks tied to DLQ backlog thresholds.

Testing Checklist

Unit tests for each agent function.
Integration tests for DLQ flow including backoff, enrichment, and validation.
End-to-end tests with synthetic DLQ items and failure scenarios.
Performance tests under simulated backlog growth.

Common Mistakes to Avoid

Relying on weak retries without backoff logic.
Bypassing validation or enrichment steps.
Mutating the original payload; creating non-idempotent reprocessing paths.
Ignoring audit trails and escalation rules.

FAQ

What is a dead-letter queue (DLQ) in this template?

A DLQ is a holding area for messages that could not be processed after defined retries. It enables safe investigation, retry planning, and escalation without data loss.

Who should use this AGENTS.md Template for DLQ strategy?

Platform engineers, SREs, and AI coding agent teams who design error handling, retries, and handoffs in event-driven workflows.

How are agent handoffs managed?

Handoffs are explicit and sequenced: ReceiverAgent → DLQProcessor → Enrichment/Validation → Reprocessing, with escalation to HumanReview when needed.

What happens if a DLQ item cannot be recovered?

If recovery fails after allowed retries, the item is escalated to HumanReview and logged for audit; data remains in DLQ with context.
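
The escalation decision could be expressed roughly as follows. The record fields (`attempts`, `regulatory_flag`, `status`), the `resolve_failed_item` name, and the choice to escalate once attempts reach `max_retries` are assumptions for illustration.

```python
def resolve_failed_item(item, max_retries=3):
    """Return a new DLQ record annotated with its next status.

    The input record is not modified, and the item stays in the DLQ
    with its context whichever status is chosen.
    """
    escalate = item.get("regulatory_flag", False) or item["attempts"] >= max_retries
    status = "human_review" if escalate else "retry_scheduled"
    return {**item, "status": status}
```

Returning a new record instead of editing the existing one keeps the original entry available for audit.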

How is security enforced in the DLQ workflow?

Access is role-based, secrets are managed in a vault, and all actions are auditable with traceable IDs.
