AGENTS.md Template - Distributed Tracing Architecture
AGENTS.md template for distributed tracing architecture that governs multi-agent orchestration, trace handoffs, and governance.
Target User
Developers, founders, product teams, engineering leaders
Use Cases
- Distributed tracing architecture
- Multi-agent orchestration for tracing
- Trace context propagation
- Tool governance in tracing
Markdown Template
# AGENTS.md
Project role: Distributed Tracing Orchestrator and governance layer for AI coding agents
Agent roster and responsibilities:
- Planner: defines tracing objectives, sampling strategy, and trace context propagation rules.
- TraceCollector: ingests spans from instrumented services, normalizes data, and forwards to storage.
- Propagator: ensures trace context is correctly propagated across service boundaries.
- Sampler: decides per-trace sampling rates and applies sampling rules.
- StorageIndexer: persists traces in a time-series trace store and builds indices for queries.
- Analyzer: runs queries, detects anomalies, and surfaces insights for human review.
- Auditor: monitors conformance, security, and policy compliance.
Supervisor or orchestrator behavior:
- The Orchestrator coordinates agent work, enforces memory and source-of-truth rules, and triggers handoffs.
- All decisions are logged, auditable, and revertible.
Handoff rules between agents:
- Planner -> TraceCollector: Plan and provide expected trace sources, sampling config, and enabled exporters.
- TraceCollector -> Propagator: Ensure the trace context is registered and forwarded to downstream services.
- Propagator -> Sampler: Apply sampling decisions when a new trace is started.
- Sampler -> StorageIndexer: Persist selected traces with metadata.
- StorageIndexer -> Analyzer: Provide indexed traces for analytic runs; Analyzer may request re-sampling rules for re-evaluation.
Context, memory, and source-of-truth rules:
- Context: Every trace carries a TraceId, SpanId, and size-bounded baggage; keep only in-flight traces and a cached policy set in memory.
- Source-of-truth: Instrumentation config and tracing backend configurations are the canonical truth; never override them without supervisor approval.
Tool access and permission rules:
- Tools: OpenTelemetry SDKs, exporters, and a normalized trace store API; access restricted by role and environment.
- Secrets: Use a secrets vault; never hard-code credentials; limit leakage risk with short-lived credentials.
Architecture rules:
- Use OpenTelemetry for instrumentation; export to a compliant backend; guarantee end-to-end traceability.
File structure rules:
- Do not add unrelated folders; keep tracing components under orchestrator, agents, configs, and data.
Data, API, or integration rules:
- All traces must include trace context propagation metadata; ensure compatibility with OTLP or vendor-compatible formats.
Validation rules:
- Validate trace continuity (one consistent TraceId per trace, valid parent/child SpanId relationships), sampling policy consistency, and exporter reachability.
Security rules:
- Enforce least privilege, mutual TLS, and audit logging for all tracing operations.
Testing rules:
- Unit tests for individual agents, integration tests for end-to-end trace flow, and performance tests for sampling rates.
Deployment rules:
- Deploy tracing components with versioned configs; ensure rollback points and health checks exist.
Human review and escalation rules:
- Escalate anomalies to the Reviewer role (a human reviewer, not an agent in the roster); require human approval for policy changes affecting sampling or storage.
Failure handling and rollback rules:
- If a component fails, isolate and revert to the last known good configuration; preserve trace data integrity.
Things Agents must not do:
- Do not bypass sampling, skip trace propagation, or modify core tracing semantics without supervisor approval.
Overview
This AGENTS.md template provides a copyable operating manual for a distributed tracing architecture managed by AI coding agents. It defines the agent roster, handoffs, memory rules, and governance required to coordinate single-agent and multi-agent tracing workflows across services, instrumentation, and backends.
When to Use This AGENTS.md Template
- You are implementing a distributed tracing architecture with multiple services or agents.
- You need a clear, copyable operating context for trace collection, propagation, sampling, storage, and analytics.
- You require explicit agent handoff rules, source-of-truth, and governance to avoid context drift and unsafe changes.
- You are establishing tool access, secrets handling, and production safeguards for tracing components.
Recommended Agent Operating Model
The operating model gives each agent a role, a decision boundary, and an escalation path, so tracing stays accurate and auditable while automation tunes data collection and analysis. The Planner holds decision rights over tracing goals, the Auditor monitors policy conformance, and the Reviewer, a human role outside the agent roster, steps in when policies are changed.
Recommended Project Structure
distributed-tracing/
├── orchestrator/
│   ├── planner/
│   │   └── plan.json
│   └── policies/
├── agents/
│   ├── collector/
│   │   ├── collector.py
│   │   └── exporters/
│   ├── propagator/
│   │   └── propagator.py
│   ├── sampler/
│   │   └── sampler.py
│   ├── storage/
│   │   ├── indexer.py
│   │   └── schema/
│   ├── analyzer/
│   │   └── queries/
│   └── auditor/
├── configs/
│   └── tracing.yaml
├── data/
│   └── traces/
├── libs/
│   └── tracing-lib/
├── tests/
│   ├── unit/
│   └── integration/
└── docs/
    └── AGENTS.md
Core Operating Principles
- Single source of truth: instrumentation and tracing backend configurations are canonical.
- Deterministic and idempotent agent actions to avoid state drift.
- Strict memory boundaries: hold in-flight traces and policy state in memory, not full trace histories.
- Clear ownership, auditable actions, and always-on logging.
- Least privilege: access rights are scoped to role and environment.
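The memory-boundary principle above can be illustrated with a small bounded store. This is a minimal sketch, not part of the template: the class name `InFlightTraceStore`, the `max_traces` limit, and the evict-least-recently-updated policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict
from typing import List, Optional


class InFlightTraceStore:
    """Hypothetical helper: holds at most `max_traces` active traces in memory."""

    def __init__(self, max_traces: int = 1000) -> None:
        if max_traces < 1:
            raise ValueError("max_traces must be at least 1")
        self.max_traces = max_traces
        # Maps trace ID -> list of spans, ordered by most recent update.
        self._traces = OrderedDict()

    def add_span(self, trace_id: str, span: dict) -> Optional[str]:
        """Record a span; return the evicted trace ID if capacity was exceeded."""
        if trace_id in self._traces:
            self._traces.move_to_end(trace_id)
        self._traces.setdefault(trace_id, []).append(span)
        if len(self._traces) > self.max_traces:
            # Assumed policy: drop the least recently updated trace.
            evicted_id, _ = self._traces.popitem(last=False)
            return evicted_id
        return None

    def complete(self, trace_id: str) -> List[dict]:
        """Remove a finished trace from memory and return its spans for handoff."""
        return self._traces.pop(trace_id, [])

    def __len__(self) -> int:
        return len(self._traces)
```

Returning the evicted trace ID lets the caller log the eviction, which keeps the action auditable rather than silent.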
Agent Handoff and Collaboration Rules
- Planner: defines goals, sampling strategy, and propagation model; communicates with all downstream agents and the Orchestrator.
- TraceCollector: ingests spans, normalizes them, and validates trace continuity before forwarding to the Propagator.
- Propagator: ensures correct trace-context propagation across services and exporters.
- Sampler: applies per-trace or per-service sampling decisions and updates policy as needed.
- StorageIndexer: persists traces and builds indices; notifies Analyzer of new data windows.
- Analyzer: runs queries, detects anomalies, and surfaces insights for human review or policy updates.
- Auditor: monitors policy conformance, secrets access, and security events; raises escalations if violations occur.
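One way an orchestrator might enforce these handoffs is to check each transfer against an explicit allow-list and log it. The sketch below assumes the five handoff pairs listed in the template; the names `ALLOWED_HANDOFFS` and `HandoffLog` are hypothetical, and a real deployment would likely persist the log rather than keep it in a list.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Handoff pairs taken from the template's "Handoff rules between agents".
ALLOWED_HANDOFFS = frozenset({
    ("Planner", "TraceCollector"),
    ("TraceCollector", "Propagator"),
    ("Propagator", "Sampler"),
    ("Sampler", "StorageIndexer"),
    ("StorageIndexer", "Analyzer"),
})


@dataclass
class HandoffLog:
    """Hypothetical in-memory audit log of accepted handoffs."""

    entries: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, sender: str, receiver: str, summary: str) -> None:
        """Accept a handoff only if the pair is on the allow-list."""
        if (sender, receiver) not in ALLOWED_HANDOFFS:
            raise PermissionError(f"handoff {sender} -> {receiver} is not allowed")
        self.entries.append((sender, receiver, summary))
```

For example, `HandoffLog().record("Planner", "TraceCollector", "sampling config")` is accepted, while a `Sampler -> Analyzer` handoff raises `PermissionError` because it skips the StorageIndexer.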
Tool Governance and Permission Rules
- Command execution: only orchestrator-initiated commands.
- File edits: only through approved branches; no direct edits in production state.
- API calls: restricted to tracing backends with strict rate limits and authentication.
- Secrets: managed via vault; rotation policies enforced; no hard-coded secrets.
- Production systems: read-only access for most agents; write access restricted to the StorageIndexer and the Orchestrator when necessary.
- External services: only through approved exporters and adapters.
- Approval gates: policy changes or topology updates require Supervisor and Auditor sign-off.
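The write-access and approval-gate rules above could be expressed as two small checks. This is a sketch under assumptions: it reads "storage and orchestrator" as the StorageIndexer agent and the Orchestrator, and the function and constant names are illustrative only.

```python
from typing import Iterable

# Assumed reading of the rule: only these roles may write to production.
PRODUCTION_WRITE_ROLES = frozenset({"StorageIndexer", "Orchestrator"})
# Sign-offs required before a policy or topology change is applied.
REQUIRED_SIGNOFFS = frozenset({"Supervisor", "Auditor"})


def can_write_production(role: str) -> bool:
    """Return True only for roles granted production write access."""
    return role in PRODUCTION_WRITE_ROLES


def policy_change_approved(signoffs: Iterable[str]) -> bool:
    """Return True when every required role has signed off."""
    return REQUIRED_SIGNOFFS.issubset(set(signoffs))
```

Keeping both sets as data rather than scattered conditionals makes the rules easy to review and version alongside the other configs.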
Code Construction Rules
- Use OpenTelemetry and OTLP exporters; ensure compatibility with the target backend.
- Trace IDs must be preserved across services; avoid re-creating IDs.
- Keep agents stateless where possible; use idempotent operations and replay-safe actions.
- Exporters must be configurable via environment and versioned config files.
- All code must be peer-reviewed; tests must cover new behavior.
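To make the "preserve trace IDs, avoid re-creating IDs" rule concrete, the sketch below parses a W3C Trace Context `traceparent` header and builds the header for an outgoing call, keeping the trace ID and minting only a new span ID. It uses the standard library rather than the OpenTelemetry SDK, handles only version `00`, and the helper names are hypothetical; in practice the SDK's propagators would do this work.

```python
import re
import secrets
from typing import NamedTuple

# traceparent format: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    flags: str


def parse_traceparent(header: str) -> SpanContext:
    """Parse a version-00 traceparent header, rejecting malformed values."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid in the W3C Trace Context format.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("traceparent contains an all-zero ID")
    return SpanContext(trace_id, span_id, flags)


def child_traceparent(parent: SpanContext) -> str:
    """Build the outgoing header: same trace ID and flags, new span ID."""
    new_span_id = secrets.token_hex(8)
    while new_span_id == "0" * 16:
        new_span_id = secrets.token_hex(8)
    return f"00-{parent.trace_id}-{new_span_id}-{parent.flags}"
```

Given an incoming header such as `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`, the child header carries the same 32-character trace ID with a different 16-character span ID.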
Security and Production Rules
- Mutual TLS between components; rotate credentials regularly.
- Audit logs for all tracing operations; monitor for anomalies.
- Environment isolation: dev/stage/prod separated; production-ready defaults.
- Backups of trace data and index metadata; disaster recovery plan.
Testing Checklist
- Unit tests for each agent's behavior.
- Integration tests for end-to-end trace flow across Planner, Collector, Propagator, Sampler, Storage, and Analyzer.
- Performance and load testing for sampling strategies and exporters.
- Deployment tests with health checks and rollback procedures.
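Sampling tests are easier to write when the sampler is deterministic. The sketch below decides from the trace ID alone, so every agent reaches the same answer for the same trace and a test can assert an expected rate. It is an illustrative ratio sampler, not the OpenTelemetry SDK's implementation; `should_sample`, `synthetic_trace_ids`, and `observed_rate` are hypothetical names.

```python
import hashlib
from typing import Iterable, List


def should_sample(trace_id: str, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    low_bits = int(trace_id[-16:], 16)
    return low_bits < int(ratio * (1 << 64))


def synthetic_trace_ids(count: int) -> List[str]:
    """Generate reproducible 32-hex-character trace IDs for tests."""
    return [hashlib.sha256(str(i).encode()).hexdigest()[:32] for i in range(count)]


def observed_rate(trace_ids: Iterable[str], ratio: float) -> float:
    """Fraction of the given trace IDs that the sampler keeps."""
    ids = list(trace_ids)
    return sum(should_sample(t, ratio) for t in ids) / len(ids)
```

A test can then check that `observed_rate(synthetic_trace_ids(10000), 0.25)` lands close to 0.25 and that repeated calls with the same trace ID always agree.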
Common Mistakes to Avoid
- Overly broad agent permissions leading to credential exposure.
- Unbounded memory usage by keeping complete trace histories in memory.
- Skipping validation of trace continuity when propagating across services.
- Unclear handoffs causing context drift or duplication of work.
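The continuity validation that is often skipped can be quite small. The sketch below assumes the collector has a complete batch of spans for one trace, each represented as a dict with `trace_id`, `span_id`, and `parent_span_id` keys (`None` for the root); those field names and the function name are assumptions for illustration. Partial batches with a remote parent would need a looser check.

```python
from typing import List


def find_continuity_errors(spans: List[dict]) -> List[str]:
    """Return human-readable problems found in one trace's span batch."""
    errors = []
    # Every span in the batch should share a single trace ID.
    trace_ids = {span["trace_id"] for span in spans}
    if len(trace_ids) > 1:
        errors.append(f"batch mixes {len(trace_ids)} trace IDs")
    # Span IDs must be unique within the trace.
    seen = set()
    for span in spans:
        if span["span_id"] in seen:
            errors.append(f"duplicate span ID {span['span_id']}")
        seen.add(span["span_id"])
    # Assumed rule for a complete trace: exactly one root span.
    roots = [s for s in spans if s.get("parent_span_id") is None]
    if len(roots) != 1:
        errors.append(f"expected exactly one root span, found {len(roots)}")
    # Every non-root span must point at a span present in the batch.
    for span in spans:
        parent = span.get("parent_span_id")
        if parent is not None and parent not in seen:
            errors.append(f"span {span['span_id']} has unknown parent {parent}")
    return errors
```

An empty result means the batch passed these checks; anything else can be logged and escalated before the trace is forwarded.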
FAQ
What is an AGENTS.md Template for distributed tracing architecture?
The AGENTS.md Template provides a copyable operating manual for planners, collectors, propagators, and analyzers working together to implement distributed tracing with clear handoffs and governance.
Who should use this template?
Platform teams, SREs, and developer productivity teams implementing multi-agent tracing workflows will benefit from its explicit roles and constraints.
How are agent handoffs governed in this template?
Handoffs are defined between Planner, TraceCollector, Propagator, Sampler, StorageIndexer, and Analyzer with explicit data, state, and memory expectations to avoid drift.
What safety checks are included for production tracing?
Security, access control, secret management, audit logging, and policy-driven approval gates guard production tracing components.
How do I customize the template for my backend?
Adapt exporters, data models, and configuration formats; keep canonical truth in the instrumentation config and tracing backend; update the planner policy accordingly.
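When adapting exporters to a backend, one low-risk approach is to read settings from the standard OpenTelemetry environment variables so the same code runs in every environment. The sketch below is illustrative: the `ExporterSettings` type and `load_exporter_settings` function are hypothetical, and the fallback values are assumptions for this example, since each SDK defines its own defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExporterSettings:
    """Hypothetical container for exporter-related settings."""

    endpoint: str
    protocol: str
    sampler: str
    sampler_arg: Optional[str]


def load_exporter_settings(env: Optional[Mapping[str, str]] = None) -> ExporterSettings:
    """Read exporter settings from OpenTelemetry environment variables."""
    source = os.environ if env is None else env
    return ExporterSettings(
        # Fallbacks below are assumptions for this sketch, not SDK guarantees.
        endpoint=source.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        protocol=source.get("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc"),
        sampler=source.get("OTEL_TRACES_SAMPLER", "parentbased_always_on"),
        sampler_arg=source.get("OTEL_TRACES_SAMPLER_ARG"),
    )
```

Passing an explicit mapping, for example `load_exporter_settings({"OTEL_EXPORTER_OTLP_ENDPOINT": "https://collector.example:4317"})`, keeps tests independent of the process environment; the hostname here is a placeholder.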