AGENTS.md Template for Real-Time Analytics Pipelines

AGENTS.md Template for real-time analytics pipelines — define agent roles, handoffs, governance, memory, and tool access for streaming workflows.

AGENTS.md Template, real-time analytics, AI coding agents, multi-agent orchestration, agent handoffs, tool governance, human review, streaming data pipelines, real-time processing, analytics pipeline orchestration, data governance

Target User

Developers, data engineers, platform teams, engineering leaders

Use Cases

  • Real-time analytics pipelines
  • Multi-agent orchestration in streaming workflows
  • Event-driven data processing with agent handoffs
  • Tool governance and human review in data pipelines

Markdown Template

# AGENTS.md

Project: Real-Time Analytics Pipeline Orchestration

Agent roster:
- Orchestrator (Planner) — coordinates workflow steps, enforces handoffs, maintains pipeline state.
- IngestAgent — ingests streaming data from sources (e.g., Kafka/Kinesis), deduplicates, formats, and writes to staging.
- TransformAgent — applies streaming transformations, windowing, and enrichment on staged data.
- ValidateAgent — validates schema, data quality, and conformance to the catalog.
- AnomalyDetectAgent — monitors and flags anomalies, thresholds, and drift; can escalate.
- DeliverAgent — writes validated data to production sinks and dashboards.
- Reviewer — performs human validation when needed.
- DomainExpert — provides domain-specific judgment for edge cases or regulatory concerns.

Supervisor or orchestrator behavior:
- The Orchestrator maintains a live pipeline_state (batch_id, window, timestamps) and orchestrates handoffs between agents.
- It enforces idempotency and source-of-truth consistency, and it requires approval for production changes.
- It surfaces anomalies to the Reviewer or DomainExpert when drift or failures exceed thresholds.

Handoff rules between agents:
- IngestAgent → TransformAgent when a batch/window of events lands in staging with a valid ingestion marker.
- TransformAgent → ValidateAgent after transforms complete and results are stored with a transformation_id.
- ValidateAgent → DeliverAgent when schema and data quality checks pass.
- AnomalyDetectAgent → Reviewer or DomainExpert when anomaly severity is high; otherwise, it logs the anomaly and continues.
- Reviewer → Orchestrator when review is complete; Orchestrator → DeliverAgent to deploy to production.
- DomainExpert → Orchestrator for high-risk domain decisions.

Context, memory, and source-of-truth rules:
- All decisions must reference the central Data Catalog as the source of truth for schemas, lineage, and quality rules.
- Memory persists to a workspace.json in the repository or a memory store with a bounded TTL; avoid leaking PII in memory.
- Operations rely on two canonical streams: prod_events for production data and staging_events for in-flight data.

Tool access and permission rules:
- IngestAgent: read streaming sources; write to staging; no write to production sinks or secrets.
- TransformAgent: read staging; write to transformed staging; no direct production writes.
- ValidateAgent: read transformed data; write to validated layer; access to schema registry allowed.
- AnomalyDetectAgent: read validated data; write alerts to monitoring and, if needed, to Reviewer.
- DeliverAgent: read from the validated layer only; write to production sinks.
- Orchestrator: manage credentials and secrets access; require approval for production changes.
- All agents must use secret management and rotate credentials per policy.

Architecture rules:
- Event-driven microservices with a central orchestrator.
- Idempotent writes and exactly-once handling where feasible, with at-least-once semantics for ingestion if exactly-once is not possible.
- Checkpointing and watermarking to manage late data and out-of-order events.
- Clear separation between staging, transforms, validated, and production layers.

File structure rules:
- Keep code under: real_time_analytics/
  - agents/
    - orchestrator/
    - ingest_agent/
    - transform_agent/
    - validate_agent/
    - anomaly_detector_agent/
    - deliver_agent/
    - reviewer_agent/
    - domain_expert_agent/
  - pipelines/
    - realtime/
      - sources/
      - transforms/
      - sinks/
  - data/
    - schemas/
  - configs/
  - tests/
  - docs/

Data, API, or integration rules:
- Use streaming sources (Kafka/Kinesis) with topic naming conventions.
- Define REST or gRPC endpoints with strict rate limits and authentication.
- Secrets stored in a vault; never hardcode credentials.
- Maintain a read-only production API surface for agents where possible.

Validation rules:
- Schema conformance, nullability checks, and boundary checks per schema catalog.
- Windowing and watermark correctness; ensure no data loss or duplication beyond defined tolerances.
- End-to-end traceability from ingestion to production sink.

Security rules:
- Encrypt data in transit and at rest.
- Principle of least privilege for all agents.
- Secrets rotation and auditable access.

Testing rules:
- Unit tests for each agent's transformation logic.
- Integration tests for end-to-end streaming pipeline with a synthetic dataset.
- Performance tests to validate backpressure handling and latency targets.

Deployment rules:
- Use canary deployments for pipeline changes; require monitoring thresholds.
- Feature flags to enable/disable agents or new transforms.

Human review and escalation rules:
- Trigger manual review when anomaly_score > threshold or data drift is detected.
- Escalate to DomainExpert for domain-critical decisions or regulatory concerns.

Failure handling and rollback rules:
- Implement exponential backoff retries and circuit breakers on flaky downstream services.
- Keep a last_good_snapshot as the rollback point; if needed, reprocess from staging.

Things Agents must not do:
- Do not bypass schema validation or source-of-truth checks.
- Do not modify production data without explicit approval.
- Do not share secrets or credentials in logs or memory beyond allowed scope.
- Do not remain in an idle state without making progress unless explicitly paused by the Orchestrator.

Overview

The AGENTS.md template for Real-Time Analytics Pipelines defines a formal operating context to govern AI coding agents in streaming workflows. It supports both single-agent execution and multi-agent orchestration, including ingestion, transformation, validation, anomaly detection, and delivery. It prescribes memory, source-of-truth, tool governance, and escalation rules so teams can coordinate actions across the end-to-end pipeline.

Direct answer: Use this AGENTS.md Template to establish roles, handoffs, and governance for real-time analytics pipelines driven by AI coding agents, ensuring repeatable orchestration and auditable decision paths.

When to Use This AGENTS.md Template

  • When building or evolving a real-time analytics pipeline that uses AI agents to ingest, transform, validate, analyze, and deliver streaming data.
  • When you need a canonical operating manual that defines agent roster, responsibilities, and handoff rules for multi-agent orchestration.
  • When enforcing tool governance, authentication, and memory/source-of-truth discipline across the pipeline.
  • When you require explicit escalation, human review, and rollback paths for failures or anomalies in streaming workloads.

Recommended Agent Operating Model

Agent roles and decision boundaries are defined to balance autonomy with governance. The Orchestrator acts as the central planner, coordinating ingestion, transformation, validation, anomaly detection, and delivery. Each agent has a clear scope and exit criteria before handing off to the next stage. Escalation paths exist for anomalies, regulatory concerns, and strategy decisions that require human judgment.
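
As an illustration of "clear scope and exit criteria", the sketch below models the Orchestrator's pipeline_state and a stage transition that only fires when the current stage reports its exit criteria as met. The field names beyond batch_id, window, and timestamps, the `STAGES` tuple, and the `advance` method are assumptions made for this example, not part of the template.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Stage order mirrors the ingest -> transform -> validate -> deliver flow.
# The stage names themselves are hypothetical.
STAGES = ("ingest", "transform", "validate", "deliver")


@dataclass
class PipelineState:
    """Hypothetical shape for the Orchestrator's pipeline_state."""
    batch_id: str
    window_start: datetime
    window_end: datetime
    stage: str = "ingest"
    paused: bool = False
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def advance(self, exit_criteria_met: bool) -> str:
        """Move to the next stage only if the current stage's exit criteria hold."""
        if self.paused or not exit_criteria_met:
            return self.stage  # stay put; nothing is handed off
        idx = STAGES.index(self.stage)
        if idx < len(STAGES) - 1:
            self.stage = STAGES[idx + 1]
            self.updated_at = datetime.now(timezone.utc)
        return self.stage
```

Keeping the transition in one place gives the Orchestrator a single point at which to record handoffs for audit purposes.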

Recommended Project Structure

real_time_analytics/
  ├─ agents/
  │  ├─ orchestrator/
  │  ├─ ingest_agent/
  │  ├─ transform_agent/
  │  ├─ validate_agent/
  │  ├─ anomaly_detector_agent/
  │  ├─ deliver_agent/
  │  ├─ reviewer_agent/
  │  └─ domain_expert_agent/
  ├─ pipelines/
  │  └─ realtime/
  │     ├─ sources/
  │     ├─ transforms/
  │     └─ sinks/
  ├─ data/
  │  └─ schemas/
  ├─ configs/
  ├─ tests/
  └─ docs/

Core Operating Principles

  • Single source of truth for schemas, lineage, and quality rules.
  • Deterministic, idempotent operations across all agents.
  • End-to-end traceability from ingestion to production sinks.
  • Least-privilege access and secrets management.
  • Clear escalation and human review where necessary.
  • Explicit memory and context propagation boundaries between agents.
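
One way to make the memory boundary concrete is a small store whose entries expire after a fixed TTL, combined with a redaction step before anything is written. This is a minimal sketch: `BoundedMemory`, `redact`, and the `[REDACTED]` placeholder are hypothetical names chosen for the example, and a real deployment would likely use an external store.

```python
import time


def redact(record: dict, pii_fields: set) -> dict:
    """Return a copy of record with the named PII fields masked."""
    return {
        key: ("[REDACTED]" if key in pii_fields else value)
        for key, value in record.items()
    }


class BoundedMemory:
    """In-process key/value memory whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s: float, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._items = {}

    def put(self, key: str, value) -> None:
        self._items[key] = (self._clock() + self._ttl_s, value)

    def get(self, key: str, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop expired context eagerly
            return default
        return value
```

Passing the clock in as a parameter keeps expiry behavior testable without sleeping.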

Agent Handoff and Collaboration Rules

  • Planner/Orchestrator defines the plan, sequences, and escalation triggers for all handoffs.
  • IngestAgent collects and normalizes raw events; passes to TransformAgent when ingestion completes.
  • TransformAgent performs windowed processing and enrichment; passes to ValidateAgent upon completion.
  • ValidateAgent checks schema and quality; passes to DeliverAgent if valid, or to Reviewer if manual validation is required.
  • AnomalyDetectAgent monitors for drift and anomalies; escalates to DomainExpert or Reviewer when thresholds are breached.
  • DomainExpert resolves domain-specific concerns and optionally approves downstream delivery.
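
The rules above can be sketched as a small routing function. The `StageResult` fields, the severity scale, and the choice to send regulatory concerns to the DomainExpert and high-severity anomalies to the Reviewer are assumptions for illustration; the template leaves that split to each team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResult:
    agent: str                       # agent that just finished its step
    ok: bool                         # exit criteria met for that step
    needs_manual_review: bool = False
    anomaly_severity: str = "none"   # "none" | "low" | "high" (illustrative)
    regulatory_concern: bool = False


# Normal (non-escalated) handoffs between agents.
ROUTES = {
    "IngestAgent": "TransformAgent",
    "TransformAgent": "ValidateAgent",
    "ValidateAgent": "DeliverAgent",
    "Reviewer": "Orchestrator",
    "DomainExpert": "Orchestrator",
}


def next_agent(result: StageResult) -> str:
    """Pick the next owner of the batch; escalation wins over normal flow."""
    if result.regulatory_concern:
        return "DomainExpert"
    if result.anomaly_severity == "high":
        return "Reviewer"
    if not result.ok:
        return "Orchestrator"  # hold for a retry or rollback decision
    if result.agent == "ValidateAgent" and result.needs_manual_review:
        return "Reviewer"
    # Anything without an explicit route returns control to the Orchestrator.
    return ROUTES.get(result.agent, "Orchestrator")
```

For example, `next_agent(StageResult("ValidateAgent", ok=True))` returns `"DeliverAgent"`, while the same result with `needs_manual_review=True` returns `"Reviewer"`.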

Tool Governance and Permission Rules

  • Commands and API calls that change production must pass through a central approval gate.
  • Secrets and credentials must be accessed via a Secret Manager and rotated on schedule.
  • Production writes require validated approvals and audit logging.
  • Only Orchestrator and DeliverAgent may write to production sinks; other agents write to staging/validated layers.
  • All tool usage is logged with user/session identifiers for traceability.
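
A permission check along these lines could enforce the rules above. The layer names (`staging`, `transformed`, `validated`, `production`, and so on), the `PERMISSIONS` table, and the `authorize` function are hypothetical; they are one reading of the per-agent rules in the template, not a prescribed implementation.

```python
import logging

logger = logging.getLogger("tool_governance")

# Hypothetical mapping of each agent to the layers it may read or write.
PERMISSIONS = {
    "IngestAgent": {"read": {"sources"}, "write": {"staging"}},
    "TransformAgent": {"read": {"staging"}, "write": {"transformed"}},
    "ValidateAgent": {
        "read": {"transformed", "schema_registry"},
        "write": {"validated"},
    },
    "AnomalyDetectAgent": {"read": {"validated"}, "write": {"monitoring"}},
    "DeliverAgent": {"read": {"validated"}, "write": {"production"}},
    "Orchestrator": {
        "read": {"pipeline_state"},
        "write": {"pipeline_state", "production"},
    },
}


def authorize(agent: str, action: str, layer: str,
              approved: bool = False, session_id: str = "unknown") -> bool:
    """Return True if the agent may perform the action on the layer."""
    allowed = layer in PERMISSIONS.get(agent, {}).get(action, set())
    # Production writes additionally need an explicit approval.
    if allowed and action == "write" and layer == "production" and not approved:
        allowed = False
    # Every decision is logged with a session identifier for traceability.
    logger.info(
        "agent=%s action=%s layer=%s session=%s allowed=%s",
        agent, action, layer, session_id, allowed,
    )
    return allowed
```

Denying by default (unknown agents and unlisted layers return False) keeps the table consistent with least privilege.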

Code Construction Rules

  • Code for agents must be deterministic, idempotent, and free of hard-coded secrets.
  • Follow streaming best practices: backpressure handling, checkpointing, and windowing semantics.
  • Use well-defined interfaces between agents; avoid tight coupling.
  • Tests must cover edge cases for late events, out-of-order data, and schema drift.
  • Documentation and inline comments must explain decisions affecting the agent workflow.
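
To illustrate the idempotency rule, the sketch below derives a deterministic key from an event's position in the stream and uses it to make repeated writes a no-op. The in-memory `IdempotentSink` and the `event_key` helper are assumptions for the example; a production sink would typically rely on the storage layer's own upsert or transactional guarantees.

```python
import hashlib


def event_key(topic: str, partition: int, offset: int) -> str:
    """Deterministic key: the same stream position always yields the same key."""
    raw = f"{topic}:{partition}:{offset}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class IdempotentSink:
    """In-memory stand-in for a sink that ignores replays of the same event."""

    def __init__(self):
        self._rows = {}

    def write(self, key: str, payload: dict) -> bool:
        """Store payload once; return False if the key was already written."""
        if key in self._rows:
            return False
        self._rows[key] = payload
        return True

    def __len__(self) -> int:
        return len(self._rows)
```

With at-least-once ingestion, a redelivered event maps to the same key, so the second `write` returns False and the sink's contents are unchanged.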

Security and Production Rules

  • Encrypt data in flight and at rest; rotate keys regularly.
  • Enforce least-privilege across all agents and services.
  • Audit trails for all changes to production configurations and agents.
  • Separate staging and production environments with strict promotion controls.

Testing Checklist

  • Unit tests for each agent’s transformation and validation logic.
  • Integration tests simulating streaming data with end-to-end flow.
  • End-to-end tests for failure and rollback scenarios.
  • Performance tests verifying latency and backpressure behavior.
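
A unit test for late and out-of-order events might look like the following. The `tumbling_window_counts` function is a simplified, hypothetical stand-in for a real stream processor: it tracks a watermark as the maximum event time seen minus an allowed lateness, and drops events whose window has already closed.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_s, allowed_lateness_s):
    """Count events per (window_start, key).

    events: iterable of (event_time_s, key) in arrival order.
    Returns (counts, late), where late lists events dropped because their
    window ended at or before the watermark.
    """
    counts = defaultdict(int)
    late = []
    max_seen = None
    for event_time, key in events:
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - allowed_lateness_s
        window_start = (event_time // window_s) * window_s
        if window_start + window_s <= watermark:
            late.append((event_time, key))  # window already closed
            continue
        counts[(window_start, key)] += 1
    return dict(counts), late


def test_out_of_order_and_late_events():
    # (8, "a") arrives after (12, "a") but within the allowed lateness;
    # (3, "a") arrives after the watermark has passed its window.
    events = [(1, "a"), (12, "a"), (8, "a"), (31, "a"), (3, "a")]
    counts, late = tumbling_window_counts(
        events, window_s=10, allowed_lateness_s=5
    )
    assert counts == {(0, "a"): 2, (10, "a"): 1, (30, "a"): 1}
    assert late == [(3, "a")]
```

The test function follows pytest naming conventions, so it can be collected as-is if placed in a test module.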

Common Mistakes to Avoid

  • Bypassing schema validation or source-of-truth checks.
  • Uncontrolled data drift without timely escalation.
  • Hard-coding production endpoints or secrets into agent code.
  • Ignoring idempotency or proper watermarking in streaming workloads.
  • Overloading the Orchestrator with non-actionable alerts.

FAQ

What is the purpose of this AGENTS.md Template for Real-Time Analytics Pipelines?

It provides a copyable, project-level operating manual that defines agent roles, handoffs, and governance for streaming workflows.

Who should use this AGENTS.md Template?

Data engineers, platform teams, and engineering leaders building real-time pipelines that rely on AI coding agents and multi-agent orchestration.

How are agent handoffs handled in real-time pipelines?

Handoffs are defined by the Orchestrator with explicit criteria (e.g., completion of ingestion, validity checks, and quality gates) and require the next agent to acknowledge receipt before proceeding.

What governance rules apply to tool access and secrets?

Access is restricted by role, secrets are stored securely, and all usage is auditable. Production actions require approval gates.

How do you handle failure and rollback in streaming workflows?

Failures first trigger retries with exponential backoff. If retries are exhausted, the pipeline rolls back to a known-good snapshot and reprocesses from the last checkpoint.
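
The retry-then-rollback flow can be sketched as follows. The function names, the default attempt count and delay, and the `restore_snapshot` callback (standing in for whatever restores last_good_snapshot) are assumptions for illustration.

```python
import time


def run_with_retries(operation, max_attempts=4, base_delay_s=0.5,
                     sleep=time.sleep):
    """Call operation, retrying with exponential backoff on any exception."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Delays grow as base, 2*base, 4*base, ...
                sleep(base_delay_s * (2 ** attempt))
    raise last_error


def process_batch(operation, restore_snapshot, **retry_kwargs):
    """Run operation with retries; restore the snapshot if they are exhausted."""
    try:
        return run_with_retries(operation, **retry_kwargs)
    except Exception:
        restore_snapshot()  # hypothetical hook that restores last_good_snapshot
        raise
```

Re-raising after the rollback lets the Orchestrator see the failure and decide whether to reprocess from staging or escalate.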