
AGENTS.md Template for Data Pipeline and Analytics Agents

AGENTS.md Template for Data Pipeline and Analytics Agents: a copyable operating manual for single-agent and multi-agent data workflows, including roles, handoffs, governance, and testing.

Tags: AGENTS.md Template, data pipeline, analytics, multi-agent orchestration, agent handoff rules, tool governance, human review, data governance, workflow template, AI coding agents, analytics pipeline

Target User

Founders, CTOs, VPs of Engineering, Data Platform Engineers, AI/ML Engineers

Use Cases

  • Define a repeatable operating context for data pipelines
  • Coordinate single-agent and multi-agent data workflows
  • Govern data ingestion, processing, analytics, and delivery
  • Enable clear handoffs and governance between agents
  • Support tooling governance, security, and auditability

Markdown Template


# AGENTS.md

Project Role
- Purpose: Govern a data pipeline and analytics workflow using AI coding agents with multi-agent orchestration.
- Scope: Ingestion, cleansing, transformation, feature and analytics generation, and delivery to analytics platforms or data stores; a central supervisor coordinates tasks, memory, and outputs.

Agent roster and responsibilities
- IngestAgent: acquire data from sources, enforce ingestion schema, validate data quality, and emit ingestion artifacts.
- CleansingAgent: deduplicate, normalize, and fill missing values; log data quality issues.
- TransformAgent: apply joins, aggregations, feature engineering, and schema validation; emit transformed artifacts.
- AnalyticsAgent: execute analytics, generate insights, dashboards, and reports; surface explainability context.
- StorageAgent: write data to warehouse or data lake with lineage metadata.
- OrchestratorAgent (Supervisor): maintains the plan, schedules steps, tracks progress, handles retries, and coordinates handoffs.

Supervisor or orchestrator behavior
- Maintains a living plan with step ordering, required artifacts, and success/failure criteria.
- Triggers retries on transient failures, but halts on policy violations or data quality threshold breaches.
- Communicates context to downstream agents via memory and source-of-truth artifacts.

Handoff rules between agents
- Upstream agents must emit artifacts with versioned IDs and a memory snapshot.
- Downstream agents validate inputs against the expected schema and pass along the updated memory context.
- Handoffs are triggered by the Orchestrator, not by ad-hoc agent actions.

Context, memory, and source-of-truth rules
- All artifacts and memory are versioned in a central memory store.
- The source of truth is the authoritative data store (data warehouse or feature store).
- Agents reference a single shared memory namespace to avoid drift.

Tool access and permission rules
- Agents operate with least privilege; read sources, write to approved targets only.
- Secrets are retrieved from a centralized vault with rotation and access logs.
- API keys and tokens have enforced expirations, and their use is audited.

Architecture rules
- Steps are idempotent; re-running a step is safe.
- Agents are stateless; state is stored externally in the memory store or other centralized stores.
- Clear boundaries between ingestion, cleansing, transformation, and analytics components.

File structure rules
- Each agent type has its own module under agents/ (ingest/, cleanse/, transform/, analytics/, storage/).
- The orchestrator resides in orchestrator/ with a plan file and state machine definitions.
- configs/ contains environment-specific settings and feature flags.
- tests/ contains unit and integration tests for data flows.

Data, API, or integration rules when relevant
- Use Parquet/JSON/CSV as data formats; validate with schema definitions.
- REST/GraphQL endpoints must be versioned, and requests are logged with trace IDs.
- Streaming inputs should be batched where possible so that processing stays idempotent.

Validation rules
- Data quality thresholds must be checked after each step (schema, null rates, range checks).
- Outputs must match target schemas and metadata definitions.
- End-to-end tests compare final outputs to ground-truth datasets.

Security rules
- Secrets are never stored in code; retrieve them from the vault, which enforces role-based access control.
- Production data handling follows least privilege and data masking requirements.
- Audit trails are enabled for all data movement and transformations.

Testing rules
- Unit tests for each agent function; integration tests for end-to-end runs.
- Mock external systems where possible; use sandboxed data where feasible.
- CI gates block deployment if any test fails.

Deployment rules
- Deploy in small, reversible steps; perform canary validations.
- On failure, roll back to the last known-good artifacts.
- Instrument pipelines to observe data lineage and performance.

Human review and escalation rules
- Data quality or compliance issues escalate to data governance.
- Anomalies trigger manual review and approval before proceeding.

Failure handling and rollback rules
- Failures are logged, retried up to a limit, then escalated.
- Rollback reverts to previous artifact versions and notifies the orchestrator.

Things agents must not do
- Do not modify governance policies without approval.
- Do not bypass validation or skip memory updates.
- Do not access production controls outside defined roles.

Overview

This AGENTS.md template defines a data pipeline and analytics agent workflow, governing how AI coding agents operate individually and in multi-agent orchestration. It provides a precise operating context for data ingestion, cleansing, transformation, and analytics, with explicit handoffs, source-of-truth, and governance rules to prevent context drift and architecture drift.

In short: use this template to establish roles, plan-driven task coordination, and a repeatable runbook for end-to-end data processing and analytics with AI agents.

When to Use This AGENTS.md Template

  • When implementing a data pipeline that involves AI-driven data ingestion, cleaning, transformation, and analytics.
  • When you need clear agent handoffs and a central orchestrator to coordinate multi-agent workflows.
  • When governance, security, and auditability are required for production data flows.
  • When you want a copyable operating manual that teams can adopt across projects.


Recommended Agent Operating Model

Roles and responsibilities are aligned to the data pipeline stages (Ingest, Cleanse, Transform, Analytics, and Storage), coordinated by the Orchestrator. Decision boundaries are explicit: each agent owns the outputs it generates and cannot override upstream artifacts except through the orchestrator. Escalation paths route to the Orchestrator first, then to data governance if needed.
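
The escalation path above can be sketched as an orchestrator loop. This is a minimal Python sketch, not a prescribed implementation: `Step`, `Orchestrator`, `TransientError`, and `PolicyViolation` are hypothetical names, the plan is a plain ordered list, and the `escalations` list stands in for whatever actually notifies data governance. Transient failures are retried up to a limit and then escalated; policy or data quality violations halt the run immediately without a retry.

```python
from dataclasses import dataclass, field
from typing import Callable


class TransientError(Exception):
    """Recoverable failure such as a timeout or throttling; safe to retry."""


class PolicyViolation(Exception):
    """Policy or data quality breach; never retried."""


@dataclass
class Step:
    name: str                       # pipeline stage, e.g. "ingest" or "cleanse"
    run: Callable[[dict], dict]     # reads shared context, returns new context entries


@dataclass
class Orchestrator:
    plan: list[Step]                # ordered plan; the only trigger for handoffs
    max_retries: int = 3
    escalations: list[str] = field(default_factory=list)  # stand-in for notifying data governance

    def execute(self, context: dict) -> str:
        for step in self.plan:
            failures = 0
            while True:
                try:
                    context.update(step.run(context))
                    break                                   # hand off to the next step
                except TransientError as exc:
                    failures += 1
                    if failures > self.max_retries:         # retry budget spent: escalate
                        self.escalations.append(f"{step.name}: retries exhausted ({exc})")
                        return "escalated"
                except PolicyViolation as exc:              # halt immediately, no retry
                    self.escalations.append(f"{step.name}: halted ({exc})")
                    return "halted"
        return "succeeded"
```

With `max_retries=3`, a step that always raises `TransientError` runs four times (one attempt plus three retries) before the run is escalated, and a `PolicyViolation` stops the plan before any downstream step runs.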

Recommended Project Structure

data-pipeline/
  agents/
    ingest/
    cleanse/
    transform/
    analytics/
    storage/
  orchestrator/
  memory/
  configs/
  pipelines/
  tests/
  docs/
  README.md

Core Operating Principles

  • Single source of truth; all outputs reference canonical artifacts.
  • Idempotent, deterministic steps, so reruns are safe.
  • Least privilege and auditable actions; secrets protected.
  • Clear ownership and handoffs between agents.
  • Comprehensive logging and explainability for analytics outputs.
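
Idempotence usually comes down to how a step writes. One common approach, assumed here rather than prescribed by the template, is to overwrite a whole partition keyed by the run's logical date instead of appending rows. `PartitionedStore` and `load_daily_orders` are hypothetical, and the in-memory dict stands in for a warehouse table.

```python
from collections import defaultdict


class PartitionedStore:
    """In-memory stand-in for a warehouse table partitioned by logical run date."""

    def __init__(self) -> None:
        self._tables: dict = defaultdict(dict)   # table -> {partition: rows}

    def overwrite_partition(self, table: str, partition: str, rows: list[dict]) -> None:
        # Replace the whole partition instead of appending to it.
        self._tables[table][partition] = list(rows)

    def rows(self, table: str) -> list[dict]:
        parts = self._tables[table]
        return [row for key in sorted(parts) for row in parts[key]]


def load_daily_orders(store: PartitionedStore, day: str, source_rows: list[dict]) -> None:
    """Hypothetical load step: deterministic selection and ordering, then overwrite."""
    rows = sorted((r for r in source_rows if r["day"] == day), key=lambda r: r["id"])
    store.overwrite_partition("orders", day, rows)
```

Running `load_daily_orders` twice for the same day leaves the table unchanged, which is what makes retries and backfills safe. An append-based write would duplicate the rows on every rerun.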

Agent Handoff and Collaboration Rules

  • Planner/orchestrator defines the sequence and passes a plan to each agent.
  • Implementers validate input artifacts, then produce outputs with versioned IDs.
  • Reviewers assess data quality and compliance before downstream use.
  • Testers ensure end-to-end correctness prior to production deployment.
  • Researchers may propose new data sources, but each new source requires governance approval.
  • Domain specialists review data sensitivity and regulatory constraints.
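
A handoff can be made concrete with a small artifact record. In this sketch, `Artifact`, `emit`, and `accept` are hypothetical names. The versioned ID is derived from a SHA-256 hash of the artifact's rows, so identical content always yields the same ID, and the downstream agent validates the schema before extending the memory context it passes on.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    artifact_id: str    # versioned ID: producer name plus content hash
    producer: str
    rows: tuple         # the payload; real artifacts would point at files or tables
    memory: dict        # context snapshot handed to the next agent


def emit(producer: str, rows: list[dict], memory: dict) -> Artifact:
    """Upstream side: hash the content (key order ignored) to get a stable version."""
    payload = json.dumps(rows, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return Artifact(f"{producer}@{digest}", producer, tuple(rows), dict(memory))


def accept(artifact: Artifact, expected: dict[str, type]) -> dict:
    """Downstream side: validate the input schema, then extend the memory context."""
    for i, row in enumerate(artifact.rows):
        if set(row) != set(expected):
            raise ValueError(f"{artifact.artifact_id} row {i}: columns {sorted(row)} != {sorted(expected)}")
        for col, typ in expected.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"{artifact.artifact_id} row {i}: {col} is not {typ.__name__}")
    inputs = [*artifact.memory.get("inputs", []), artifact.artifact_id]
    return {**artifact.memory, "inputs": inputs}
```

Because the ID depends only on content, re-emitting an unchanged artifact is a no-op from the downstream agent's point of view, while any change to the data produces a new version that can be traced through the `inputs` list in memory.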

Tool Governance and Permission Rules

  • Commands executed by agents must be auditable with trace IDs.
  • File edits are version-controlled; production writes are gated by approvals.
  • API calls use token-based authentication; secrets rotate periodically.
  • Production systems require approval gates and rollback pathways.
  • External services accessed only through defined connectors with monitoring.
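
The permission and approval rules above can be enforced at a single choke point that every agent write passes through. This sketch assumes a hypothetical `ToolGateway` with a per-agent allowlist of write targets, a set of production targets that require an approval ID, and an in-memory audit log; every decision, including denials, is recorded with a trace ID.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolGateway:
    """Hypothetical choke point for agent writes: allowlist, approval gate, audit log."""
    allowed_targets: dict[str, set[str]]   # agent name -> targets it may write to
    production_targets: set[str]           # targets that require a human approval ID
    audit_log: list[dict] = field(default_factory=list)

    def write(self, agent: str, target: str,
              approval_id: Optional[str] = None,
              trace_id: Optional[str] = None) -> str:
        trace_id = trace_id or uuid.uuid4().hex   # reuse the caller's trace ID if given
        if target not in self.allowed_targets.get(agent, set()):
            decision = "denied: not in agent allowlist"
        elif target in self.production_targets and approval_id is None:
            decision = "denied: production write needs approval"
        else:
            decision = "allowed"
        # Record every decision, including denials, before acting on it.
        self.audit_log.append({"trace_id": trace_id, "agent": agent, "target": target,
                               "approval_id": approval_id, "decision": decision})
        if decision != "allowed":
            raise PermissionError(f"[{trace_id}] {agent} -> {target}: {decision}")
        # The actual write through an approved connector would happen here.
        return trace_id
```

Logging denials as well as successes matters for auditability: an agent repeatedly probing targets outside its role shows up in the log even though none of its writes succeeded.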

Code Construction Rules

  • Transformations are pure functions wherever possible.
  • Inputs and outputs are strictly typed; schemas enforced at each step.
  • Configuration-driven behavior; no hard-coded values in agent code.
  • Logging includes plan context, artifact IDs, and memory state.
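
A sketch of these rules, assuming a hypothetical orders-to-spend aggregation: records and configuration are typed frozen dataclasses, the aggregation is a pure function of its arguments, the threshold comes from configuration instead of being hard-coded, and logging with plan and artifact context happens in a thin wrapper outside the pure function. `Order`, `SpendConfig`, `spend_by_customer`, and `run_transform` are illustrative names only.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("agents.transform")


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    amount: float


@dataclass(frozen=True)
class SpendConfig:
    min_amount: float   # loaded from configs/ in practice, never hard-coded in agent code


def spend_by_customer(orders: tuple[Order, ...], cfg: SpendConfig) -> dict[str, float]:
    """Pure: the result depends only on the arguments; no I/O, inputs are not mutated."""
    totals: dict[str, float] = {}
    for o in orders:
        if o.amount >= cfg.min_amount:
            totals[o.customer_id] = totals.get(o.customer_id, 0.0) + o.amount
    return totals


def run_transform(orders, cfg: SpendConfig, *, plan_step: str, artifact_id: str) -> dict[str, float]:
    """Impure shell: logging with plan context lives here, outside the pure function."""
    result = spend_by_customer(tuple(orders), cfg)
    logger.info("step=%s input=%s customers=%d", plan_step, artifact_id, len(result))
    return result
```

Keeping side effects in the wrapper means the pure function can be unit-tested with plain values and re-run safely, while the wrapper is the single place where plan and artifact context reach the logs.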

Security and Production Rules

  • Data masking for PII/PHI; sensitive fields redacted in logs.
  • Audit logs retained for a defined period; tampering alerts enabled.
  • Secret management with access control and rotation policies.
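
Redaction in logs can be enforced with a standard-library `logging.Filter`. The field names in `SENSITIVE_KEYS` and the `key=value` log format are assumptions for illustration; in practice the list would come from your data classification. The filter is attached to a handler rather than a logger because Python consults logger-level filters only for records logged directly to that logger, not for records propagated from child loggers.

```python
import logging
import re

# Hypothetical sensitive field names; source these from data classification in practice.
SENSITIVE_KEYS = ("email", "ssn", "phone")

_PATTERN = re.compile(r"\b(" + "|".join(SENSITIVE_KEYS) + r")=([^\s,]+)")


class RedactFilter(logging.Filter):
    """Masks key=value pairs for sensitive fields before the handler formats the record."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()           # merge %-style args into the message first
        record.msg = _PATTERN.sub(r"\1=[REDACTED]", message)
        record.args = ()                        # args are already merged into msg
        return True                             # never drop the record, only mask it
```

Attach it with `handler.addFilter(RedactFilter())`; a call such as `log.info("email=%s rows=%d", "a@b.com", 3)` is then emitted as `email=[REDACTED] rows=3`.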

Testing Checklist

  • Unit tests for each agent function with deterministic inputs.
  • Integration tests for ingestion to analytics output.
  • End-to-end tests in a staging environment with realistic data.
  • Performance checks and capacity testing for data volumes.
  • Security tests including secret leakage and access control checks.
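
For the first checklist item, a deterministic unit test might look like the following. `dedupe_latest` is a hypothetical CleansingAgent function that keeps the newest row per key; the tests pin both its output and its idempotence, using only the standard-library `unittest` module.

```python
import unittest


def dedupe_latest(rows: list[dict], key: str = "id", version: str = "updated_at") -> list[dict]:
    """Hypothetical cleansing step: keep the newest row per key, output sorted by key."""
    latest: dict = {}
    for row in rows:
        current = latest.get(row[key])
        # ISO-8601 date strings compare correctly as plain strings.
        if current is None or row[version] > current[version]:
            latest[row[key]] = row
    return [latest[k] for k in sorted(latest)]


class DedupeLatestTest(unittest.TestCase):
    def test_keeps_newest_version(self):
        rows = [
            {"id": 2, "updated_at": "2024-01-01", "v": "old"},
            {"id": 1, "updated_at": "2024-01-03", "v": "only"},
            {"id": 2, "updated_at": "2024-01-02", "v": "new"},
        ]
        self.assertEqual([r["v"] for r in dedupe_latest(rows)], ["only", "new"])

    def test_is_idempotent(self):
        rows = [{"id": 1, "updated_at": "2024-01-01"}, {"id": 1, "updated_at": "2024-01-01"}]
        once = dedupe_latest(rows)
        self.assertEqual(dedupe_latest(once), once)
```

Saved as a `test_*.py` file under tests/, it runs under `python -m unittest`, which makes it easy to wire into the CI gate.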

Common Mistakes to Avoid

  • Skipping validation and assuming downstream fixes will correct upstream issues.
  • Overlapping responsibilities leading to duplicated work across agents.
  • Uncontrolled memory growth; failing to prune or version artifacts.
  • Ignoring data lineage, causing trust and governance problems.
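
The first mistake is cheapest to prevent with a quality gate after every step, covering the schema, null-rate, and range checks listed in the template's validation rules. A minimal sketch, with `ColumnRule`, `QualityBreach`, and `check_quality` as hypothetical names: it collects every violation in a batch and raises a single exception that an orchestrator can treat as a halt condition rather than a retryable failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnRule:
    dtype: type
    max_null_rate: float = 0.0          # fraction of rows allowed to be None or missing
    min_value: Optional[float] = None
    max_value: Optional[float] = None


class QualityBreach(Exception):
    """Data quality threshold breached; the pipeline should halt, not retry."""


def check_quality(rows: list[dict], rules: dict[str, ColumnRule]) -> None:
    """Raise QualityBreach listing every violated rule; return None if the batch passes."""
    problems = []
    unexpected = {col for row in rows for col in row} - set(rules)
    if unexpected:
        problems.append(f"unexpected columns: {sorted(unexpected)}")
    n = len(rows)
    for col, rule in rules.items():
        values = [row.get(col) for row in rows]   # a missing column counts as null
        nulls = sum(v is None for v in values)
        if n and nulls / n > rule.max_null_rate:
            problems.append(f"{col}: null rate {nulls / n:.2f} > {rule.max_null_rate}")
        for v in values:
            if v is None:
                continue
            if not isinstance(v, rule.dtype):
                problems.append(f"{col}: {v!r} is not {rule.dtype.__name__}")
            elif rule.min_value is not None and v < rule.min_value:
                problems.append(f"{col}: {v} < {rule.min_value}")
            elif rule.max_value is not None and v > rule.max_value:
                problems.append(f"{col}: {v} > {rule.max_value}")
    if problems:
        raise QualityBreach("; ".join(problems))
```

Reporting all violations at once, instead of stopping at the first, gives the human reviewer in the escalation path the full picture in a single ticket.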

FAQ

What is the AGENTS.md Template for Data Pipeline and Analytics Agents?

This AGENTS.md Template defines the operating model, agent roster, and governance for data pipeline and analytics workflows, enabling single-agent and multi-agent orchestration with clear handoffs.

How does multi-agent orchestration work in this template?

Each agent has a defined role with explicit handoffs and ownership. The orchestrator, acting as planner and supervisor, sequences the work; the ingest, cleanse, transform, analytics, and storage agents execute their stages; and all of them work against a single source of truth.

What are the handoff rules between agents?

Handoffs follow the orchestrator's plan: the downstream agent waits for the upstream artifacts, validates them against the expected schema, and passes context along via shared memory and versioned artifacts.

How are security and access controlled in this workflow?

Permissions are least-privilege, secrets are vault-managed, and all actions are logged with audit trails and token scope controls.

How do you test and validate the data pipeline agents?

Unit tests validate individual steps; integration tests verify the end-to-end pipeline; automated checks compare outputs to schemas and quality metrics.