ELT Pipeline Architecture AGENTS.md Template

AGENTS.md Template for ELT pipelines: define multi-agent orchestration, handoffs, and governance for end-to-end data pipelines.

Tags: AGENTS.md Template, ELT pipeline, AI coding agents, multi-agent orchestration, agent handoffs, tool governance, data pipeline, data engineering, workflow orchestration, human review

Target User

Data engineers, data platform teams, and engineering leaders

Use Cases

  • Single-agent ELT automation
  • Multi-agent ELT orchestration
  • Tool governance and security for ELT pipelines
  • Hybrid human-in-the-loop data workflows

Overview

The AGENTS.md template for ELT pipeline architecture provides a formal operating manual that governs the behavior of AI coding agents in single-agent and multi-agent ELT workflows. It defines roles, coordination patterns, memory, sources of truth, and governance to ensure reproducible, auditable data pipelines.

In short, this template establishes the project-level operating context for orchestrating ELT tasks through defined agents, preserves data lineage, and sets explicit handoffs and guardrails for safe multi-agent collaboration.

When to Use This AGENTS.md Template

  • When building or evolving an ELT pipeline that uses AI coding agents to extract, load, transform, and validate data.
  • When you need a deterministic, auditable, and governable multi-agent orchestration pattern.
  • When you require explicit handoffs, memory, and source-of-truth management across agents.
  • When you must enforce tool governance and security for production data.

Copyable AGENTS.md Template

Use the block below as the project-level AGENTS.md for your ELT pipeline orchestration.

# AGENTS.md: ELT Pipeline Orchestration

This file defines the operating model for a team using AI coding agents to manage
an ELT pipeline (Extract, Load, Transform) with multi-agent orchestration.

Project Role: Data Platform Engineer leading the ELT automation effort.

Agent roster and responsibilities:
- Planner: defines the end-to-end ELT task plan, coordinates preconditions, and schedules agents.
- Extractor: performs data extraction from sources, incremental extraction, and source-change detection.
- Loader: loads extracted data into the target data lake/warehouse with idempotent writes.
- Transformer: applies schema, transformations, normalization, and enrichment.
- Validator: validates data quality, lineage, and schema conformance.
- Orchestrator: coordinates steps, handles retries, timeouts, and handoffs.
- Reviewer: validates outputs against acceptance criteria and business rules.
- Auditor: records audit trails, metrics, and lineage.

Supervisor or orchestrator behavior:
- The Orchestrator maintains global context, ensures preconditions, and dispatches tasks to agents.
- It enforces timeouts, retries, and escalation rules when agents fail or drift.

Handoff rules between agents:
- Planner -> Extractor: ensure source availability and preconditions are met.
- Extractor -> Loader: data extracted and validated for load readiness.
- Loader -> Transformer: loaded data is ready for transformation.
- Transformer -> Validator: transformed data is ready for validation.
- Validator -> Orchestrator: acceptance criteria met; pass to finalization.
- Orchestrator -> Reviewer: request final QA; if approval granted, proceed to deployment.

Context, memory, and source-of-truth rules:
- All context is stored in a central memory store keyed by pipeline id.
- The canonical data sources are the source-of-truth; any derived results must be traceable back to the sources.
- All agent outputs are versioned and stored in a data catalog.

Tool access and permission rules:
- Agents may read data sources and write to the data lake/warehouse with role-based permissions.
- Secrets must be retrieved from a secret manager; do not store secrets in code.
- Production systems require explicit approval gates.

Architecture rules:
- Stateless, idempotent steps; event-driven triggers; deterministic results.
- Clear separation between planner, executors, and validators.
- Proper logging and traceable lineage.

File structure rules:
- Do not create unnecessary folders.
- Maintain a single source of truth in the data catalog and a separate memory store for context.

Data, API, or integration rules when relevant:
- Use standard data formats (Parquet, JSON, Avro) and consistent schemas.
- API calls must be idempotent where possible; handle retries gracefully.

Validation rules:
- All data outputs must pass schema, integrity, and lineage checks before finalization.

Security rules:
- Secrets stored in vault; access tightly controlled; rotate regularly.

Testing rules:
- Unit tests for each agent; integration tests across the planner-extractor-loader-transformer-validator chain.
- End-to-end tests with synthetic data in a staging environment.
- Security and access control tests; secrets rotation tests.

Deployment rules:
- CI/CD gating; canary deploy into staging; monitor; promote to production with approval.

Human review and escalation rules:
- Any anomaly triggers human review by data engineer or data governance.

Failure handling and rollback rules:
- On failure, roll back to last good state; preserve provenance; re-run from last valid step.

Things Agents must not do:
- Do not bypass approvals.
- Do not operate in production without a runbook.
- Do not modify upstream data outside allowed sources.
- Do not leak secrets.
- Do not drift from the canonical data models.

Recommended Agent Operating Model

Each role has a clear decision boundary:

  • Planner decides the next steps.
  • Extractor performs extraction within source constraints.
  • Loader enforces load rules.
  • Transformer ensures schema consistency.
  • Validator ensures data quality.
  • Orchestrator coordinates the steps and decides when to escalate.

Escalation path: on a critical failure or missing preconditions, the Orchestrator escalates to the Reviewer or to data governance. Decisions and outcomes are written to the memory store.

Recommended Project Structure

elt-pipeline/
├── agents/
│   ├── planner/
│   │   └── plan.py
│   ├── extractor/
│   │   └── extractor.py
│   ├── loader/
│   │   └── loader.py
│   ├── transformer/
│   │   └── transformer.py
│   ├── validator/
│   │   └── validator.py
│   ├── orchestrator/
│   │   └── orchestrator.py
│   ├── reviewer/
│   │   └── reviewer.py
│   └── auditor/
│       └── auditor.py
├── configs/
│   └── pipeline.yaml
├── data/
│   ├── raw/
│   ├── curated/
│   └── lineage/
├── scripts/
│   └── run_pipeline.sh
├── tests/
│   ├── unit/
│   └── integration/
└── docs/
    └── README.md

Core Operating Principles

  • Explicit ownership and accountability for each agent role.
  • Idempotent, deterministic steps with clear side-effect handling.
  • Single source of truth and robust data lineage tracking.
  • Strong observability: structured logs, traces, and metrics.
  • Guardrails for tool access, secrets, and production changes.
  • Continuous validation of data quality and schema conformance.
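
The idempotency principle can be illustrated with a minimal sketch. It assumes an in-memory dictionary standing in for the warehouse and a content-derived batch key; `batch_key` and `idempotent_load` are hypothetical names, not part of the template.

```python
import hashlib
import json


def batch_key(records: list[dict]) -> str:
    """Derive a deterministic key from the batch contents.

    Field order inside each record does not matter (keys are sorted),
    but the order of records in the list does.
    """
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def idempotent_load(target: dict[str, list[dict]], records: list[dict]) -> bool:
    """Write the batch once; return False if the same batch was already loaded."""
    key = batch_key(records)
    if key in target:
        return False  # A retry of the same batch has no further side effect.
    target[key] = list(records)
    return True


# Assumption: a plain dict stands in for the real data lake or warehouse.
warehouse: dict[str, list[dict]] = {}
batch = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
first = idempotent_load(warehouse, batch)
second = idempotent_load(warehouse, batch)  # Simulated retry of the same step.
```

Because the key depends only on the data, re-running the step after a failure cannot create a second copy of the batch.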

Agent Handoff and Collaboration Rules

  • Planner coordinates the plan and passes constraints to executors; all decisions are auditable.
  • Extractor and Loader operate under pre-defined data source and target schemas with strict read/write permissions.
  • Transformer validates schema and enrichment rules; Validator enforces quality gates.
  • Orchestrator manages timeouts, retries, and escalation to Reviewer when criteria are not met.
  • Reviewer provides final approval or requests additional changes; Auditor records outcomes and lineage.
  • Researchers, domain specialists and additional agents may provide input as needed but cannot bypass the established handoff flow.
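
One way to make the handoff flow enforceable is a transition table that every handoff is checked against. The sketch below is illustrative only: `ALLOWED_HANDOFFS`, `HandoffError`, and `hand_off` are hypothetical names, and the audit log is a plain list.

```python
# Allowed handoffs, mirroring the sequence described in the template.
ALLOWED_HANDOFFS: dict[str, set[str]] = {
    "planner": {"extractor"},
    "extractor": {"loader"},
    "loader": {"transformer"},
    "transformer": {"validator"},
    "validator": {"orchestrator"},
    "orchestrator": {"reviewer"},
}


class HandoffError(Exception):
    """Raised when an agent tries to skip or reorder a stage."""


def hand_off(sender: str, receiver: str, audit_log: list[dict]) -> None:
    """Record the handoff attempt and reject it if it is not in the table."""
    allowed = receiver in ALLOWED_HANDOFFS.get(sender, set())
    audit_log.append({"from": sender, "to": receiver, "allowed": allowed})
    if not allowed:
        raise HandoffError(f"{sender} may not hand off to {receiver}")


audit_log: list[dict] = []
hand_off("planner", "extractor", audit_log)  # Permitted.
try:
    hand_off("extractor", "validator", audit_log)  # Skips Loader and Transformer.
except HandoffError as exc:
    rejected = str(exc)
```

Rejected attempts are logged as well, so the Auditor sees bypass attempts and not only successful handoffs.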

Tool Governance and Permission Rules

  • Access to data sources and targets must be role-based; secrets must be retrieved from vault.
  • APIs must use tokens with scoped permissions; avoid long-lived credentials.
  • Changes to production configurations require explicit approvals and rollback plans.
  • All tool actions must be logged and auditable.
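
The rules above could be approximated by a small gateway that checks a role's permissions and logs every decision. This is a sketch under stated assumptions: the permission table, the `ToolGateway` class, and the resource names are hypothetical and would be replaced by your platform's access control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping; adjust to your own roles.
ROLE_PERMISSIONS: dict[str, set[tuple[str, str]]] = {
    "extractor": {("source", "read")},
    "loader": {("warehouse", "write")},
    "transformer": {("warehouse", "read"), ("warehouse", "write")},
    "validator": {("warehouse", "read")},
}


@dataclass
class ToolGateway:
    """Checks each tool action against the role table and logs the decision."""

    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, role: str, resource: str, action: str) -> bool:
        allowed = (resource, action) in ROLE_PERMISSIONS.get(role, set())
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "role": role,
                "resource": resource,
                "action": action,
                "allowed": allowed,
            }
        )
        return allowed


gateway = ToolGateway()
can_read = gateway.authorize("extractor", "source", "read")
can_write = gateway.authorize("extractor", "warehouse", "write")
```

Unknown roles fall through to an empty permission set, so the default is deny.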

Code Construction Rules

  • Write modular, testable code with clear interfaces between agents.
  • All transformations must be deterministic and idempotent.
  • Use versioned schemas and keep data contracts stable.
  • Document assumptions and edge cases in the code comments or docs.
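
A shared interface keeps agents modular and testable. The sketch below assumes each agent exposes a single `run` method over a list of records; the `Agent` protocol, the field names, and `SCHEMA_VERSION` are hypothetical illustrations, not a prescribed contract.

```python
from typing import Protocol

SCHEMA_VERSION = "1.0.0"  # Hypothetical version tag for the data contract.
REQUIRED_FIELDS = {"id": int, "email": str}  # Hypothetical contract fields.


class Agent(Protocol):
    name: str

    def run(self, records: list[dict]) -> list[dict]: ...


class Transformer:
    name = "transformer"

    def run(self, records: list[dict]) -> list[dict]:
        # Deterministic: output depends only on the input, with a stable order.
        cleaned = [
            {"id": r["id"], "email": r["email"].strip().lower()} for r in records
        ]
        return sorted(cleaned, key=lambda r: r["id"])


class Validator:
    name = "validator"

    def run(self, records: list[dict]) -> list[dict]:
        for record in records:
            for field_name, field_type in REQUIRED_FIELDS.items():
                if not isinstance(record.get(field_name), field_type):
                    raise ValueError(
                        f"schema {SCHEMA_VERSION}: bad field {field_name!r}"
                    )
        return records


def run_chain(agents: list[Agent], records: list[dict]) -> list[dict]:
    """Pass records through each agent in order."""
    for agent in agents:
        records = agent.run(records)
    return records


raw = [{"id": 2, "email": " B@X.io "}, {"id": 1, "email": "a@x.io"}]
result = run_chain([Transformer(), Validator()], raw)
```

Running `Transformer().run(result)` again returns the same records, which is the idempotency property the rules ask for.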

Security and Production Rules

  • Secrets in vault; minimal privileges; rotate credentials regularly.
  • Data encryption at rest and in transit; access logs stored securely.
  • Production changes require runbooks, approvals, and canary deployment.
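
A production gate can be expressed as a check that lists what is still missing before promotion. The following is a minimal sketch; `ChangeRequest` and its fields are hypothetical, and a real gate would read approvals from your CI/CD or ticketing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical record describing a proposed production change."""

    description: str
    runbook: Optional[str] = None
    approved_by: Optional[str] = None
    rollback_plan: Optional[str] = None
    canary_passed: bool = False


def unmet_requirements(change: ChangeRequest) -> list[str]:
    """Return the requirements that still block promotion to production."""
    missing = []
    if not change.runbook:
        missing.append("runbook")
    if not change.approved_by:
        missing.append("approval")
    if not change.rollback_plan:
        missing.append("rollback plan")
    if not change.canary_passed:
        missing.append("canary deployment")
    return missing


def may_promote(change: ChangeRequest) -> bool:
    return not unmet_requirements(change)


draft = ChangeRequest(description="Add orders table", runbook="docs/README.md")
blocked_by = unmet_requirements(draft)
```

Returning the list of gaps, instead of a bare boolean, gives the human reviewer something concrete to act on.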

Testing Checklist

  • Unit tests for each agent; mock external dependencies.
  • Integration tests across the planner-extractor-loader-transformer-validator chain.
  • End-to-end tests with synthetic data in a staging environment.
  • Security and access control tests; secrets rotation tests.
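
As an example of the first checklist item, the sketch below unit-tests an extractor with its source client mocked out. The `Extractor` class and its `fetch_rows` dependency are hypothetical; only `unittest` and `unittest.mock` from the standard library are used.

```python
import unittest
from unittest.mock import Mock


class Extractor:
    """Hypothetical extractor that depends on an injected source client."""

    def __init__(self, source_client):
        self.source_client = source_client

    def extract(self, since: str) -> list[dict]:
        rows = self.source_client.fetch_rows(since=since)
        # Drop rows that cannot be keyed downstream.
        return [row for row in rows if row.get("id") is not None]


class ExtractorTest(unittest.TestCase):
    def test_drops_rows_without_id(self):
        client = Mock()
        client.fetch_rows.return_value = [{"id": 1}, {"id": None}]

        result = Extractor(client).extract(since="2024-01-01")

        self.assertEqual(result, [{"id": 1}])
        client.fetch_rows.assert_called_once_with(since="2024-01-01")
```

Saved as a module under `tests/unit/`, this runs with `python -m unittest`. Injecting the client through the constructor is what makes the external dependency replaceable in tests.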

Common Mistakes to Avoid

  • Skipping memory persistence and source-of-truth references; losing lineage.
  • Unchecked retries causing duplicate work or data drift.
  • Ignoring governance, approvals, and security in production.
  • Overly coupled agents; not testing end-to-end flows.

FAQ

How does this AGENTS.md Template support ELT pipeline orchestration?

This template defines roles, handoffs, memory, source of truth, and governance for both single-agent and multi-agent ELT workflows.

What are the agent handoff rules between ELT stages?

Planner -> Extractor, Extractor -> Loader, Loader -> Transformer, Transformer -> Validator, Validator -> Orchestrator, Orchestrator -> Reviewer; each handoff includes preconditions and validation gates.

How is memory and the source of truth managed?

Context is stored in a central memory store keyed by pipeline id. The canonical data sources are the source of truth, and agent outputs are versioned in the data catalog and traceable back to those sources for lineage.
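
The split between context memory and the catalog can be sketched as two small in-memory classes. Both `MemoryStore` and `Catalog` are hypothetical stand-ins for whatever stores your platform provides.

```python
from collections import defaultdict


class MemoryStore:
    """Per-pipeline context, keyed by pipeline id."""

    def __init__(self) -> None:
        self._context: dict[str, dict] = defaultdict(dict)

    def remember(self, pipeline_id: str, key: str, value) -> None:
        self._context[pipeline_id][key] = value

    def recall(self, pipeline_id: str, key: str, default=None):
        return self._context[pipeline_id].get(key, default)


class Catalog:
    """Versioned outputs, each recorded with the sources it was derived from."""

    def __init__(self) -> None:
        self._versions: dict[str, list[dict]] = defaultdict(list)

    def publish(self, name: str, payload, sources: list[str]) -> int:
        if not sources:
            raise ValueError("an output must reference at least one source")
        versions = self._versions[name]
        versions.append(
            {"version": len(versions) + 1, "payload": payload, "sources": list(sources)}
        )
        return len(versions)

    def lineage(self, name: str, version: int) -> list[str]:
        return self._versions[name][version - 1]["sources"]


memory = MemoryStore()
memory.remember("pipeline-42", "last_step", "loader")
catalog = Catalog()
v1 = catalog.publish("orders_curated", [{"id": 1}], sources=["crm.orders"])
v2 = catalog.publish("orders_curated", [{"id": 1}, {"id": 2}], sources=["crm.orders"])
```

Refusing to publish an output without sources is one simple way to keep every derived result traceable.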

What security and governance constraints exist?

Secrets are kept in a vault, access is role-based, production changes pass approval gates, tool actions are audit-logged, and configuration changes follow strict change management.

How is testing performed for the ELT workflow?

Unit tests per agent, integration tests across agents, and end-to-end tests in staging with synthetic data.