AGENTS.md Template for Data Lake Architecture

An AGENTS.md template for data lake architecture that governs AI coding agents and multi-agent orchestration across ingestion, schema, catalog, quality, and governance.

Tags: AGENTS.md Template, data lake, data lake architecture, multi-agent orchestration, agent handoffs, tool governance, data quality, data catalog, security, deployment

Target User

Developers, data engineers, platform teams, AI/ML engineers, engineering leaders

Use Cases

  • defining a data lake ingestion, schema, catalog, and quality workflow
  • coordinating multi-agent orchestration across data pipeline stages
  • governance, security, and versioned changes to data lake artifacts
  • establishing project-level operating context for single-agent and multi-agent work

Markdown Template

# AGENTS.md

Project Role
- Data Lake Platform Engineer Lead: owns the data lake architecture, orchestrator design, and policy enforcement.

Agent roster and responsibilities
- IngestAgent: ingests raw data into landing zone, validates source schema, preserves lineage.
- SchemaAgent: defines and enforces data models, schemas, and schema evolution plans.
- CatalogAgent: registers datasets in the data catalog and maintains their metadata and lineage.
- QualityAgent: runs data quality checks, validates schemas, and flags anomalies.
- OrchestratorAgent: coordinates steps, memory/context sharing, and handoffs; acts as supervisor.
- SecurityAgent: enforces access controls, encryption, and policy compliance.
- ReviewerAgent: performs governance reviews and approves releases.
- DeploymentAgent: manages production deployment, versioning, and rollback.

Supervisor or orchestrator behavior
- OrchestratorAgent initializes tasks from ingestion to cataloging, advances handoffs on success, and halts on failures.
- It passes context and memory between agents and enforces source-of-truth rules.
- It logs decisions with justification and records outcomes in a central metadata store.

Handoff rules between agents
- IngestAgent → SchemaAgent: if ingestion validates and schema aligns, pass schema hints and data catalog keys.
- SchemaAgent → CatalogAgent & QualityAgent: register schema, expose metadata, and queue quality checks.
- CatalogAgent → OrchestratorAgent: confirm catalog entry and pending quality checks; resume on pass.
- QualityAgent → OrchestratorAgent: return quality report; if pass, proceed to deployment; if fail, trigger ReviewerAgent.

Context, memory, and source-of-truth rules
- Source of Truth: central data catalog and metadata store.
- Memory: ephemeral per run, stored in orchestrator context; ensure no agent mutates historical data.
- Context propagation: every task includes data lineage, schema version, dataset ID, and run ID.

Tool access and permission rules
- IngestAgent, SchemaAgent, CatalogAgent, QualityAgent, OrchestratorAgent may access:
  - Landing zones and processing buckets
  - Metadata stores and catalogs
  - Transformation tools approved by policy
- Secrets must be retrieved from a secrets manager via read-only operations; never hard-code credentials.
- Production systems require review and approvals before any write or config change.

Architecture rules
- Use modular, idempotent steps; each agent should be able to re-run safely.
- Data in landing zone must be immutable once committed; use versioned datasets.
- Prefer event-driven triggers; log every state transition with a timestamp and run ID.

File structure rules
- data-lake/
  - ingest/
  - schemas/
  - catalog/
  - quality/
  - orchestrator/
  - policies/
  - deployments/
  - tests/

Data, API, or integration rules
- All data moves must include lineage metadata.
- Use catalog-driven APIs for schema and governance operations.
- Document interface contracts for each agent in the code repository.

Validation rules
- IngestAgent: validate source presence, schema compatibility, and data freshness.
- SchemaAgent: validate schema compatibility with catalog and downstream consumers.
- QualityAgent: require a pass rate of at least 95% across quality checks and no critical anomalies.
- OrchestratorAgent: enforce end-to-end reachability of dependent steps before promotion.

Security rules
- Secrets must be stored in a secret manager; no plaintext secrets in code.
- Access to production data requires role-based access control and approval trails.
- Mask PII/PHI in non-production environments.

Testing rules
- Unit tests for ingestion and transformation logic.
- Integration tests for end-to-end data flow and schema validation.
- End-to-end tests that simulate real data with expected outcomes.

Deployment rules
- All changes go through a CI/CD pipeline with automatic tests.
- Maintain a rollback strategy that restores the previous dataset version and catalog state.
- Deployment must be auditable with run IDs and approvals when required.

Human review and escalation rules
- If any data quality rule fails or a critical anomaly is detected, escalate to ReviewerAgent.
- Require human review for schema evolution that may affect downstream consumers.
- Log all escalations with context and resolution.

Failure handling and rollback rules
- On failure, re-run from the last known-good checkpoint; if not available, halt and notify.
- Roll back data and metadata to the previous version; re-run dependent steps as needed.
- Ensure idempotent retries to avoid duplicated work.

Things Agents must not do
- Do not bypass approvals or governance gates.
- Do not modify production data directly; use controlled pipelines.
- Do not share secrets in logs or messages.
- Do not drift from the defined schema or data contracts.
- Do not make production changes without supervision and authorization.

Overview

This AGENTS.md Template provides a complete operating manual for a data lake architecture workflow, enabling both single-agent execution and multi-agent orchestration with explicit handoffs, governance, and escalation rules. It codifies roles, memory, sources of truth, and tool governance to ensure auditable, repeatable data pipelines.

The template is designed for AI coding agents operating in a data platform context. It covers ingestion, schema design, cataloging, quality checks, security, deployment, and monitoring, with clear boundaries and escalation paths. It also describes how to coordinate agents across the data lifecycle, so that datasets can keep evolving while data quality and compliance stay protected.

When to Use This AGENTS.md Template

  • When building a data lake workflow that includes ingestion, schema design, cataloging, and quality checks with clear handoffs.
  • When you need multi-agent orchestration with explicit planner, implementer, reviewer, and auditor roles.
  • When you require tool governance, security controls, and auditable changes to data lake artifacts.
  • When establishing a project-level operating context for data platform teams and AI coding agents.

Recommended Agent Operating Model

The model defines clear roles, decision boundaries, and escalation paths among IngestAgent, SchemaAgent, CatalogAgent, QualityAgent, and OrchestratorAgent. The OrchestratorAgent acts as planner and supervisor: it coordinates the other agents, their handoffs, and the governance checks. Domain specialists and researchers may join as needed for schema design or data quality anomaly investigations.

Recommended Project Structure

data-lake-architecture/
├── ingest/
│   ├── sources/
│   ├── jobs/
│   └── tests/
├── schemas/
├── catalog/
├── quality/
├── orchestrator/
├── policies/
├── deployments/
└── tests/

Core Operating Principles

  • Single source of truth for data catalog and metadata.
  • Idempotent, versioned steps with auditable run histories.
  • Explicit, documented agent handoffs and escalation paths.
  • Rule-based governance for security and access control.
  • Traceable changes to schemas, datasets, and quality rules.
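
The "idempotent steps with auditable run histories" principle can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `run_step` helper, the in-memory `_completed` dictionary, and the `audit_log` list stand in for whatever metadata store a real deployment would use.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

# In-memory stand-ins for a persistent metadata store (hypothetical).
_completed: Dict[Tuple[str, str], str] = {}
audit_log: List[dict] = []


def run_step(run_id: str, step: str, action: Callable[[], str]) -> str:
    """Run `action` at most once per (run_id, step) and record the outcome."""
    key = (run_id, step)
    if key in _completed:
        # A retry of a finished step returns the stored result and does no new work.
        return _completed[key]
    result = action()
    _completed[key] = result
    # Every completed step leaves an audit entry with a run ID and timestamp.
    audit_log.append(
        {
            "run_id": run_id,
            "step": step,
            "result": result,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return result
```

Calling `run_step("r1", "ingest", ...)` twice executes the action once; the second call returns the recorded result, which is what makes retries safe.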

Agent Handoff and Collaboration Rules

  • Planner/Orchestrator defines run plan, dependencies, and handoff triggers.
  • Implementers (IngestAgent, SchemaAgent, CatalogAgent, QualityAgent) execute tasks and produce artifacts with provenance.
  • Reviewers validate governance-critical changes (schema evolution, policy changes) before promotion.
  • Testers simulate production-like scenarios and validate end-to-end data flow.
  • Researchers/Domain Specialists provide domain-focused validations when required.
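
As a rough illustration of handoff triggers, the handoff rules in the template can be written as a small routing table. This is a sketch under stated assumptions: the `HANDOFFS` mapping and `next_agents` function are hypothetical names, and the hop through OrchestratorAgent after a quality report is collapsed so the function returns the agent the orchestrator would trigger next.

```python
from typing import Dict, List

# Hypothetical routing table derived from the template's handoff rules.
HANDOFFS: Dict[str, List[str]] = {
    "IngestAgent": ["SchemaAgent"],
    "SchemaAgent": ["CatalogAgent", "QualityAgent"],
    "CatalogAgent": ["OrchestratorAgent"],
}


def next_agents(current: str, succeeded: bool) -> List[str]:
    """Return the agents that receive work after `current` finishes."""
    if current == "QualityAgent":
        # A passing report proceeds to deployment; a failing one goes to review.
        return ["DeploymentAgent"] if succeeded else ["ReviewerAgent"]
    if not succeeded:
        # Any other failure halts the run, so nothing is handed off.
        return []
    return HANDOFFS.get(current, [])
```

Keeping the routing in data rather than in branching code makes the handoff rules easy to review alongside the AGENTS.md file itself.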

Tool Governance and Permission Rules

  • Only approved tools may access landing zones, catalogs, and metadata stores.
  • Secrets are retrieved from a secret manager; logs must not contain secrets.
  • Production write operations require explicit approvals and change tickets.
  • API calls to data services are governed by access policies and audit trails.
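
One way these rules might be enforced in code is an allowlist check plus log redaction. The sketch below is illustrative only: the `ALLOWED` mapping, the resource names, and the `key=value` secret pattern are assumptions, not part of any particular platform.

```python
import re
from typing import Dict, Set

# Hypothetical allowlist of resources each agent may touch.
ALLOWED: Dict[str, Set[str]] = {
    "IngestAgent": {"landing_zone", "metadata_store"},
    "CatalogAgent": {"catalog", "metadata_store"},
}

# Assumed convention: secrets appear in messages as key=value pairs.
_SECRET = re.compile(r"\b(password|token|secret|api_key)=\S+", re.IGNORECASE)


def is_allowed(agent: str, resource: str,
               production_write: bool = False, approved: bool = False) -> bool:
    """Check the allowlist; production writes also need an explicit approval."""
    if production_write and not approved:
        return False
    return resource in ALLOWED.get(agent, set())


def redact(message: str) -> str:
    """Replace secret values with *** before a message is logged."""
    return _SECRET.sub(lambda m: m.group(1) + "=***", message)
```

For example, `redact("connect token=abc123 ok")` yields `"connect token=*** ok"`, and an unapproved production write is rejected even for an agent that is otherwise allowed to use the resource.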

Code Construction Rules

  • Write modular code with clear interfaces between agents.
  • All transformations are idempotent and retry-safe.
  • Code and data contracts are versioned; schema changes are backward-compatible when possible.
  • Include unit and integration tests; include end-to-end validation tests for data flow.
  • Document interfaces, expected inputs/outputs, and failure modes in code comments and docs.
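
A backward-compatibility check for schema changes could look roughly like the following. It assumes a deliberately simple representation, a flat mapping from field name to type name, and treats removed fields and changed types as breaking while allowing added fields; real schema formats have richer rules, and `breaking_changes` is a hypothetical helper.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List changes in `new` that would break consumers of `old`."""
    problems: List[str] = []
    for name, old_type in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name] != old_type:
            problems.append(f"type changed: {name} {old_type} -> {new[name]}")
    # Fields that exist only in `new` are additions and are treated as compatible.
    return problems
```

An empty result means the change is backward-compatible under these assumptions; a non-empty result is a natural trigger for the human review that the template requires for schema evolution.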

Security and Production Rules

  • Protect PII/PHI with masking and encryption; limit exposure in non-production environments.
  • Enforce least-privilege access for all agents and services.
  • Audit all data movements and schema changes; store logs securely.
  • Follow deployment gating for production changes with rollback strategies.
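
Masking PII/PHI before data reaches a non-production environment might be sketched as below. The field list in `PII_FIELDS` and the `mask_record` helper are hypothetical, and the key is assumed to come from a secret manager. Keyed hashing is pseudonymization rather than full anonymization, so it complements access controls instead of replacing them.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical list of sensitive fields; a real list would come from the catalog.
PII_FIELDS = {"email", "ssn"}


def mask_record(record: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Return a copy of `record` with sensitive fields replaced by keyed hashes."""
    masked: Dict[str, Any] = {}
    for field, value in record.items():
        if field in PII_FIELDS and value is not None:
            digest = hmac.new(
                key.encode("utf-8"), str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
            # Truncated for readability; the same input and key give the same token.
            masked[field] = digest[:16]
        else:
            masked[field] = value
    return masked
```

Because the masking is deterministic for a given key, joins on masked columns still work in staging while the raw values stay out of the environment.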

Testing Checklist

  • Unit tests for ingestion and schema mapping.
  • Integration tests for end-to-end data flow and lineage.
  • Data quality checks pass in staging before promotion.
  • Security and access control tests in sandbox environments.
  • Rollout tests and rollback verification in a controlled preview.
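
The staging quality gate can be expressed as a small function. This sketch uses the 95% pass-rate threshold and the "no critical anomalies" condition from the template's validation rules; the `CheckResult` type and `quality_gate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    name: str
    passed: bool
    critical: bool = False  # marks checks whose failure blocks promotion outright


def quality_gate(results: List[CheckResult], min_pass_rate: float = 0.95) -> bool:
    """Return True only if the run may be promoted."""
    if not results:
        # No checks ran, so there is no evidence to justify promotion.
        return False
    pass_rate = sum(r.passed for r in results) / len(results)
    critical_failure = any(r.critical and not r.passed for r in results)
    return pass_rate >= min_pass_rate and not critical_failure
```

With 19 of 20 checks passing the gate opens, unless the one failing check is marked critical, in which case the run is held for review.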

Common Mistakes to Avoid

  • Ambiguous ownership and unclear decision boundaries.
  • Un-versioned artifacts and undocumented schema changes.
  • Bypassing governance gates or secretly changing production pipelines.
  • Context drift: failing to propagate run context or lineage properly.
  • Architectural drift: adding ad-hoc tools outside the defined workflow.
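
Context drift is easier to avoid when the run context is a single immutable object that every task receives. The following is a minimal sketch, assuming a hypothetical `RunContext` type that carries the fields named in the template (run ID, dataset ID, schema version, lineage); lineage is simplified here to the ordered list of agents that have handled the data.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class RunContext:
    run_id: str
    dataset_id: str
    schema_version: str
    lineage: Tuple[str, ...] = ()  # simplified: agents that handled the data, in order

    def handed_to(self, agent: str) -> "RunContext":
        """Return a new context with `agent` appended to the lineage."""
        return replace(self, lineage=self.lineage + (agent,))
```

Because the dataclass is frozen, an agent cannot silently change the run ID or rewrite earlier lineage; each handoff produces a new context and leaves the previous one intact.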

FAQ

What is the purpose of this AGENTS.md Template for Data Lake Architecture?

This AGENTS.md Template codifies roles, handoffs, and governance for data lake workflows, enabling reliable single-agent or multi-agent orchestration across ingestion, schema, catalog, quality, and security.

Who should use this AGENTS.md Template?

Data platform engineers, data engineers, AI engineers, and engineering leaders implementing data lake architectures with multi-agent coordination.

How do agent handoffs work in this template?

The OrchestratorAgent initiates tasks, passes context between agents, and advances each handoff when its success criteria are met; failures escalate to reviewers and trigger corrective actions.

What should I adjust before using this template?

Tailor roles to your team, update data sources, define your data contracts, and align with your organization's security and governance policies.

How can I extend this template for different data domains?

Clone the scaffold, define domain-specific schemas, quality rules, and catalog entries; ensure domain specialists participate in reviews for new domains.