AGENTS.md Template: Data Warehouse System Design

Overview

This AGENTS.md template for data warehouse system design defines a project-level operating context for AI coding agents that design and govern data warehouses. It supports both single-agent work and multi-agent orchestration, with explicit rules for handoffs, memory, and source-of-truth management.

Direct answer: This AGENTS.md Template codifies roles, flows, and governance to ensure safe, auditable, and effective data warehouse design with AI agents.

When to Use This AGENTS.md Template

Starting a new data warehouse design initiative with AI-assisted automation
Coordinating multiple agents (planner, implementer, reviewer, and domain specialists) in a single workflow
Enforcing governance, data lineage, and security requirements from project inception

Copyable AGENTS.md Template

# AGENTS.md
Project role: Data Warehouse Platform Engineer
Agent roster and responsibilities:
- Planner: designs overall data warehouse architecture and coordinates multi-agent workflow.
- Implementer: builds ETL/ELT pipelines and loads data into the warehouse.
- Reviewer: validates schema, data quality, and governance compliance.
- Data Architect: defines star/snowflake schema, data models, metadata, lineage.
- Domain Specialist: provides BI and analytics domain rules, business vocabularies.
Supervisor or orchestrator behavior: The orchestrator coordinates agents, assigns tasks, and enforces memory, source-of-truth, and escalation rules.
Handoff rules between agents:
- Planner -> Implementer: pass data model, data sources, and validation criteria.
- Implementer -> Reviewer: pass transformed data, quality checks, and lineage references.
- Reviewer -> Planner: pass issues and remediation plan; escalate if not resolved.
Context, memory, and source-of-truth rules:
- Memory is persisted per run in a canonical memory store; sources of truth are registered in the data catalog.
- All decisions must reference a registered source of truth; untracked data movement is not allowed.
Tool access and permission rules:
- Access tokens are rotated; secrets stored in a vault; agents have least privilege.
- Implementer is allowed to run SQL, Python, and orchestration commands via approved adapters.
Architecture rules:
- Enforce a schema-on-write approach with lineage tracking; use a modular warehouse design (facts, dimensions, and staging areas).
- Use idempotent ETL steps and idempotent API calls.
File structure rules:
- Organized under data-warehouse-design/ with separate folders for models, pipelines, and governance.
- Do not create duplicate files; use versioned artifacts.
Data, API, or integration rules when relevant:
- All data ingress must go through the ingestion layer; metadata is captured in the data catalog.
- Use standard REST or gRPC endpoints for integrations; respect rate limits.
Validation rules:
- Include unit tests for transformations, data quality checks, schema validation, and end-to-end integration tests.
Security rules:
- Encrypt data at rest and in transit; rotate credentials; enforce least privilege.
Testing rules:
- Run tests in CI; perform regression checks; validate schema compatibility.
Deployment rules:
- Deploy through controlled pipelines; require reviewer approval for production changes.
Human review and escalation rules:
- Any data quality threshold breach triggers escalation to Domain Specialist and Architect.
Failure handling and rollback rules:
- If a step fails, revert to the last known good state, log the change, and notify stakeholders.
Things Agents must not do:
- Do not bypass governance; do not access production secrets without authorization; do not deploy without validation.

Recommended Agent Operating Model

The recommended model defines clear roles, decision boundaries, and escalation paths for AI coding agents in data warehouse design. Planners decide architecture and handoffs; Implementers execute transformations; Reviewers validate results; Data Architects define schemas and lineage; Domain Specialists provide business context. Escalation paths move from routine validation to Domain Specialist or Architect review when quality or governance risk is detected.
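
The escalation path described above can be sketched as a small routing function. This is a minimal illustration, not part of the template: the `Finding` fields and the `escalation_targets` name are hypothetical, and it assumes that any quality or governance risk goes to both the Domain Specialist and the Data Architect.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PLANNER = "planner"
    IMPLEMENTER = "implementer"
    REVIEWER = "reviewer"
    DATA_ARCHITECT = "data_architect"
    DOMAIN_SPECIALIST = "domain_specialist"


@dataclass(frozen=True)
class Finding:
    """A hypothetical review finding; the field names are illustrative."""
    description: str
    quality_risk: bool = False
    governance_risk: bool = False


def escalation_targets(finding: Finding) -> list[Role]:
    """Return the roles that should look at a finding.

    Routine findings stay with the Reviewer. Assumption: any quality or
    governance risk escalates to the Domain Specialist and Data Architect.
    """
    if finding.quality_risk or finding.governance_risk:
        return [Role.DOMAIN_SPECIALIST, Role.DATA_ARCHITECT]
    return [Role.REVIEWER]
```

Keeping the routing in one pure function makes the decision boundary easy to test and to audit when the rules change.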

Recommended Project Structure

data-warehouse-design/
├── agents/
│   ├── planner/
│   │   └── planner.py
│   ├── implementer/
│   │   └── etl.py
│   ├── reviewer/
│   │   └── reviewer.py
│   ├── data-architect/
│   │   └── schema.sql
│   └── domain-specialist/
│       └── glossary.md
├── orchestrator/
│   └── orchestrator.py
├── models/
│   └── dw_schema.sql
├── pipelines/
│   └── ingest_to_dw/
├── tests/
│   └── qa/
├── governance/
│   └── policies.md
└── docs/
    └── readme.md
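
One way to create this layout is a short scaffolding script. The sketch below is illustrative: the paths are copied from the tree above, while the `LAYOUT` constant and the `scaffold` function are hypothetical names. Files are created empty, and existing files are left untouched so the script can be rerun safely.

```python
from pathlib import Path

# Paths copied from the tree above; a trailing slash marks a directory.
LAYOUT = [
    "agents/planner/planner.py",
    "agents/implementer/etl.py",
    "agents/reviewer/reviewer.py",
    "agents/data-architect/schema.sql",
    "agents/domain-specialist/glossary.md",
    "orchestrator/orchestrator.py",
    "models/dw_schema.sql",
    "pipelines/ingest_to_dw/",
    "tests/qa/",
    "governance/policies.md",
    "docs/readme.md",
]


def scaffold(root: Path) -> list[Path]:
    """Create the layout under root and return the files newly created."""
    created = []
    for rel in LAYOUT:
        target = root / rel
        if rel.endswith("/"):
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():  # never overwrite an existing artifact
            target.touch()
            created.append(target)
    return created
```

Calling `scaffold(Path("data-warehouse-design"))` a second time returns an empty list, which matches the template's rule against creating duplicate files.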

Core Operating Principles

Single source of truth for data definitions, lineage, and decisions.
Deterministic outputs; idempotent actions and repeatable runs.
Memory and context are persisted with clear source-of-truth references.
Tooling and access are governed by least privilege and auditable logs.
Human review is required for high-risk changes; escalation rules are explicit.

Agent Handoff and Collaboration Rules

Planner provides architecture decision records and passes context to Implementer.
Implementer completes ETL, then hands off data and lineage to Reviewer.
Reviewer validates data quality and governance compliance; flags issues to Planner and Domain Specialist if needed.
Data Architect ensures schema and metadata consistency; Domain Specialist aligns business rules.
Domain Specialist resolves business logic gaps; orchestrator manages cross-agent dependencies.
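
An orchestrator could enforce these handoffs by checking each payload before passing it on. The sketch below is a minimal example under stated assumptions: the required fields mirror the handoff rules in the template, but the key names, the `REQUIRED_FIELDS` table, and `validate_handoff` are hypothetical.

```python
# Required payload fields per handoff, mirroring the template's handoff rules.
# The key names are illustrative; adapt them to your own payload schema.
REQUIRED_FIELDS = {
    ("planner", "implementer"): {"data_model", "data_sources", "validation_criteria"},
    ("implementer", "reviewer"): {"transformed_data", "quality_checks", "lineage_references"},
    ("reviewer", "planner"): {"issues", "remediation_plan"},
}


def validate_handoff(sender: str, receiver: str, payload: dict) -> list[str]:
    """Return the sorted names of required fields missing from payload.

    Raises ValueError when no handoff rule exists for the given pair.
    """
    try:
        required = REQUIRED_FIELDS[(sender, receiver)]
    except KeyError:
        raise ValueError(f"no handoff rule for {sender} -> {receiver}") from None
    return sorted(required - set(payload))
```

An empty result means the handoff can proceed; a non-empty result tells the orchestrator exactly which context was dropped along the way.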

Tool Governance and Permission Rules

All tool actions are auditable; secrets are stored securely; production tools require approval gates.
SQL execution, data loads, and API calls must pass validation checks before execution.
Prevent direct edits to production datasets; changes must go through controlled pipelines.
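
The rules above can be approximated with a gate that validates and audits every statement before it runs. This is a rough sketch using Python's standard `sqlite3` and `logging` modules as stand-ins for a real warehouse adapter; `is_allowed`, `run_sql`, the `dw.audit` logger name, and the environment labels are all hypothetical. The first-keyword check is deliberately naive and would need a real SQL parser in practice.

```python
import logging
import sqlite3

audit_log = logging.getLogger("dw.audit")  # hypothetical logger name

# Statement types treated as writes; the list is illustrative, not exhaustive.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "TRUNCATE", "MERGE"}


def is_allowed(sql: str, environment: str, approved: bool = False) -> bool:
    """Allow reads everywhere; allow production writes only when approved."""
    words = sql.strip().split(None, 1)
    if not words:
        return False  # reject empty statements
    if words[0].upper() in WRITE_KEYWORDS and environment == "production":
        return approved
    return True


def run_sql(conn: sqlite3.Connection, sql: str, *, agent: str,
            environment: str, approved: bool = False) -> list:
    """Record an audit entry, then execute the statement if it passes the gate."""
    allowed = is_allowed(sql, environment, approved)
    audit_log.info("agent=%s env=%s allowed=%s sql=%r", agent, environment, allowed, sql)
    if not allowed:
        raise PermissionError(f"{agent} may not run this statement in {environment}")
    return conn.execute(sql).fetchall()
```

Because the audit entry is written before the permission check raises, rejected attempts are recorded alongside successful ones.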

Code Construction Rules

Write readable SQL with explicit aliases and comments; avoid non-deterministic functions in production paths.
Modularize ETL steps; keep transformations small and testable.
Scripts must be idempotent, including at retry boundaries.
Follow naming conventions for artifacts and folders; version artifacts with semantic versions.
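
As an illustration of an idempotent load step, the sketch below uses an upsert so that retrying the same batch leaves the table unchanged. It relies on Python's standard `sqlite3` module as a stand-in for a warehouse connection and assumes SQLite 3.24 or later for the `ON CONFLICT` clause; the `dim_customer` table and the `load_customers` function are hypothetical.

```python
import sqlite3


def load_customers(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> int:
    """Upsert rows into dim_customer and return the resulting row count.

    Rerunning the same batch yields the same table state, so a retry
    after a partial failure does not create duplicates.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?) "
            "ON CONFLICT(customer_id) DO UPDATE SET name = excluded.name",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
```

Keying the upsert on a stable business or surrogate key is what makes the step safe to retry; an append-only insert would double the rows on the second run.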

Security and Production Rules

Encrypt data at rest and in transit; rotate secrets; enforce least privilege and access audits.
Production changes require reviewer approvals and feature flags; maintain rollback plans.
Monitor for anomalous data loads and access patterns; alert on policy violations.
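
A rollback plan for data changes can be as simple as wrapping the change in a transaction and notifying someone when it fails. The sketch below assumes a `sqlite3` connection as a stand-in for the warehouse and covers data-modifying statements only; `apply_change`, the `dw.deploy` logger name, and the `notify` callback are hypothetical.

```python
import logging
import sqlite3
from typing import Callable, Iterable

log = logging.getLogger("dw.deploy")  # hypothetical logger name


def apply_change(conn: sqlite3.Connection, statements: Iterable[str],
                 notify: Callable[[str], None]) -> bool:
    """Apply statements atomically; on failure, roll back, log, and notify.

    This sketch targets INSERT/UPDATE/DELETE statements. Schema changes
    may need a separate migration and rollback strategy.
    """
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            for statement in statements:
                conn.execute(statement)
    except sqlite3.Error as exc:
        log.error("change rolled back: %s", exc)
        notify(f"Change rolled back: {exc}")
        return False
    log.info("change applied")
    return True
```

In use, `notify` might post to a chat channel or open a ticket; here it is any callable that accepts a message string.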

Testing Checklist

Unit tests for each ETL step and transformation.
Integration tests for data loads and lineage accuracy.
Data quality checks with threshold-based alerts.
End-to-end tests of orchestrated workflows in a staging environment.
Security and access control tests for secrets and production endpoints.
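
A threshold-based data quality check from this list might look like the sketch below. It is a minimal example over in-memory rows: the `QualityResult` type, the `null_rate` and `check_null_rate` functions, and the choice of null rate as the metric are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityResult:
    """Outcome of one check; a breach is what would trigger escalation."""
    check: str
    value: float
    threshold: float

    @property
    def breached(self) -> bool:
        return self.value > self.threshold


def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where column is missing or None (0.0 for no rows)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def check_null_rate(rows: list[dict], column: str, max_rate: float) -> QualityResult:
    """Compare the observed null rate for column against max_rate."""
    return QualityResult(f"null_rate({column})", null_rate(rows, column), max_rate)
```

A CI job could fail the build, or the orchestrator could escalate, whenever `breached` is true for any check.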

Common Mistakes to Avoid

Skipping data lineage and governance in early design.
Allowing ad-hoc changes without approvals or rollback plans.
Overfitting templates to a single technology; preserve portability and abstraction.
Dropping memory or source-of-truth references during handoffs.

FAQ

What is this AGENTS.md Template for Data Warehouse System Design?

It defines a reproducible operating manual for AI coding agents to design, govern, and orchestrate a data warehouse with clear handoffs and governance.

Who should use this template?

Data platform engineers, data architects, BI teams, and product teams deploying AI coding agents for warehouse design and governance.

How does multi-agent orchestration handle handoffs?

The planner assigns tasks and passes context to the implementer, who passes results to the reviewer. The orchestrator enforces memory and source-of-truth rules across steps and escalates when needed.

What are the key security and governance rules?

Least privilege access, secrets in a vault, encryption in transit and at rest, and auditable actions with approvals for production changes.

How is testing and deployment handled?

Unit, integration, and data quality tests run in CI; deployments go through controlled pipelines with reviewer approval and rollback plans.
