Lakehouse Architecture Design AGENTS.md Template

Overview

This AGENTS.md template defines the operating context for a lakehouse architecture design project managed by AI coding agents. It governs single-agent execution and multi-agent orchestration, including handoffs, tool governance, memory, and human review gates.

Direct answer: Use this AGENTS.md template to establish roles, responsibilities, handoff rules, and governance for lakehouse design workflows powered by AI agents.

When to Use This AGENTS.md Template

When designing a lakehouse architecture with AI agents that span data lake, data warehouse, and governance layers.
When coordinating multiple agents for ingestion, transformation, cataloging, policy enforcement, and validation.
When you need a project-level operating context that enables reproducible agent behavior across environments.
When enforcing tool governance, access controls, and human review gates before production changes.

Copyable AGENTS.md Template

The following is a ready-to-paste AGENTS.md template block for lakehouse design. Copy this into your project to establish the standard operating context for your agents.

# AGENTS.md
Project Role: Lakehouse Design Lead
Agent roster and responsibilities:
- Planner: Defines architecture targets, milestones, and acceptance criteria.
- Architect: Designs lakehouse components (data lake, data warehouse, governance layer) and ensures schema governance.
- Ingest/ETL Implementer: Builds ingestion pipelines, transformations, and quality guards.
- Quality Auditor: Validates data quality, lineage, and policy compliance.
- Security Auditor: Ensures secrets, access control, and production safeguards.
- Data Steward: Manages metadata, catalog, lineage, and data policies.
- Docs Generator: Produces and maintains runbooks, API docs, and this AGENTS.md.
Supervisor or orchestrator:
- Lakehouse Orchestrator: Coordinates task assignment, context propagation, memory, and escalation.
Handoff rules:
- Planner → Architect: once the architecture plan is approved.
- Architect → Ingest/ETL Implementer: when design is ready for implementation.
- Ingest/ETL Implementer → Quality Auditor: after pipelines pass initial validation.
Context/memory/source-of-truth:
- Source of truth: metadata catalog, versioned data lake, and lineage graph.
- Memory: decisions and lessons learned are recorded in a central knowledge store.
Tool access and permission rules:
- Access to storage paths, compute, API endpoints, and orchestration surfaces via secure principals.
- Secrets in a vault; rotate regularly and enforce least privilege.
Architecture rules:
- Prefer a lakehouse pattern (warehouse-style tables on top of data lake storage) with schema evolution support.
- Enforce partitioning, data quality guards, and lineage.
File structure rules:
- lakehouse-design/
  ├── agents/
  │   ├── planner/
  │   ├── architect/
  │   ├── ingest-implementer/
  │   ├── quality-auditor/
  │   ├── security-auditor/
  │   └── data-steward/
  ├── pipelines/
  ├── catalogs/
  ├── governance/
  ├── tests/
  └── docs/
Data, API, or integration rules when relevant:
- Ingest from source systems via secure connectors; publish metadata to catalog; expose APIs for governance.
Validation rules:
- Data quality checks, schema validation, and governance policy conformance.
Security rules:
- Encrypt data at rest/in transit; enforce least-privilege access; audit trails.
Testing rules:
- Include unit, integration, and end-to-end tests for pipelines and governance.
Deployment rules:
- CI/CD with production approvals and rollback mechanisms.
Human review/escalation rules:
- Escalate policy gaps to Data Governance Council; require human sign-off for production changes.
Failure handling/rollback rules:
- Revert to last good checkpoint; automated alerts; retries with backoff.
Things Agents must not do:
- Do not mutate production data without approval; avoid context drift; do not bypass governance checks.
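
The failure-handling rule in the template mentions retries with backoff. A minimal sketch of that idea follows, assuming exponential delays; the function name, the default attempt count, and the delay values are illustrative choices, not part of the template.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    task: Callable[[], T],
    max_attempts: int = 4,       # assumed default, tune per pipeline
    base_delay: float = 0.5,     # seconds before the first retry (assumed)
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying on failure with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                # Retries exhausted: re-raise so the caller can alert
                # and revert to the last good checkpoint.
                raise
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so that tests can record delays instead of waiting. Reverting to the last good checkpoint and sending alerts are left to the caller, because how they work depends on the storage and alerting tools in use.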

Recommended Agent Operating Model

The lakehouse agent roster supports cross-functional collaboration with clear decision boundaries and escalation paths.

Planner: decides scope, constraints, and success criteria; communicates to all agents.
Architect: validates design against governance; approves architecture handoffs.
Ingest/ETL Implementer: executes pipelines; reports validation results to the Quality Auditor.
Quality Auditor: ensures data quality, lineage, and policy alignment; triggers fixes or rollbacks.
Security Auditor: ensures security controls and secrets handling; enforces production readiness.
Data Steward: maintains metadata catalog and governance artifacts; supports reproducibility.
Orchestrator: coordinates all agents, memory, and escalations; enforces handoffs and gates.

Recommended Project Structure

lakehouse-design/
  agents/
    planner/
    architect/
    ingest-implementer/
    quality-auditor/
    security-auditor/
    data-steward/
  pipelines/
  catalogs/
  governance/
  tests/
  docs/
  docker/
  ci/
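
The layout above can be created with a short script. This is a sketch only: the directory names come from the structure shown, while the scaffold function and its root argument are hypothetical helpers.

```python
from pathlib import Path

# Directory names taken from the recommended structure above.
AGENT_DIRS = [
    "planner",
    "architect",
    "ingest-implementer",
    "quality-auditor",
    "security-auditor",
    "data-steward",
]
TOP_LEVEL_DIRS = ["pipelines", "catalogs", "governance", "tests", "docs", "docker", "ci"]


def scaffold(root: Path) -> list[Path]:
    """Create the lakehouse-design tree under root; safe to run repeatedly."""
    project = root / "lakehouse-design"
    targets = [project / "agents" / name for name in AGENT_DIRS]
    targets += [project / name for name in TOP_LEVEL_DIRS]
    for path in targets:
        # exist_ok=True keeps the operation idempotent.
        path.mkdir(parents=True, exist_ok=True)
    return targets
```

Calling scaffold(Path(".")) twice leaves the tree unchanged the second time, which matches the idempotency principle stated later in this document.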

Core Operating Principles

Single source of truth and versioned artifacts.
Deterministic outputs with explicit memory propagation.
Clear decision boundaries and defined escalation paths.
Idempotent actions and auditable changes.

Agent Handoff and Collaboration Rules

Rules for planner, architect, implementer, reviewer, researcher, and domain specialist agents:

Planner hands off to Architect only after architecture objectives are approved.
Architect hands off to Implementer when models, specs, and interfaces are defined.
Implementer hands off to Quality Auditor after integration tests pass baseline checks.
Reviewer validates outputs against requirements; any deviation must be escalated.
Researcher provides data sources or experiments; domain specialist confirms domain alignment.
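
One way to make the handoff rules above machine-checkable is a small gate the orchestrator calls before passing work along. The sketch below is an assumption about how that could look; the Handoff type, the role identifiers, and the return messages are all hypothetical.

```python
from dataclasses import dataclass, field

# Allowed transitions, mirroring the three handoff rules above.
ALLOWED_HANDOFFS = {
    ("planner", "architect"),
    ("architect", "implementer"),
    ("implementer", "quality_auditor"),
}


@dataclass
class Handoff:
    sender: str
    receiver: str
    criteria_met: bool                                   # acceptance criteria satisfied?
    evidence: list[str] = field(default_factory=list)    # e.g. paths to specs or test reports


def check_handoff(h: Handoff) -> tuple[bool, str]:
    """Return (accepted, reason) for a proposed handoff."""
    if (h.sender, h.receiver) not in ALLOWED_HANDOFFS:
        return False, "escalate: handoff is not defined in the rules"
    if not h.criteria_met:
        return False, "blocked: acceptance criteria not met"
    if not h.evidence:
        return False, "blocked: no evidence artifacts attached"
    return True, "accepted"
```

For example, check_handoff(Handoff("planner", "implementer", True, ["plan.md"])) is rejected because the rules define no direct Planner to Implementer transition, so the orchestrator would escalate rather than proceed.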

Tool Governance and Permission Rules

All commands require orchestrator approval; never run commands in production without it.
Code edits and deployments must be tracked; secrets never embedded in code.
External API calls must be authenticated and rate-limited; errors are escalated to the orchestrator.
Production systems are accessed through approved principals; zero-trust posture enforced.
Approval gates trigger human review for schema changes or data policy changes.
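
The approval rules above can be expressed as a single permission check. This is a hedged sketch: the ActionRequest fields and the change-type strings are invented for illustration, and a real setup would tie approvals to an identity system instead of booleans.

```python
from dataclasses import dataclass

# Change types that require human review under the rules above (names assumed).
HUMAN_REVIEW_CHANGES = {"schema_change", "data_policy_change"}


@dataclass(frozen=True)
class ActionRequest:
    agent: str
    change_type: str
    environment: str                   # e.g. "staging" or "production"
    orchestrator_approved: bool = False
    human_approved: bool = False


def is_permitted(req: ActionRequest) -> bool:
    """Apply the governance gates in order; deny by default."""
    if not req.orchestrator_approved:
        return False                   # every command needs orchestrator approval
    if req.change_type in HUMAN_REVIEW_CHANGES and not req.human_approved:
        return False                   # schema and data policy changes need human review
    if req.environment == "production" and not req.human_approved:
        return False                   # production changes need human sign-off
    return True
```

The check denies by default: a request with no approvals recorded is rejected, so a missing field cannot accidentally open a gate.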

Code Construction Rules

Write idempotent, modular components; avoid global mutable state.
Validate input schemas before processing; validate outputs before handoff.
Use versioned artifacts and maintain changelogs for every release.
Document API contracts and data contracts in the nearest AGENTS.md.
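
As an illustration of the first two rules, the sketch below validates a record against a schema before writing it and uses a keyed upsert so that reprocessing the same record has no additional effect. ORDER_SCHEMA, its field names, and the in-memory table are hypothetical stand-ins for a real data contract and lakehouse table.

```python
from typing import Any

# Hypothetical data contract: field name -> expected Python type.
ORDER_SCHEMA: dict[str, type] = {"order_id": int, "amount": float, "currency": str}


def validate_record(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of schema violations; an empty list means the record is valid."""
    errors = []
    for name, expected in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors


def upsert(table: dict[int, dict[str, Any]], record: dict[str, Any]) -> None:
    """Validate, then write keyed by order_id so repeated writes are idempotent."""
    problems = validate_record(record, ORDER_SCHEMA)
    if problems:
        raise ValueError("; ".join(problems))
    # The table is passed in explicitly rather than held as global state.
    table[record["order_id"]] = dict(record)
```

Running upsert twice with the same record leaves exactly one row in the table, which is the property a pipeline needs when a failed run is retried.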

Security and Production Rules

Enforce encryption at rest and in transit; rotate credentials.
Limit exposure of secrets; use vaults and ephemeral tokens.
Implement anomaly detection and alerting for production workloads.
Require sign-off from security before production deployments.
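
To keep secrets out of code, an agent can read them at runtime from wherever the vault integration places them. The sketch below assumes the secret is exposed as an environment variable, which is one common arrangement and not something this template prescribes; get_secret is a hypothetical helper.

```python
import os


def get_secret(name: str) -> str:
    """Read a secret from the environment; fail loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        # Never fall back to a hard-coded default for a credential.
        raise RuntimeError(f"secret {name!r} is not set in the environment")
    return value
```

Failing when the variable is missing, instead of substituting a default, keeps a misconfigured agent from running with a placeholder credential.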

Testing Checklist

Unit tests for each agent function; integration tests for handoffs.
End-to-end tests simulating lakehouse data flow and governance checks.
Smoke tests post-deployment; performance tests for ETL throughput.

Common Mistakes to Avoid

Skipping governance checks in favor of speed.
Overloading agents with non-domain tasks; unclear ownership.
Bypassing memory or source-of-truth rules; drifting architecture.
Unclear escalation paths or missing audit trails.

FAQ

What is the Lakehouse Architecture AGENTS.md Template for?

It defines a standard operating context for AI coding agents designing and validating lakehouse architectures, enabling single-agent and multi-agent orchestration with clear handoffs and governance.

Who are typical agents in this lakehouse roster?

Planner, Architect, Ingest/ETL Implementer, Quality Auditor, Security Auditor, Data Steward, Docs Generator, and a Lakehouse Orchestrator.

How do agent handoffs work in this template?

Handoffs occur at predefined checkpoints (e.g., Planner → Architect, Architect → Implementer, Implementer → Quality Auditor) with explicit acceptance criteria and evidence artifacts.

What about tool governance and secrets?

Access controls, secret management, and approval gates are enforced; no production actions without orchestrator authorization.

How is memory and the source of truth maintained?

Decisions and artifacts are stored in a central knowledge store; the metadata catalog, versioned data lake, and lineage graph serve as the source of truth.

Target User

Use Cases