BigQuery Production Architecture AGENTS.md Template

An AGENTS.md template for BigQuery production architecture that details multi-agent orchestration, handoffs, and governance for reliable data pipelines.

Tags: AGENTS.md Template, BigQuery, AI coding agents, multi-agent orchestration, agent handoff rules, BigQuery production architecture, data governance, tool governance, human review, deployment rules, data quality checks

Target User

Developers, data engineers, platform architects, AI leaders

Use Cases

  • Orchestrate single-agent and multi-agent data pipelines in BigQuery
  • Define governance and handoff rules for data workflows
  • Provide a repeatable operating manual for production data pipelines

Markdown Template

# AGENTS.md

Project role
- BigQuery Production Architect: defines the end-to-end data flow, governance, and production readiness criteria.

Agent roster and responsibilities
- Planner: designs the orchestration plan, sources, and state transitions; creates run context for each cycle.
- Ingestor: pulls raw data into staging tables in BigQuery; enforces schema discipline and idempotency.
- Validator: performs schema validation, data quality checks, and anomaly detection; flags issues to halt the pipeline if needed.
- Transformer: applies transformations, joins, and enrichment using defined semantic layers.
- Loader: materializes curated and analytics-ready tables; updates marts and views.
- Reviewer: conducts change reviews, validates outputs, and approves deployment to prod.
- Regressor/TestAgent: runs regression tests and refreshes data quality dashboards on a schedule.
- Maintainer: ensures dependencies, versioning, and configs stay in sync with production standards.

Supervisor or orchestrator behavior
- The Orchestrator monitors agent states, enforces run ordering, propagates context, and triggers handoffs on completion.
- It enforces memory/state consistency, stores last run identifiers, and blocks unsanctioned changes to production datasets.
- It logs decisions and outcomes for auditability.

Handoff rules between agents
- On successful ingestion, pass run context and data pointers to Validator.
- On validation pass, pass to Transformer, then to Loader.
- If any agent fails, escalate to Reviewer and pause downstream steps until remediation is complete.
- All context must be stored in a central state store and not reconstructed ad-hoc.

Context, memory, and source-of-truth rules
- Source of Truth: BigQuery curated dataset and semantic layer.
- Context memory: a per-run state document in a centralized store; include runId, timestamps, data pointers, schema versions, and lineage pointers.
- No blind reuse of historical contexts without validation; memory must be versioned.

Tool access and permission rules
- Each agent runs with least-privilege service accounts scoped to required datasets, tables, and APIs.
- Secret access is via a centralized vault; rotate keys and restrict exposure.
- Production tools require approval gates for changes.

Architecture rules
- Layered data architecture: raw & staging → curated → analytics marts.
- Partition by date and cluster on common query patterns; use a view-based semantic layer.
- Idempotent operations and clear rollback paths.

File structure rules
- Keep production configs in a dedicated repo/dir; datasets, schemas, and pipelines must be versioned.
- Do not store secrets in code; use secret managers.

Data, API, or integration rules when relevant
- Data sources must be documented; all APIs must be authenticated and logged.
- Enforce data contracts and API rate limits.

Validation rules
- Every run must pass data quality checks before loading to curated marts.
- Maintain dashboards for data freshness and error rates.

Security rules
- Enforce least privilege and encryption at rest and in transit; never keep secrets in code.
- Access reviews for production datasets; implement boundary controls for sensitive data.

Testing rules
- Unit tests for transforms, integration tests for data sources, and end-to-end tests for the pipeline.
- CI checks for schema compatibility and version alignment.

Deployment rules
- Gate changes behind PR reviews; production changes require passing validation and a defined rollback.
- Blue-green or canary deployments for critical data products.

Human review and escalation rules
- Human review is triggered when quality thresholds are not met or when schema drift is detected.
- Escalate to data stewards or governance boards if issues persist.

Failure handling and rollback rules
- On failure, halt downstream steps and revert to last known good state; preserve lineage and run history.
- Maintain rollback scripts and data snapshots for critical datasets.

Things Agents must not do
- Do not bypass handoffs or memory state; do not mutate source data without validation.
- Do not mutate production schemas without regression tests and approvals.
- Do not operate outside scoped datasets or without required permissions.

Overview

This AGENTS.md template governs the operating model for BigQuery production data pipelines built with AI coding agents and multi-agent orchestration. It defines roles, handoffs, memory and source-of-truth rules, tool governance, and escalation paths so that production data flows stay auditable, secure, and maintainable in both single-agent and multi-agent workflows.

When to Use This AGENTS.md Template

  • When designing repeatable BigQuery production pipelines with AI coding agents.
  • When you need clear handoff rules between the Planner, Ingestor, Validator, Transformer, and Loader agents.
  • When establishing governance, security, and rollback procedures for production changes.
  • When facilitating multi-agent collaboration with a centralized orchestrator or supervisor.

Recommended Agent Operating Model

The operating model assigns each role clear decision boundaries and escalation paths for BigQuery production workflows. The Planner and Ingestor drive data into staging; the Validator enforces contracts and quality; the Transformer and Loader build the final datasets; the Reviewer and Maintainer govern deployment. Human reviewers step in for deviations or high-risk changes, and issues escalate to data stewards when automated checks fail repeatedly.

Recommended Project Structure

bigquery-prod-architecture/
├── configs/
│   ├── prod.yaml
│   └── schema.json
├── data/
│   ├── raw/
│   ├── staging/
│   ├── curated/
│   └── marts/
├── pipelines/
│   ├── planner.md
│   ├── ingestor.md
│   ├── validator.md
│   ├── transformer.md
│   └── loader.md
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── docs/
│   └── runbooks.md
└── scripts/
    ├── deploy.sh
    └── rollback.sh

Core Operating Principles

  • Single source of truth for production data; all state is versioned.
  • Explicit handoffs with context in the shared state store.
  • Least privilege and auditable actions for all agents.
  • Deterministic, idempotent operations with clear rollback paths.
  • Continuous validation and governance checks before production deploy.
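
The idempotency principle above can be illustrated with a minimal Python sketch. It assumes an in-memory dict standing in for a curated table, a hypothetical `apply_run` helper, and a hypothetical primary-key column `id`; none of these names come from the template.

```python
from typing import Dict, List, Set


def apply_run(
    table: Dict[str, dict],
    applied_runs: Set[str],
    run_id: str,
    rows: List[dict],
) -> bool:
    """Upsert rows by primary key; skip the run if it was already applied.

    `table` is a stand-in for a curated table (assumption), keyed by the
    hypothetical primary-key column "id".
    """
    if run_id in applied_runs:
        return False  # replaying the same run is a no-op
    for row in rows:
        table[row["id"]] = dict(row)  # same key overwrites, so no duplicates
    applied_runs.add(run_id)
    return True


table: Dict[str, dict] = {}
applied: Set[str] = set()
batch = [{"id": "a", "amount": 10}, {"id": "b", "amount": 20}]

first = apply_run(table, applied, "run-001", batch)
second = apply_run(table, applied, "run-001", batch)  # retry of the same run
```

The retry leaves the table unchanged because the run identifier was already recorded. In BigQuery itself, a `MERGE` statement keyed on the same column is one possible way to get a comparable effect.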

Agent Handoff and Collaboration Rules

  • Planner hands off to Ingestor with run context and data pointers.
  • Ingestor validates schema before passing to Validator.
  • Validator flags issues; on a pass it hands off to Transformer, otherwise it escalates to Reviewer.
  • Transformer completes transformations and passes to Loader; Reviewer can approve enhancements before prod.
  • Researchers or domain specialists may provide enrichment inputs via a controlled channel.
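
The handoff order above could be sketched as a small orchestrator loop. This is only an illustration under stated assumptions: `RunContext`, `run_pipeline`, and the lambda agents are hypothetical names, and each agent is modeled as a function that returns `True` on success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Stage order follows the handoff rules listed above.
STAGES = ["planner", "ingestor", "validator", "transformer", "loader"]


@dataclass
class RunContext:
    run_id: str
    data_pointers: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)
    status: str = "running"  # "running", "succeeded", or "paused"
    escalated_to: Optional[str] = None


def run_pipeline(
    ctx: RunContext, agents: Dict[str, Callable[[RunContext], bool]]
) -> RunContext:
    """Run stages in order; on the first failure, pause and escalate to the Reviewer."""
    for stage in STAGES:
        ok = agents[stage](ctx)
        ctx.history.append(f"{stage}:{'ok' if ok else 'failed'}")
        if not ok:
            ctx.status = "paused"
            ctx.escalated_to = "reviewer"
            return ctx  # downstream stages never run
    ctx.status = "succeeded"
    return ctx


# Hypothetical agents: every stage succeeds except the Validator.
agents = {stage: (lambda ctx: True) for stage in STAGES}
agents["validator"] = lambda ctx: False

result = run_pipeline(RunContext(run_id="run-001"), agents)
```

Because the loop returns at the first failure, the Transformer and Loader are never invoked for this run, which matches the rule that downstream steps pause until remediation.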

Tool Governance and Permission Rules

  • Use service accounts with scoped permissions to BigQuery datasets and Cloud resources.
  • Secrets stored in vault; never embedded in code or configs.
  • All API calls are logged; production changes require approval.
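
One way to picture scoped permissions with an approval gate is a lookup table checked before every action. The sketch below is an assumption-laden illustration: the scope map, the dataset names, and the `is_allowed` helper are hypothetical, and a real deployment would rely on IAM roles bound to service accounts rather than application code.

```python
from typing import Dict, Set

# Hypothetical scopes: agent -> dataset -> allowed actions.
AGENT_SCOPES: Dict[str, Dict[str, Set[str]]] = {
    "ingestor": {"staging": {"read", "write"}},
    "validator": {"staging": {"read"}},
    "transformer": {"staging": {"read"}, "curated": {"read", "write"}},
    "loader": {"curated": {"read"}, "marts": {"read", "write"}},
}

# Datasets treated as production in this sketch (assumption).
PRODUCTION_DATASETS = {"curated", "marts"}


def is_allowed(agent: str, dataset: str, action: str, approved: bool = False) -> bool:
    """Return True only if the action is in scope and, for production writes, approved."""
    allowed_actions = AGENT_SCOPES.get(agent, {}).get(dataset, set())
    if action not in allowed_actions:
        return False  # outside the agent's scope, regardless of approval
    if action == "write" and dataset in PRODUCTION_DATASETS:
        return approved  # approval gate for production changes
    return True
```

For example, `is_allowed("loader", "marts", "write")` is denied until the call carries `approved=True`, while `is_allowed("validator", "staging", "write")` is denied in every case because the Validator only has read scope.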

Code Construction Rules

  • Transform logic must be idempotent and auditable.
  • Schema evolution must be controlled; new fields must pass compatibility tests.
  • Code changes require review, tests, and data quality validation before prod.
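
A compatibility test for schema evolution might look like the following sketch. It assumes schemas are lists of dicts with `name`, `type`, and `mode` keys (the shape of a BigQuery JSON schema file), and it treats removed fields, type changes, and new `REQUIRED` fields as incompatible. The `schema_problems` helper and the sample fields are hypothetical.

```python
from typing import Dict, List

Field = Dict[str, str]


def schema_problems(old: List[Field], new: List[Field]) -> List[str]:
    """List the changes between two schemas that this sketch treats as breaking."""
    problems: List[str] = []
    new_by_name = {col["name"]: col for col in new}
    old_names = {col["name"] for col in old}

    for col in old:
        candidate = new_by_name.get(col["name"])
        if candidate is None:
            problems.append(f"removed field: {col['name']}")
        elif candidate["type"] != col["type"]:
            problems.append(
                f"type change on {col['name']}: {col['type']} -> {candidate['type']}"
            )

    for col in new:
        # A missing mode is treated as NULLABLE in this sketch.
        if col["name"] not in old_names and col.get("mode", "NULLABLE") == "REQUIRED":
            problems.append(f"new field is REQUIRED: {col['name']}")

    return problems


old_schema = [
    {"name": "order_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "amount", "type": "NUMERIC", "mode": "NULLABLE"},
]
additive = old_schema + [{"name": "note", "type": "STRING", "mode": "NULLABLE"}]
breaking = [
    {"name": "amount", "type": "STRING", "mode": "NULLABLE"},
    {"name": "region", "type": "STRING", "mode": "REQUIRED"},
]
```

A check like this could run in CI: an empty list for `additive` lets the change proceed, while the three problems reported for `breaking` would block it pending review.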

Security and Production Rules

  • Encrypt data at rest and in transit; restrict access by role.
  • Audit trails for all data movements and transformations.
  • Regular secret rotations and access reviews for production resources.

Testing Checklist

  • Unit tests for each transform; integration tests for data sources.
  • End-to-end tests verifying ingestion to curated marts and dashboards.
  • Production readiness checks for schemas, partitions, and view definitions.
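
As an illustration of the first checklist item, here is a minimal unit test for a transform using Python's standard `unittest` module. The `normalize_orders` transform and its column names are hypothetical and stand in for whatever transform logic a project actually defines.

```python
import unittest
from typing import List


def normalize_orders(rows: List[dict]) -> List[dict]:
    """Hypothetical transform: trim and upper-case currency, drop rows with no order_id."""
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue
        cleaned.append({**row, "currency": row["currency"].strip().upper()})
    return cleaned


class NormalizeOrdersTest(unittest.TestCase):
    def test_normalizes_currency(self):
        out = normalize_orders([{"order_id": "1", "currency": " usd "}])
        self.assertEqual(out, [{"order_id": "1", "currency": "USD"}])

    def test_drops_rows_without_order_id(self):
        out = normalize_orders([{"order_id": None, "currency": "eur"}])
        self.assertEqual(out, [])

    def test_is_idempotent(self):
        once = normalize_orders([{"order_id": "1", "currency": " usd "}])
        self.assertEqual(normalize_orders(once), once)


# Run the suite explicitly so the example also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeOrdersTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The third test checks idempotency directly: applying the transform to its own output should change nothing.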

Common Mistakes to Avoid

  • Skipping data quality checks, versioned state, or approvals.
  • Bypassing handoffs or modifying production data directly without validation.
  • Ignoring schema drift or running without a rollback strategy.

FAQ

What is the purpose of this AGENTS.md Template for BigQuery production architecture?

This AGENTS.md Template establishes a repeatable, auditable operating manual for planning, executing, and governing BigQuery production data pipelines using AI coding agents and multi-agent orchestration.

How should memory and context be shared between agents?

All per-run state is stored in a centralized, versioned memory store. Agents read and write to this state to ensure consistency and traceability across handoffs.
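
A versioned per-run state document could be sketched as follows. The `RunStateStore` class is a hypothetical in-memory stand-in for the centralized store, and the field names (`runId`, `dataPointers`, `schemaVersion`) are illustrative choices based on the items the template lists, not a prescribed format.

```python
import copy
from datetime import datetime, timezone
from typing import Dict, List, Optional


class RunStateStore:
    """In-memory stand-in for a centralized, versioned state store (assumption)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[dict]] = {}

    def write(self, run_id: str, updates: dict) -> int:
        """Append a new version of the run's state and return its version number."""
        versions = self._versions.setdefault(run_id, [])
        state = copy.deepcopy(versions[-1]) if versions else {"runId": run_id}
        state.update(copy.deepcopy(updates))
        state["version"] = len(versions) + 1
        state["updatedAt"] = datetime.now(timezone.utc).isoformat()
        versions.append(state)
        return state["version"]

    def read(self, run_id: str, version: Optional[int] = None) -> dict:
        """Return a copy of the latest version, or of a specific earlier one."""
        versions = self._versions[run_id]
        snapshot = versions[-1] if version is None else versions[version - 1]
        return copy.deepcopy(snapshot)


store = RunStateStore()
store.write("run-001", {"dataPointers": {"staging": "staging.orders_raw"}, "schemaVersion": "v3"})
store.write("run-001", {"validation": "passed"})

latest = store.read("run-001")
first = store.read("run-001", version=1)
```

Each write appends a new snapshot instead of mutating the previous one, so an earlier handoff can still be inspected after later agents have added to the state.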

Who handles human review and escalation?

The Reviewer or a data steward handles escalation when automated checks fail; persistent issues escalate to a governance board or the data team lead.

What are the security rules for secrets and production access?

Secrets are managed in a vault, access follows least privilege, and production resources require approval gates and audit logging.

How do you validate data quality before loading into production?

Validators run data quality checks against contracts, schema, and sample data; only data that passes proceeds to curated marts.
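
A contract check on sample rows might be sketched like this. The `CONTRACT` mapping, the column names, and the `contract_violations` helper are hypothetical; the sketch covers only type and nullability rules and leaves out freshness, ranges, and anomaly detection.

```python
from typing import Dict, List

# Hypothetical contract: column -> expected Python type and nullability.
CONTRACT: Dict[str, dict] = {
    "order_id": {"type": str, "nullable": False},
    "amount": {"type": float, "nullable": False},
    "coupon": {"type": str, "nullable": True},
}


def contract_violations(rows: List[dict], contract: Dict[str, dict]) -> List[str]:
    """Return one message per contract violation found in the sample rows."""
    violations: List[str] = []
    for index, row in enumerate(rows):
        for column, rule in contract.items():
            value = row.get(column)
            if value is None:
                if not rule["nullable"]:
                    violations.append(f"row {index}: {column} is null")
            elif not isinstance(value, rule["type"]):
                violations.append(
                    f"row {index}: {column} has type {type(value).__name__}"
                )
    return violations


sample = [
    {"order_id": "1", "amount": 9.5, "coupon": None},
    {"order_id": None, "amount": "12", "coupon": "SPRING"},
]
violations = contract_violations(sample, CONTRACT)
can_load = not violations  # only a clean sample proceeds to curated marts
```

Here the second row fails on both `order_id` and `amount`, so `can_load` is false and the run would stop before the Loader.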