AGENTS.md Template for Online Inference System Design

A copyable AGENTS.md template for online inference system design, covering multi-agent orchestration, handoffs, tool governance, and human review.

Target User

Engineering leaders, platform teams, AI/ML engineers

Use Cases

  • Single-agent workflow for inference task execution
  • Multi-agent orchestration for model serving, data preprocessing, and monitoring
  • Governance of tool access, secrets, and production deployments
  • Project-level operating context for agent-driven workflows

Markdown Template

# AGENTS.md

Project role: Platform Lead for the online inference system

Agent roster and responsibilities:
- Planner: designs the plan and sequence of inference steps, data flows, and model usage.
- Implementer: converts the plan into code and configuration; ensures reproducibility and idempotence.
- Researcher: identifies data sources, model specs, and external signals; curates sources.
- ModelRunner: runs model inference with the correct context and memory; ensures deterministic outputs.
- Monitor: collects metrics, detects drift, and flags anomalies.
- Reviewer: checks correctness, security posture, and quality; approves changes for staging.
- Tester: validates end-to-end behavior with test data; confirms outputs meet acceptance criteria.
- DataEngineer: manages data ingestion, normalization, feature extraction, and storage.
- Domain Specialist: provides domain constraints and validates results against domain requirements.

Supervisor or orchestrator behavior:
- Orchestrator coordinates plans, supports cross-agent handoffs, enforces gates, and stores memory.
- The Planner submits the plan before execution; the Orchestrator validates preconditions.
- Handoffs follow a fixed protocol with confirmation at each boundary.

Handoff rules between agents:
- Planner -> Implementer: hand off plan, context, and acceptance criteria.
- Implementer -> Reviewer: hand off implementation, tests, and quality signals.
- Reviewer -> Tester: hand off validated artifacts and test results.
- Researcher -> Planner: inject external signals or alternative data sources when needed.
- Domain Specialist -> Planner or Orchestrator: provide domain constraints when ambiguity arises.

Context, memory, and source-of-truth rules:
- Use a central memory store to persist plans, data lineage, and decision rationale.
- All outputs, prompts, and tool results must reference a source-of-truth document or data source.
- Context must be versioned; prior context is retained for audit.

Tool access and permission rules:
- Agents may invoke tools (APIs, shells, code runners) only within their defined capabilities.
- Secrets must be retrieved from a vault; never stored in code or logs.
- Production endpoints require approval and access tokens; tokens are rotated regularly.
- External services require access controls and audit logging.

Architecture rules:
- Microservice-like boundaries; clear interfaces between Planner, Implementer, and ModelRunner.
- All in-flight decisions must be traceable to an artifact in the memory store.
- No silent data leakage; privacy by design.

File structure rules:
- Keep all agent artifacts under a single project root.
- Use explicit directories for plans, implementations, tests, and monitors.

Data, API, or integration rules:
- Define input schemas for each integration; validate at boundary.
- Use deterministic seeds for experiments; log seeds and other sources of randomness so runs can be reproduced.
- Include data provenance where possible.

Validation rules:
- All outputs must pass defined acceptance tests before promotion.
- Logs and metrics must be emitted for observability.

Security rules:
- Do not expose secrets in code or prompts.
- Enforce least privilege and credential rotation; monitor for anomalous access.
- Comply with data-handling and regulatory requirements.

Testing rules:
- Unit tests for each agent; integration tests for the full inference pipeline.
- End-to-end tests with simulated real-world data.
- Regression tests on model drift scenarios.

Deployment rules:
- Deploy to staging with feature flags; monitor before production.
- Rollback path clearly defined; immutable deployments preferred.

Human review and escalation rules:
- Trigger human review on drift, failure, or high-risk outputs.
- Escalate to Domain Specialist for domain-specific issues.
- Maintain an auditable escalation trail.

Failure handling and rollback rules:
- If a failure occurs, revert to the previous known-good state and notify stakeholders.
- Pause automated actions if confidence is low; require human confirmation to resume.

Things Agents must not do:
- Do not bypass orchestrator controls or modify memory without trace.
- Do not share secrets; never log plaintext credentials.
- Do not perform unsanctioned changes to production models or endpoints.

Overview

The AGENTS.md template for online inference system design codifies a repeatable operating manual that governs both individual AI coding agents and multi-agent orchestration. It defines agent roles, handoffs, memory and truth sources, tool access, and governance needed to safely operate an inference pipeline across data ingestion, feature extraction, model serving, validation, monitoring, and human review.

Use this AGENTS.md template to establish a disciplined orchestration pattern in which specialized agents collaborate to fetch data, run inference, validate results, monitor drift, and escalate issues as needed.

When to Use This AGENTS.md Template

  • When building an online inference system that requires clear agent roles and explicit escalation paths.
  • When coordinating multiple agents across data ingestion, feature processing, model inference, validation, and observability.
  • When you need tool governance, secrets handling, and production-safe deployment rules.
  • When you want a copyable, project-scoped template to bootstrap governance quickly.

Recommended Agent Operating Model

Roles, responsibilities, decision boundaries, and escalation paths are defined to balance autonomy with governance. The Planner owns the plan; the Implementer executes it; the ModelRunner runs inference; the DataEngineer manages data; the Monitor watches metrics and drift; the Reviewer guards quality; the Tester validates; the Researcher supplies data sources and signals; the Domain Specialist provides constraints; and the Orchestrator maintains overall integrity and memory.
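The decision boundaries and escalation paths can be made explicit in the Orchestrator. The following Python sketch shows one possible routing rule, assuming each agent reports a small set of signals after every step. `StepSignal`, `Route`, `route`, and the 0.8 risk threshold are hypothetical names and values, not part of the template.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PROCEED = "proceed"                      # Orchestrator continues automatically
    DOMAIN_SPECIALIST = "domain_specialist"  # ask for domain constraints
    HUMAN_REVIEW = "human_review"            # pause and wait for a person


@dataclass(frozen=True)
class StepSignal:
    """Hypothetical signals an agent reports to the Orchestrator after a step."""
    failed: bool
    drift_detected: bool
    risk_score: float        # assumed scale: 0.0 (low) to 1.0 (high)
    domain_ambiguity: bool


RISK_THRESHOLD = 0.8  # assumed cut-off for a "high-risk" output


def route(signal: StepSignal) -> Route:
    """Decide whether a step may proceed or must be escalated."""
    # Failures, drift, and high-risk outputs always go to a human first.
    if signal.failed or signal.drift_detected or signal.risk_score >= RISK_THRESHOLD:
        return Route.HUMAN_REVIEW
    # Domain questions go to the Domain Specialist.
    if signal.domain_ambiguity:
        return Route.DOMAIN_SPECIALIST
    return Route.PROCEED
```

Checking the human-review triggers before domain ambiguity is a deliberate ordering choice: a failed or drifting step reaches a person even when it also raises a domain question.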

Recommended Project Structure

project-root/
  orchestrator/
  agents/
    planner/
    implementer/
    researcher/
    reviewer/
    tester/
    data-engineer/
    monitor/
  models/
  data/
  pipelines/
  infra/
  config/
  docs/
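To bootstrap this layout from code, a minimal Python sketch follows. It mirrors the tree above and passes `exist_ok=True` to `Path.mkdir`, so running it repeatedly is safe. The `LAYOUT` list and `scaffold` function are illustrative names, not part of the template.

```python
import tempfile
from pathlib import Path

# Relative directories from the recommended project structure.
LAYOUT = [
    "orchestrator",
    "agents/planner",
    "agents/implementer",
    "agents/researcher",
    "agents/reviewer",
    "agents/tester",
    "agents/data-engineer",
    "agents/monitor",
    "models",
    "data",
    "pipelines",
    "infra",
    "config",
    "docs",
]


def scaffold(root: Path) -> list[Path]:
    """Create every directory in LAYOUT under root; safe to run repeatedly."""
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written to the working tree.
    with tempfile.TemporaryDirectory() as tmp:
        for p in scaffold(Path(tmp) / "project-root"):
            print(p.relative_to(tmp))
```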

Core Operating Principles

  • Clear ownership and accountability for each artifact.
  • Deterministic handoffs with explicit acceptance criteria.
  • End-to-end traceability from data input to output.
  • Idempotent actions and safe retries.
  • Observability and auditable decision rationales.
  • Human-in-the-loop for high-risk decisions or drift scenarios.
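The idempotence and safe-retry principles can be illustrated with a small executor. This Python sketch assumes callers attach an idempotency key to each action and that retryable failures raise a dedicated exception. `IdempotentExecutor` and `TransientError` are hypothetical names, and the in-memory result cache stands in for whatever durable store (for example, the central memory store) a real deployment would use.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


class IdempotentExecutor:
    """Runs each action at most once per idempotency key, retrying transient failures."""

    def __init__(self, max_attempts: int = 3, base_delay_s: float = 0.1) -> None:
        self.max_attempts = max_attempts
        self.base_delay_s = base_delay_s
        # In-memory cache of completed results; a real system would persist this.
        self._results: Dict[str, object] = {}

    def run(self, key: str, action: Callable[[], T]) -> T:
        # A repeated key returns the recorded result instead of re-running the action.
        if key in self._results:
            return self._results[key]  # type: ignore[return-value]
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = action()
            except TransientError:
                if attempt == self.max_attempts:
                    raise  # give up and surface the failure to the Orchestrator
                # Exponential backoff: base, 2x base, 4x base, ...
                time.sleep(self.base_delay_s * 2 ** (attempt - 1))
            else:
                self._results[key] = result
                return result
        raise RuntimeError("max_attempts must be at least 1")
```

Only exceptions that are known to be retryable are retried; anything else propagates immediately, so a logic error is not hidden behind repeated attempts.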

Agent Handoff and Collaboration Rules

  • Planner coordinates with Implementer and ModelRunner; provides plan artifacts and constraints.
  • Implementer exposes implementation artifacts and unit tests; ensures deterministic results.
  • Reviewer assesses code quality, security posture, and test results; approves for staging.
  • Tester runs end-to-end tests with representative data; reports results and residual risk.
  • Researcher supplies data sources and validation signals; collaborates with Planner for plan updates.
  • Domain Specialist validates domain constraints and thresholds; escalates when ambiguities arise.
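The fixed handoff protocol can be enforced as an Orchestrator gate. In this Python sketch, the allowed sender/receiver pairs come from the template's handoff rules, while the `Handoff` record, its field names, and `validate_handoff` are hypothetical. A handoff is rejected unless it follows an allowed edge, carries artifacts and acceptance criteria, and has been confirmed by the receiver.

```python
from dataclasses import dataclass
from typing import List

# Allowed sender -> receiver pairs, taken from the handoff rules in the template.
ALLOWED_HANDOFFS = {
    ("Planner", "Implementer"),
    ("Implementer", "Reviewer"),
    ("Reviewer", "Tester"),
    ("Researcher", "Planner"),
    ("Domain Specialist", "Planner"),
    ("Domain Specialist", "Orchestrator"),
}


@dataclass
class Handoff:
    sender: str
    receiver: str
    artifacts: List[str]            # e.g. file paths or memory-store IDs
    acceptance_criteria: List[str]
    confirmed_by_receiver: bool = False


class HandoffError(ValueError):
    """Raised when a handoff breaks the fixed protocol."""


def validate_handoff(h: Handoff) -> None:
    """Orchestrator gate: reject any handoff that breaks the protocol."""
    if (h.sender, h.receiver) not in ALLOWED_HANDOFFS:
        raise HandoffError(f"{h.sender} -> {h.receiver} is not an allowed handoff")
    if not h.artifacts:
        raise HandoffError("handoff carries no artifacts")
    if not h.acceptance_criteria:
        raise HandoffError("handoff has no acceptance criteria")
    if not h.confirmed_by_receiver:
        raise HandoffError("receiver has not confirmed the handoff")
```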

Tool Governance and Permission Rules

  • Access to tools and endpoints is controlled; least privilege enforced.
  • Secrets retrieved from a vault; never embedded in code or prompts.
  • Promotion gates require approvals and audit trails.
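Least-privilege tool access and audit trails can be combined in a single authorization check. The Python sketch below uses a hypothetical capability allowlist; the capability names (`model.infer`, `promote.staging`, and so on) are illustrative assumptions. Unknown agents receive no capabilities, and every decision, allowed or denied, is written to an `audit` logger.

```python
import logging
from datetime import datetime, timezone
from typing import Dict, FrozenSet

audit_log = logging.getLogger("audit")

# Hypothetical capability allowlist per agent (least privilege).
# No agent holds "promote.production": that step needs a human approval outside this table.
PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "Planner": frozenset({"memory.read", "memory.write"}),
    "Implementer": frozenset({"repo.write", "tests.run"}),
    "ModelRunner": frozenset({"model.infer"}),
    "Monitor": frozenset({"metrics.read"}),
    "Reviewer": frozenset({"repo.read", "promote.staging"}),
}


class PermissionDenied(PermissionError):
    """Raised when an agent requests a capability it was not granted."""


def authorize(agent: str, capability: str) -> None:
    """Allow or deny a tool call; every decision goes to the audit log."""
    allowed = capability in PERMISSIONS.get(agent, frozenset())  # unknown agents get nothing
    audit_log.info(
        "ts=%s agent=%s capability=%s allowed=%s",
        datetime.now(timezone.utc).isoformat(), agent, capability, allowed,
    )
    if not allowed:
        raise PermissionDenied(f"{agent} may not use {capability}")
```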

Code Construction Rules

  • Follow project-specific contracts and schemas for inputs and outputs.
  • Avoid hardcoding values; use configurable parameters and environment variables.
  • Write clear, testable functions; minimize side effects; ensure idempotence.
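The rules above can be sketched in Python as a configuration object loaded from environment variables plus a request parser that validates the input contract at the boundary. The environment variable names, defaults, and the `InferenceConfig`/`InferenceRequest` types are hypothetical. `parse_request` has no side effects, so calling it twice with the same payload yields the same result.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class InferenceConfig:
    model_name: str
    timeout_ms: int
    max_batch_size: int

    @classmethod
    def from_env(cls) -> "InferenceConfig":
        # Values come from environment variables instead of literals in code.
        # The variable names and defaults are hypothetical.
        return cls(
            model_name=os.environ.get("INFER_MODEL_NAME", "default-model"),
            timeout_ms=int(os.environ.get("INFER_TIMEOUT_MS", "200")),
            max_batch_size=int(os.environ.get("INFER_MAX_BATCH", "32")),
        )


@dataclass(frozen=True)
class InferenceRequest:
    request_id: str
    features: Tuple[float, ...]


def parse_request(payload: Dict[str, Any], expected_dim: int) -> InferenceRequest:
    """Validate an untrusted payload against the input contract at the boundary."""
    request_id = payload.get("request_id")
    if not isinstance(request_id, str) or not request_id:
        raise ValueError("request_id must be a non-empty string")
    features = payload.get("features")
    if not isinstance(features, list) or len(features) != expected_dim:
        raise ValueError(f"features must be a list of length {expected_dim}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in features):
        raise ValueError("features must be numeric")
    return InferenceRequest(request_id, tuple(float(x) for x in features))
```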

Security and Production Rules

  • Encrypt sensitive data at rest and in transit.
  • Limit access with role-based controls; rotate keys regularly.
  • Monitor for anomalous activity; trigger adversarial testing where appropriate.
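One simple way to monitor for anomalous activity is to count denied tool calls per agent over a sliding time window. This Python sketch is an assumption, not a prescribed detector: the threshold and window size are illustrative, `DeniedAccessMonitor` is a hypothetical name, and it assumes timestamps arrive in non-decreasing order for each agent.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class DeniedAccessMonitor:
    """Flags an agent whose denied tool calls reach a threshold within a sliding window."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._events: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def record_denial(self, agent: str, ts: float) -> bool:
        """Record a denial at time ts (seconds); return True if the agent looks anomalous."""
        events = self._events[agent]
        events.append(ts)
        # Discard denials that have fallen out of the window.
        while events and ts - events[0] > self.window_s:
            events.popleft()
        return len(events) >= self.threshold
```

A `True` result would be a natural trigger for the human review and escalation rules in the template.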

Testing Checklist

  • Unit tests for each agent and utility function.
  • Integration tests for interaction between Planner, Implementer, and ModelRunner.
  • End-to-end tests with simulated online inference workloads.
  • Deployment tests in staging with feature flags and canaries.
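As an example of the first checklist item, here is a unit-test sketch using Python's standard `unittest` module. `SeededModelRunner` is a hypothetical toy stand-in for the ModelRunner: its only randomness is a seeded weight initialization, so the tests can assert deterministic outputs and input-shape validation.

```python
import random
import unittest
from typing import List, Sequence


class SeededModelRunner:
    """Hypothetical stand-in for ModelRunner: a toy linear scorer.

    Its only randomness is the weight initialization, drawn from a private
    random.Random seeded explicitly, so identical seeds give identical outputs.
    """

    def __init__(self, seed: int, n_features: int = 3) -> None:
        rng = random.Random(seed)
        self.weights: List[float] = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]

    def predict(self, features: Sequence[float]) -> float:
        if len(features) != len(self.weights):
            raise ValueError("feature vector has the wrong length")
        return sum(w * x for w, x in zip(self.weights, features))


class ModelRunnerTests(unittest.TestCase):
    def test_same_seed_gives_same_output(self) -> None:
        x = [1.0, 2.0, 3.0]
        self.assertEqual(SeededModelRunner(42).predict(x), SeededModelRunner(42).predict(x))

    def test_rejects_wrong_input_shape(self) -> None:
        with self.assertRaises(ValueError):
            SeededModelRunner(42).predict([1.0, 2.0])


if __name__ == "__main__":
    unittest.main()
```

Using a private `random.Random(seed)` instead of the module-level generator keeps the test independent of any other code that draws random numbers.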

Common Mistakes to Avoid

  • Assuming a single agent suffices for all inference tasks.
  • Undefined handoff criteria leading to drift between agents.
  • Bypassing memory or source-of-truth checks.
  • Publishing models or endpoints without security reviews.
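To avoid bypassing memory or source-of-truth checks, the memory store itself can refuse uncited writes. This Python sketch assumes an append-only, versioned store, matching the template's rule that context is versioned and prior context is retained for audit. `MemoryStore`, `MemoryEntry`, and their fields are hypothetical, and the in-memory list would be a durable database in practice.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MemoryEntry:
    version: int
    author: str              # agent that wrote the entry
    content: str             # plan, decision rationale, lineage note, ...
    source_ref: str          # source-of-truth document or data source it relies on
    recorded_at: datetime


class MemoryStore:
    """Append-only, versioned store; earlier entries are kept for audit."""

    def __init__(self) -> None:
        self._entries: List[MemoryEntry] = []

    def append(self, author: str, content: str, source_ref: str) -> MemoryEntry:
        # Refuse writes that do not cite a source of truth.
        if not source_ref.strip():
            raise ValueError("every entry must reference a source of truth")
        entry = MemoryEntry(
            version=len(self._entries) + 1,
            author=author,
            content=content,
            source_ref=source_ref,
            recorded_at=datetime.now(timezone.utc),
        )
        self._entries.append(entry)
        return entry

    def latest(self) -> Optional[MemoryEntry]:
        return self._entries[-1] if self._entries else None

    def history(self) -> Tuple[MemoryEntry, ...]:
        # A tuple copy, so callers cannot edit or delete past entries through it.
        return tuple(self._entries)
```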

FAQ

What is the purpose of this AGENTS.md Template?

It provides a copyable, project-scoped operating manual for single-agent and multi-agent online inference workflows, including handoffs, governance, and escalation.

Who are the core agents in the online inference system design?

The core roster includes Planner, Implementer, Researcher, ModelRunner, Monitor, Reviewer, Tester, DataEngineer, and Domain Specialist, with an Orchestrator coordinating them.

How are handoffs between agents governed?

Handoffs follow a fixed protocol: Planner → Implementer; Implementer → Reviewer; Reviewer → Tester; Researcher → Planner; Domain Specialist → Planner/Orchestrator. Each step includes artifacts and acceptance criteria.

How does tool governance protect secrets?

Secrets are retrieved from a vault and never stored in code or logs; access is least-privileged and auditable.
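The "never in logs" half of this rule can be backed by a standard `logging.Filter` that redacts credentials before records are emitted. This Python sketch is illustrative: the regular expression is an assumption that only catches `key=value` or `key: value` style credentials, so adapt it to the secret formats you actually use. Retrieving secrets from the vault is out of scope here.

```python
import logging
import re

# Illustrative pattern only; adapt it to the credential formats you actually use.
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|token|password|secret)\s*[=:]\s*\S+")


def redact(text: str) -> str:
    """Replace the value in key=value style credentials with a placeholder."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


class RedactSecretsFilter(logging.Filter):
    """Logging filter that redacts credentials before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so secrets passed as %-style args are caught too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, just with redacted content


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("inference")
    logger.addFilter(RedactSecretsFilter())
    logger.info("calling model endpoint with token=%s", "abc123")  # logs token=[REDACTED]
```

A filter added to a logger only sees records logged through that logger; attaching it to handlers instead also covers records propagated from child loggers.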

How do you validate model outputs in multi-agent orchestration?

Validation includes unit tests, end-to-end tests with simulated workloads, monitoring for drift, and human review when risk signals appear.
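For the drift-monitoring step, one common technique (swapped in here, not prescribed by the template) is the population stability index, PSI = sum over bins i of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the fractions of baseline and live values in bin i. This Python sketch assumes fixed bin edges and uses a 0.2 threshold, a widely used rule of thumb that you should tune per model, to decide when to request human review.

```python
import bisect
import math
from typing import List, Sequence

DRIFT_THRESHOLD = 0.2  # rule-of-thumb cut-off for a significant shift; an assumption


def _bin_fractions(values: Sequence[float], edges: Sequence[float], eps: float) -> List[float]:
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # out-of-range values are ignored in this sketch
        # bisect_right finds the bin; the top edge is folded into the last bin.
        i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    total = sum(counts) or 1
    # eps avoids log(0) and division by zero for empty bins.
    return [max(c / total, eps) for c in counts]


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], edges: Sequence[float], eps: float = 1e-6
) -> float:
    e = _bin_fractions(expected, edges, eps)
    a = _bin_fractions(actual, edges, eps)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def needs_human_review(baseline: Sequence[float], live: Sequence[float], edges: Sequence[float]) -> bool:
    """True when drift between baseline and live data exceeds the threshold."""
    return population_stability_index(baseline, live, edges) > DRIFT_THRESHOLD
```

For example, with edges `[0.0, 1.0, 2.0]`, a baseline split 50/50 across the two bins and live traffic split 10/90 gives a PSI of about 0.88, which would trigger human review.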