Model Serving Architecture AGENTS.md Template | Suhas Bhairav

Overview

This AGENTS.md template is an operating manual for AI coding agents working within a model serving architecture. It governs both single-agent and multi-agent orchestration, defining clear responsibilities, handoffs, and governance across the inference pipeline.

When to Use This AGENTS.md Template

You are deploying or evolving a model serving architecture with more than one agent involved.
You need a single source of truth for roles, handoffs, and tool governance.
You require consistent validation, security, and escalation paths during deployment and monitoring.

Copyable AGENTS.md Template

# AGENTS.md
Project Role: Platform Owner defines the model serving architecture and agent workflow

Agent roster and responsibilities:
- Planner: orchestrates tasks, creates the plan for serving architecture
- Implementer: implements the plan in code, adapters, and endpoints
- Reviewer: validates outputs, policy alignment, and results
- Tester: executes test scenarios and checks for regressions
- Researcher: gathers evidence for decisions and model choices
- Domain Specialist: provides domain constraints and domain-specific guardrails

Supervisor or orchestrator behavior:
- The Orchestrator coordinates planning, execution, and handoffs
- Maintains memory and source-of-truth state across agents
- Enforces policy, security, and escalation gates

Handoff rules between agents:
- Planner to Implementer: pass plan, constraints, and required artifacts
- Implementer to Reviewer: deliver code, tests, and results for review
- Reviewer to Tester: approve test cases and acceptance criteria
- Tester to Planner: pass test results and risk signals for remediation
- Researcher/Domain Specialist to Planner: deliver evidence and domain constraints

Context, memory, and source-of-truth rules:
- Memory stores decisions, context, and outcomes for the milestone
- Source of truth includes model registry, data catalog, config repo, and audit logs
- Context must be versioned with each milestone

Tool access and permission rules:
- Tools: model_server, data_api, registry, monitoring, and secret manager
- Access: role-based; tokens are ephemeral
- No hard-coded credentials; secrets retrieved at runtime via the orchestrator

Architecture rules:
- Event-driven, modular microservices; idempotent actions
- Central orchestration with clear interfaces
- Observability and auditability required

File structure rules:
- All artifacts live under /src, /configs, /docs, /tests
- Naming follows kebab-case with clear prefixes for agents

Data, API, or integration rules:
- Versioned data schemas; input/output validation; rate limits
- Data provenance and lineage preserved

Validation rules:
- Acceptance criteria for each milestone; verifiable via tests and logs
- Outputs must be deterministic given the same inputs

Security rules:
- Encrypt data at rest and in transit; access control enforced
- Secrets management via a secure store; avoid leaking keys

Testing rules:
- Unit tests per agent, integration tests across agents, regression tests
- End-to-end tests for end-user workflows

Deployment rules:
- Blue/Green or canary deployment for model updates
- Rollback plan and feature flags

Human review and escalation rules:
- Human review triggers on policy violations or risk signals
- Escalation path to domain expert and security review when needed

Failure handling and rollback rules:
- If a step fails, roll back to the last known-good state and notify humans
- Preserve failing artifacts for post-mortem

Things agents must not do:
- Do not bypass orchestrator or security checks
- Do not access production data directly without approval
- Do not perform unsanctioned changes to models or data

Recommended Agent Operating Model

The agent operating model defines the Planner as the orchestration lead, the Implementer as the coding executor, the Reviewer as the quality gate, the Tester as the validator, the Researcher as the evidence gatherer, and the Domain Specialist as the source of domain constraints. Escalation paths and decision boundaries are explicit to prevent context drift and ensure safe changes.

Recommended Project Structure

/src
  /serving
    /models
    /inference
    /adapters
  /orchestrator
  /agents
    /planner
    /implementer
    /reviewer
    /tester
    /researcher
    /domain-specialist
/configs
/data
/docs
/tests
/deploy
/monitoring
/security

Core Operating Principles

Single source of truth for decisions and artifacts
Idempotent operations and deterministic outputs
Explicit handoffs with preserved context
Strong observability and auditability
Secure by default and least-privilege tool access
Continuous improvement through post-mortems

Agent Handoff and Collaboration Rules

Planner announces plan, constraints, and acceptance criteria to Implementer
Implementer reports progress, tests, and any blockers to Reviewer
Reviewer validates, then passes to Tester with acceptance signals
Tester executes end-to-end checks and signals release readiness
Researcher and Domain Specialist provide updates to Planner as needed
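
The handoff chain above can be sketched as a small transition table that an orchestrator checks before accepting a handoff. This is a minimal illustration under stated assumptions, not part of the template: `Handoff`, `ALLOWED_HANDOFFS`, and `validate_handoff` are hypothetical names, and the required payload fields are assumed from the handoff rules listed in the template.

```python
from dataclasses import dataclass, field

# Allowed (sender, receiver) pairs and the payload keys each handoff must carry.
# The pairs mirror the handoff rules in this template; the key names are assumed.
ALLOWED_HANDOFFS = {
    ("planner", "implementer"): {"plan", "constraints", "acceptance_criteria"},
    ("implementer", "reviewer"): {"code", "tests", "results"},
    ("reviewer", "tester"): {"test_cases", "acceptance_criteria"},
    ("tester", "planner"): {"test_results", "risk_signals"},
    ("researcher", "planner"): {"evidence"},
    ("domain-specialist", "planner"): {"domain_constraints"},
}


@dataclass
class Handoff:
    sender: str
    receiver: str
    payload: dict = field(default_factory=dict)


def validate_handoff(handoff: Handoff) -> list[str]:
    """Return a list of problems; an empty list means the handoff is acceptable."""
    key = (handoff.sender, handoff.receiver)
    if key not in ALLOWED_HANDOFFS:
        return [f"handoff {handoff.sender} -> {handoff.receiver} is not allowed"]
    # Report every required field that the payload does not contain.
    missing = sorted(ALLOWED_HANDOFFS[key] - set(handoff.payload))
    return [f"missing payload field: {name}" for name in missing]
```

Returning a list of problems rather than raising on the first one lets the orchestrator log every gap in a single pass before sending the handoff back to the sender.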

Tool Governance and Permission Rules

Commands must pass policy checks before execution
Edits to code and configs require orchestration approval
Secrets and credentials must be retrieved from a secure store at runtime
Production system changes require a formal approval gate
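
One way to picture these rules is a policy gate that every command passes through before execution. The sketch below is an assumption-laden example rather than a prescribed implementation: `ROLE_PERMISSIONS`, `Command`, and `check_policy` are hypothetical names, and the role-to-tool mapping is invented for illustration using the tool names from the template.

```python
from dataclasses import dataclass

# Hypothetical role-to-tool permissions; adjust to your own roster and tools.
ROLE_PERMISSIONS = {
    "planner": {"registry", "monitoring"},
    "implementer": {"model_server", "data_api", "registry"},
    "reviewer": {"registry", "monitoring"},
    "tester": {"model_server", "monitoring"},
}


@dataclass(frozen=True)
class Command:
    role: str
    tool: str
    target_env: str          # e.g. "staging" or "production"
    approved: bool = False   # set by a formal approval gate, never by the agent


def check_policy(cmd: Command) -> tuple[bool, str]:
    """Decide whether a command may run; returns (allowed, reason)."""
    # Least privilege: unknown roles get an empty permission set.
    if cmd.tool not in ROLE_PERMISSIONS.get(cmd.role, set()):
        return False, f"role '{cmd.role}' may not use tool '{cmd.tool}'"
    if cmd.target_env == "production" and not cmd.approved:
        return False, "production changes require formal approval"
    return True, "ok"
```

For example, `check_policy(Command("implementer", "model_server", "production"))` is denied until the approval gate sets `approved=True`.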

Code Construction Rules

Code must be modular, well-documented, and testable
No hard-coded environment secrets
All inputs validated; outputs and side effects are explicit
Idempotent deployable artifacts
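
As a rough illustration of the validation and no-hard-coded-secrets rules, the sketch below validates an untrusted inference payload and reads a token at runtime. The request fields, the `InferenceRequest` type, and the `MODEL_SERVER_TOKEN` variable name are all assumptions made for this example; an environment variable stands in here for whatever secure store your stack uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRequest:
    model_name: str
    model_version: str
    features: list[float]


def parse_request(raw: dict) -> InferenceRequest:
    """Validate an untrusted payload and return a typed request, or raise ValueError."""
    name = raw.get("model_name")
    version = raw.get("model_version")
    features = raw.get("features")
    if not isinstance(name, str) or not name:
        raise ValueError("model_name must be a non-empty string")
    if not isinstance(version, str) or not version:
        raise ValueError("model_version must be a non-empty string")
    if not isinstance(features, list) or not features:
        raise ValueError("features must be a non-empty list")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in features):
        raise ValueError("features must contain only numbers")
    return InferenceRequest(name, version, [float(x) for x in features])


def load_api_token(env_var: str = "MODEL_SERVER_TOKEN") -> str:
    """Read a secret at runtime instead of hard-coding it (env var is a stand-in)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"secret {env_var} is not set")
    return token
```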

Security and Production Rules

Data at rest and in transit encrypted
Access control lists and RBAC enforced
Audit logs for all model and config changes
PII redaction and privacy-by-design
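
The audit-log and PII-redaction rules can be combined so that nothing sensitive reaches the log in the first place. The following is a minimal sketch, assuming two deliberately simple regular expressions (email addresses and US-style SSNs); real PII detection needs far broader coverage, and `redact_pii` and `audit_entry` are hypothetical helper names.

```python
import json
import re
from datetime import datetime, timezone

# Deliberately simple patterns for illustration; real PII detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace email addresses and SSN-shaped numbers with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return US_SSN_RE.sub("[REDACTED_SSN]", text)


def audit_entry(actor: str, action: str, target: str, detail: str) -> str:
    """Build one JSON audit line; the free-text detail is redacted before logging."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "target": target,
            "detail": redact_pii(detail),
        },
        sort_keys=True,
    )
```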

Testing Checklist

Unit tests for each agent role
Integration tests across Planner, Implementer, Reviewer, and Tester
End-to-end tests of a full model serving inference flow
Deployment and rollback tests

Common Mistakes to Avoid

Skipping explicit handoffs and memory updates
Bypassing the orchestrator or secret manager
Overloading agents with tasks outside their role
Ignoring data provenance and versioning

FAQ

How does the AGENTS.md template enforce multi-agent handoffs?

The template documents dedicated roles and explicit handoff steps between Planner, Implementer, Reviewer, Tester, Researcher, and Domain Specialist, including context propagation and decision logging.

What governance rules ensure safe tool usage?

Tool usage is constrained by role-based access, ephemeral credentials, secret storage, and formal approval gates for production changes.

How can I adapt this template to a different model serving stack?

Adjust the agent roster, project structure, and data/API rules to match your stack while preserving the handoff patterns and governance model.

What happens on failure or rollback?

On failure, the orchestrator triggers rollback to the last safe model version and notifies human teammates for review.
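
The rollback behavior described above might look like the following sketch. It uses an in-memory stand-in for a model registry and assumes that the version being replaced was healthy; `ModelRegistry`, `deploy_with_rollback`, and the `health_check` and `notify` callables are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ModelRegistry:
    """In-memory stand-in for a model registry (hypothetical, for illustration)."""
    active: str
    known_good: list[str] = field(default_factory=list)  # oldest first

    def promote(self, version: str) -> None:
        # Assumes the currently active version was healthy when it is replaced.
        self.known_good.append(self.active)
        self.active = version

    def rollback(self) -> str:
        """Reactivate the most recent known-good version."""
        if not self.known_good:
            raise RuntimeError("no known-good version to roll back to")
        self.active = self.known_good.pop()
        return self.active


def deploy_with_rollback(
    registry: ModelRegistry,
    version: str,
    health_check: Callable[[str], bool],
    notify: Callable[[str], None],
) -> bool:
    """Promote a version; on a failed or crashing health check, roll back and notify."""
    registry.promote(version)
    try:
        healthy = health_check(version)
        reason = "health check failed"
    except Exception as exc:
        healthy = False
        reason = f"health check raised {exc!r}"
    if healthy:
        return True
    restored = registry.rollback()
    notify(f"deploy of {version} rolled back to {restored}: {reason}")
    return False
```

Treating an exception in the health check the same as a failed check keeps the registry from being left on an unverified version.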

How are memory and the source of truth managed?

Context and decisions are stored in a central memory store and versioned with each milestone; the source of truth comprises the model registry, data catalog, config repository, and audit logs.
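
A versioned memory store can be sketched as an append-only record of context snapshots per milestone. This is a minimal in-memory example under stated assumptions: `MemoryStore` and its methods are hypothetical, the context is assumed to be JSON-serializable, and a real deployment would persist snapshots rather than hold them in a dictionary.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Append-only context snapshots, keyed by milestone (illustrative only)."""
    versions: dict[str, list[dict]] = field(default_factory=dict)

    def record(self, milestone: str, context: dict) -> str:
        """Store a snapshot and return a short content digest identifying it."""
        # Canonical JSON (sorted keys) so equal contexts yield equal digests.
        snapshot = json.dumps(context, sort_keys=True)
        digest = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()[:12]
        # Round-trip through JSON to store a copy the caller cannot mutate.
        self.versions.setdefault(milestone, []).append(
            {"digest": digest, "context": json.loads(snapshot)}
        )
        return digest

    def latest(self, milestone: str) -> dict:
        """Return the most recently recorded context for a milestone."""
        return self.versions[milestone][-1]["context"]

    def history(self, milestone: str) -> list[str]:
        """Return snapshot digests for a milestone, oldest first."""
        return [entry["digest"] for entry in self.versions.get(milestone, [])]
```

Because the digest is derived from the content, two agents recording the same context get the same identifier, which makes it easier to spot when their views of a milestone diverge.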
