HPA and VPA Production Reviews AGENTS.md Template
An AGENTS.md template for HPA and VPA production reviews: a copyable operating manual for AI coding agents and multi-agent orchestration.
Target User
Engineering leaders, platform teams, SREs, DevOps
Use Cases
- Establishing production review workflows for HPA and VPA
- Multi-agent orchestration of autoscaling decisions
- Governance and handoffs in Kubernetes autoscaling
Markdown Template
# AGENTS.md
Project: HPA and VPA Production Review (Autoscaling Governance)
Agent roster and responsibilities
- Planner (Lead): defines objectives for the review cycle, collects metrics, schedules reviews, and compiles artifacts.
- HPA_VPA Specialist: designs, tunes, and validates horizontal and vertical autoscaler policies.
- Implementer: translates policies into Kubernetes manifests, Helm charts, or GitOps configurations.
- Reviewer: ensures alignment with governance, security, and architectural constraints.
- Tester: validates changes in staging, simulating traffic patterns and scale events.
- Monitor: validates post-deployment behavior and triggers rollbacks if thresholds breach.
Supervisor or orchestrator behavior
- The Orchestrator coordinates planning, implementation, review, testing, and deployment steps.
- It enforces policy-driven handoffs, timeouts, and mandatory artifacts before progression.
- It records decisions in a central knowledge store and emits a decision log for traceability.
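A minimal sketch of the decision log mentioned above, assuming an append-only JSON Lines format. The `DecisionRecord` fields and the `append_decision` helper are hypothetical names chosen for illustration; they do not belong to any existing tool.

```python
import io
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import TextIO


@dataclass(frozen=True)
class DecisionRecord:
    """One traceable decision; the field names are illustrative assumptions."""
    agent: str
    stage: str
    decision: str
    inputs: list[str]
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_decision(log: TextIO, record: DecisionRecord) -> None:
    """Write one record as a single JSON line so the log stays append-only."""
    log.write(json.dumps(asdict(record), sort_keys=True) + "\n")


# io.StringIO stands in for a real file or knowledge-store writer.
log = io.StringIO()
append_decision(
    log,
    DecisionRecord("Reviewer", "review", "approved", ["pr-diff", "rollback-plan"]),
)
```

One line per decision keeps the log easy to diff and to replay when reconstructing why a change progressed.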
Handoff rules between agents
- Planner → Implementer: artifacts required include policy specs, metric baselines, and proposed config changes.
- Implementer → Reviewer: provide diffs, PRs, validation tests, and rollback plan.
- Reviewer → Tester: require passing integration tests and staged metrics.
- Tester → Monitor: approve production candidate with runbooks and monitoring dashboards.
- Monitor → Planner: alert on drift or failure; trigger rollback if needed.
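The handoff rules above could be enforced with a simple artifact gate. This is a sketch under stated assumptions: the artifact identifiers (such as `policy_spec`) and the `missing_artifacts` helper are hypothetical, and only the four forward handoffs that list concrete artifacts are modeled.

```python
# Required artifacts per handoff, keyed by (sender, receiver).
# The artifact identifiers are illustrative assumptions.
REQUIRED_ARTIFACTS = {
    ("Planner", "Implementer"): {"policy_spec", "metric_baseline", "proposed_config"},
    ("Implementer", "Reviewer"): {"diff", "pull_request", "validation_tests", "rollback_plan"},
    ("Reviewer", "Tester"): {"integration_test_results", "staged_metrics"},
    ("Tester", "Monitor"): {"runbook", "monitoring_dashboard"},
}


def missing_artifacts(sender: str, receiver: str, provided: set[str]) -> set[str]:
    """Return the artifacts still missing for a handoff; empty means it may proceed."""
    key = (sender, receiver)
    if key not in REQUIRED_ARTIFACTS:
        raise ValueError(f"no handoff rule defined for {sender} -> {receiver}")
    return REQUIRED_ARTIFACTS[key] - provided


gaps = missing_artifacts("Implementer", "Reviewer", {"diff", "pull_request"})
```

Here `gaps` contains `validation_tests` and `rollback_plan`, so an orchestrator using this gate would hold the handoff until both are attached.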
Context, memory, and source-of-truth rules
- All decisions originate from a single source of truth (Git repository and CI/CD state).
- Memory is persisted in a knowledge store with traceable inputs, decisions, and outputs.
- Source-of-truth includes metrics sources (Prometheus, Kubernetes metrics API), policy docs, and change records.
Tool access and permission rules
- Tools include kubectl, Helm, GitOps tooling, and cloud provider APIs. Access is role-based and secrets are stored in a vault.
- Implementer and Planner can propose changes; only Reviewer and above can approve PRs to production.
- Secrets must not be hard-coded; rotate credentials and use short-lived tokens.
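As a rough illustration of the propose/approve split, the check below maps actions to allowed roles. The action names and the `PlatformOwner` role standing in for "Reviewer and above" are assumptions; a real setup would more likely rely on branch protection and Kubernetes RBAC than on application code.

```python
# Roles allowed per action. "PlatformOwner" is an assumed stand-in for
# "Reviewer and above"; adjust to the roles your organization defines.
ALLOWED_ROLES = {
    "propose_change": frozenset({"Planner", "Implementer"}),
    "approve_production_pr": frozenset({"Reviewer", "PlatformOwner"}),
}


def authorize(role: str, action: str) -> None:
    """Raise PermissionError unless the role is allowed to perform the action."""
    allowed = ALLOWED_ROLES.get(action)
    if allowed is None:
        raise ValueError(f"unknown action: {action}")
    if role not in allowed:
        raise PermissionError(f"{role} may not perform {action}")


authorize("Implementer", "propose_change")  # passes silently
```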
Architecture rules
- Changes must be applied through GitOps pipelines; production changes require PR reviews and automated tests.
- Avoid ad-hoc changes in production; validate in staging first.
- Use canary or blue/green deployment strategies for autoscaler changes.
File structure rules
- Keep all autoscaler changes under configs/ and manifests/ with clear naming.
- Do not place unrelated files in the same folder; maintain separation by responsibility.
Data, API, or integration rules when relevant
- Pull metrics from Prometheus and Kubernetes Metrics API; respect scrape intervals and data retention.
- Do not substitute synthetic test results for production metrics when making scaling decisions; synthetic load can bias them.
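One way to respect scrape intervals is to reject samples older than a small number of intervals before acting on them. The sketch below assumes the agent already has each sample's timestamp; `is_fresh` and the `max_missed_scrapes` threshold are hypothetical, not part of Prometheus or the Kubernetes Metrics API.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(
    sample_time: datetime,
    now: datetime,
    scrape_interval: timedelta,
    max_missed_scrapes: int = 2,
) -> bool:
    """Treat a sample as fresh if it is at most max_missed_scrapes intervals old."""
    age = now - sample_time
    # A negative age means clock skew or a bad timestamp; treat it as not fresh.
    return timedelta(0) <= age <= scrape_interval * max_missed_scrapes


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = now - timedelta(seconds=20)
old = now - timedelta(minutes=5)
interval = timedelta(seconds=15)
```

With a 15-second interval and the default threshold, `recent` counts as fresh while `old` does not.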
Validation rules
- Validate scaling decisions against baselines, capacity constraints, and QoS targets.
- Ensure change artifacts have passing tests, and metrics show no regression in latency or throughput.
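A regression check against a baseline might look like the following sketch. The metric names, the 5% tolerance, and the `find_regressions` helper are assumptions for illustration; real thresholds should come from your QoS targets.

```python
# For each metric, record whether a higher value is worse.
# Metric names are illustrative assumptions.
HIGHER_IS_WORSE = {
    "p95_latency_ms": True,
    "throughput_rps": False,
}


def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return metrics whose relative change exceeds the tolerance in the bad direction."""
    regressed = []
    for name, higher_is_worse in HIGHER_IS_WORSE.items():
        change = (candidate[name] - baseline[name]) / baseline[name]
        if higher_is_worse and change > tolerance:
            regressed.append(name)
        elif not higher_is_worse and change < -tolerance:
            regressed.append(name)
    return regressed


baseline = {"p95_latency_ms": 200.0, "throughput_rps": 1000.0}
candidate = {"p95_latency_ms": 230.0, "throughput_rps": 990.0}
result = find_regressions(baseline, candidate)
```

In this example latency rose by 15% and throughput fell by 1%, so only `p95_latency_ms` is reported.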
Security rules
- Enforce RBAC for all agents; limit production access; require approvals for changes.
- Do not expose credentials in logs or files; redact secrets in outputs.
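Redaction of agent output can be approximated with a pattern-based filter. This is a deliberately small sketch: the key names in the pattern are assumptions and far from exhaustive, so it complements rather than replaces keeping secrets out of outputs in the first place.

```python
import re

# Matches "key=value" or "key: value" pairs for a few sensitive key names.
# The key list is an illustrative assumption, not a complete inventory.
_SECRET_PATTERN = re.compile(
    r"(?i)\b(password|token|secret|api[_-]?key)\b(\s*[=:]\s*)(\S+)"
)


def redact(text: str) -> str:
    """Replace the value of any matched sensitive key with a placeholder."""
    return _SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


line = redact("token=abc123 user=alice")
```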
Testing rules
- Write unit tests for policy logic, integration tests for policy-to-manifest translation, and end-to-end tests of the full review cycle.
Deployment rules
- Deploy changes only via CI/CD with gating, approvals, and canary validations.
- Maintain rollback procedures to revert to the prior policy state.
Human review and escalation rules
- Any non-trivial change must be reviewed by a human reviewer with domain expertise.
- Escalate to SRE/Platform Owner if drift or risk is detected beyond defined thresholds.
Failure handling and rollback rules
- If a policy causes adverse scaling (e.g., resource thrash), revert to the prior stable policy and run a post-mortem.
- Always have a rollback plan with explicit commands and canary steps.
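Resource thrash can be approximated as frequent reversals in replica count over an observation window. The sketch below assumes the Monitor has a time-ordered list of replica counts; the helper names and the `max_changes` threshold are hypothetical and should be tuned per workload.

```python
def count_direction_changes(replicas: list[int]) -> int:
    """Count how often scaling flips between up and down, ignoring flat steps."""
    changes = 0
    previous_direction = 0
    for before, after in zip(replicas, replicas[1:]):
        direction = (after > before) - (after < before)  # 1, -1, or 0
        if direction == 0:
            continue
        if previous_direction != 0 and direction != previous_direction:
            changes += 1
        previous_direction = direction
    return changes


def is_thrashing(replicas: list[int], max_changes: int = 3) -> bool:
    """Flag a window as thrashing when reversals exceed the threshold."""
    return count_direction_changes(replicas) > max_changes


oscillating = [3, 5, 3, 5, 3, 5]
steady = [3, 4, 5, 5, 6]
```

The `oscillating` series reverses direction four times and would trigger a rollback review under the default threshold; the `steady` series never reverses.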
Things Agents must not do
- Do not bypass approvals to push changes directly to production.
- Do not implement non-governed changes or hidden optimizations.
- Do not rely on stale metrics; always validate with fresh data before acting.
Overview
This AGENTS.md template governs the HPA and VPA production review workflow for AI coding agents. It supports both single-agent and multi-agent orchestration, with explicit roles, handoffs, and governance so that autoscaling decisions in production are safe and auditable. In short, it is a concrete operating manual that agents in different roles can follow during live scaling reviews.
When to Use This AGENTS.md Template
- When implementing HPA and VPA policies in production environments.
- When you need clear agent roles, responsibilities, and handoffs for autoscaler decisions.
- When governance, security, and auditability are required for scaling changes.
- When using multi-agent orchestration to coordinate metrics, policies, and actions.
- When you want a copyable, project-level AGENTS.md template that can be dropped into a repo.
Recommended Agent Operating Model
The operating model defines how the agents work together: the Planner sets direction, the HPA_VPA Specialist designs the autoscaler policies, the Implementer turns them into manifests, the Reviewer verifies governance and security, the Tester validates changes in staging, and the Monitor validates outcomes after deployment. Handoffs follow a strict state machine to prevent drift, and escalation paths exist for high-risk changes so that a human stays in the loop.
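The strict state machine could be sketched as an explicit transition table. The stage names, the "send back to implementation" edges, and the `advance` helper are assumptions for illustration; the source only specifies the forward handoffs and the Monitor's alert or rollback path.

```python
from enum import Enum


class Stage(Enum):
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    REVIEW = "review"
    TESTING = "testing"
    MONITORING = "monitoring"
    ROLLED_BACK = "rolled_back"


# Allowed transitions. Edges back to IMPLEMENTATION on rejection are assumed.
ALLOWED_TRANSITIONS = {
    Stage.PLANNING: {Stage.IMPLEMENTATION},
    Stage.IMPLEMENTATION: {Stage.REVIEW},
    Stage.REVIEW: {Stage.TESTING, Stage.IMPLEMENTATION},
    Stage.TESTING: {Stage.MONITORING, Stage.IMPLEMENTATION},
    Stage.MONITORING: {Stage.PLANNING, Stage.ROLLED_BACK},
    Stage.ROLLED_BACK: {Stage.PLANNING},
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage, or raise if the transition would skip a step."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


stage = advance(Stage.PLANNING, Stage.IMPLEMENTATION)
```

Encoding the transitions as data makes skipped steps, such as planning straight to monitoring, fail loudly instead of silently drifting.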
Recommended Project Structure
workflows/
  hpa-vpa-prod-review/
    01-planner/
    02-implementer/
    03-reviewer/
    04-tester/
    05-monitor/
artifacts/
configs/
docs/
Core Operating Principles
- Single source of truth for decisions (Git + CI/CD).
- Deterministic, auditable outcomes with explicit provenance.
- Clear, enforceable handoffs and decision boundaries.
- Defensive defaults and guardrails to prevent unsafe scaling.
- Human review for high-impact changes; audit trails for all decisions.
Agent Handoff and Collaboration Rules
- Planner coordinates objectives and metrics with the HPA_VPA Specialist.
- HPA_VPA Specialist translates policy into deployable changes and drafts validation tests.
- Implementer submits artifact PRs; Reviewer validates governance and security constraints.
- Tester runs staging simulations; Monitor validates production readiness post-deploy.
- In case of drift, Monitor alerts Planner; Planner initiates reevaluation or rollback.
Tool Governance and Permission Rules
- Commands and edits restricted to approved roles and directories.
- Secrets stored in a vault; never exposed in logs or code.
- Production changes require PR approvals and automated tests.
- External API calls require whitelisting and token scopes aligned to the task.
- All actions are tracked in the decision log for traceability.
Code Construction Rules
- Idempotent changes; no side effects without explicit guards.
- Configuration manifests derived from policy inputs; avoid hard-coded values.
- Tests must cover policy-to-manifest translation and scaling outcomes.
- PRs should include a rollback plan and canary test results.
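To illustrate deriving manifests from policy inputs, the sketch below builds an `autoscaling/v2` HorizontalPodAutoscaler as a plain dictionary. The `HpaPolicy` fields and `build_hpa_manifest` are hypothetical, the example values are placeholders, and the target is assumed to be a Deployment scaled on CPU utilization. The function is pure, so the same policy always yields the same manifest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HpaPolicy:
    """Policy inputs; the field names are illustrative assumptions."""
    name: str
    namespace: str
    target_deployment: str
    min_replicas: int
    max_replicas: int
    cpu_target_percent: int


def build_hpa_manifest(policy: HpaPolicy) -> dict:
    """Build an autoscaling/v2 HPA manifest from policy inputs, with no side effects."""
    # Assumes scale-to-zero is not in use, so min_replicas must be at least 1.
    if not 1 <= policy.min_replicas <= policy.max_replicas:
        raise ValueError("expected 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": policy.name, "namespace": policy.namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": policy.target_deployment,
            },
            "minReplicas": policy.min_replicas,
            "maxReplicas": policy.max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": policy.cpu_target_percent,
                        },
                    },
                }
            ],
        },
    }


policy = HpaPolicy("web-hpa", "prod", "web", 2, 10, 70)  # placeholder values
manifest = build_hpa_manifest(policy)
```

The resulting dictionary can be serialized to YAML or JSON by whatever GitOps tooling the repository already uses.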
Security and Production Rules
- RBAC ensures least privilege across all agents.
- Canary deployments for autoscaler policy changes; monitor edge cases before full rollout.
- Secrets rotation and secure storage; no plaintext credentials in code or logs.
Testing Checklist
- Unit tests for policy logic; integration tests for policy translation.
- End-to-end tests that simulate scaling events in staging.
- Canary validation with monitored metrics and alerting baselines.
- Rollbacks tested and documented in the runbook.
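Staging simulations can start from the replica formula documented for the Kubernetes HPA, desired = ceil(current × currentMetric / targetMetric), with no change when the ratio is within a tolerance (0.1 by default). The sketch below is a simplified model only: it ignores stabilization windows, pod readiness, and multiple metrics, and the `desired_replicas` helper and its clamping parameters are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA replica calculation, clamped to the min/max bounds."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance: the autoscaler leaves the replica count unchanged.
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))


scale_up = desired_replicas(4, current_value=90, target_value=60)
no_change = desired_replicas(4, current_value=63, target_value=60)
scale_down = desired_replicas(10, current_value=30, target_value=60)
```

Feeding a recorded traffic trace through a model like this gives the Tester an expected replica series to compare against what staging actually does.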
Common Mistakes to Avoid
- Skipping human review for risky autoscaler changes.
- Relying on stale metrics or hidden assumptions.
- Bypassing the GitOps pathway for production changes.
- Overly permissive tool access or unscoped secrets.
FAQ
What is the purpose of this AGENTS.md template for HPA and VPA production reviews?
It defines roles, handoffs, and governance for autoscaler decision workflows in both single-agent and multi-agent setups.
Who are the agents and what are their responsibilities?
The agents are the Planner, HPA_VPA Specialist, Implementer, Reviewer, Tester, and Monitor. Each has clearly defined responsibilities within the multi-agent orchestration pattern, as listed in the agent roster.
How are handoffs and decision boundaries enforced?
The Orchestrator enforces handoff rules, timeouts, and required artifacts, and approvals are gated by governance policy. This ensures that no step in the workflow is skipped.
How is security and production access managed?
Secrets are stored securely, RBAC is enforced, and production changes require validation, code review, and automated tests before rollout.
How do you validate scaling decisions and rollback if needed?
Through staged tests, monitoring baselines, and a defined rollback path that reverts to the prior policy state if outcomes drift beyond thresholds.