Content Moderation AGENTS.md Template

Overview

This AGENTS.md template defines a content moderation system design for AI coding agents. It supports both single-agent and multi-agent orchestration to enforce policies, analyze content, and escalate to human review when needed.

When to Use This AGENTS.md Template

You are building an AI-powered moderation platform in which data ingestion, policy interpretation, decisioning, and human review form a single workflow.
You need a project-level operating context that enforces tool governance, memory rules, and escalation paths.
You want a copyable AGENTS.md template to seed new moderation workflows and keep guardrails in place.

Copyable AGENTS.md Template

# AGENTS.md
# Content Moderation AGENTS.md Template

Project: Content Moderation System Design using AI coding agents

Project Role
- Lead: Define system goals and guardrails for single-agent and multi-agent workflows.
- Policy Writer: Draft moderation policies and decision rules.
- Ingest Agent: Ingest content and metadata from sources.
- Moderation Agent: Analyze content against policies and generate decisions.
- Review Agent: Validate outputs and provide human review input when needed.
- Researcher: Validate sources and verify policy alignment.
- Domain Specialist: Align policies with legal and brand guidelines.

Supervisor or Orchestrator
- The orchestrator coordinates all agents, propagates context, triggers handoffs, and maintains a single source of truth.
- It enforces memory limits, versioning, and auditable logs for all actions.

Handoff rules between agents
- Policy writer hands the drafted policies and decision rules to the Ingest agent.
- Ingest agent passes content and context to the Moderation agent for decisioning.
- Moderation agent passes results to the Review agent if escalation is required.
- Review agent confirms or rejects decisions and can trigger a delay or rollback to the policy writer.

Context, memory, and source-of-truth rules
- Context includes the current content item, policy version, and decision history.
- Memory is versioned, scoped to the moderation domain, and periodically archived.
- Source of Truth includes the latest policy docs, moderation queue, and audit logs.

Tool access and permission rules
- Moderation APIs, content database, policy store, and audit log access are restricted by role.
- Secrets are stored in a vault and rotated on schedule.

Architecture rules
- Stateless decision agents with idempotent operations.
- Immutable event sourcing for moderation events.
- Clear boundaries between data ingestion, policy interpretation, and decisioning.

File structure rules
- Keep all workflow related files under a dedicated folder with a stable structure.

Data, API, or integration rules when relevant
- Validate input payloads against the policy schema before processing.
- Use structured content metadata for accurate decisions.

Validation rules
- All outputs must be traceable to a policy rule and associated data source.
- Each decision must include a confidence score and reason.

Security rules
- Never expose raw content in logs.
- Enforce least privilege and audit all actions.

Testing rules
- Unit tests for each agent, integration tests for end-to-end flow, and regression tests for policy updates.

Deployment rules
- Deploy changes to a staging environment with a canary rollout.
- Require approval for production deployments affecting user content.

Human review and escalation rules
- Escalate to human reviewers for uncertain or high-risk cases.
- All escalations must be auditable with a reviewer comment.

Failure handling and rollback rules
- On failure, revert to the previous policy version and reprocess with a safe default.
- Notify the orchestrator and preserve incident logs.

Things Agents must not do
- Do not bypass policy or remove safeguards.
- Do not modify production data without approval.
- Do not ignore escalation rules or context drift.

Recommended Agent Operating Model

Roles and responsibilities are defined to support clear decision boundaries and robust escalation paths. The planner drafts policy; the implementer translates policy into actions; the reviewer validates outputs; the researcher and domain specialist provide source verification and policy alignment; the orchestrator coordinates handoffs and keeps the end-to-end flow running with proper memory hygiene. Escalation to humans remains a core guardrail.

Recommended Project Structure

content_moderation_system/
  agents/
    planner/
    implementer/
    reviewer/
    researcher/
    domain_specialist/
  policies/
  data/
    input/
    memory/
    sources/
  orchestrator/
  tests/
  deployment/
  docs/

Core Operating Principles

Operate with explicit policy rules and auditable decisions.
Keep memory scoped, versioned, and auditable.
Enforce tool governance and least-privilege access.
Require human review for high-risk or uncertain cases.
Maintain clear escalation paths and rollback options.
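
The rollback principle can be illustrated with a minimal Python sketch. It assumes a policy version is just a callable that maps text to an action; the names `decide_with_rollback` and `SAFE_DEFAULT` and the action strings are hypothetical, not part of the template.

```python
from typing import Callable, List

# Hypothetical safe default: hold the item for a human instead of acting on it.
SAFE_DEFAULT = "hold_for_review"

Policy = Callable[[str], str]  # assumption: a policy version maps text to an action


def decide_with_rollback(
    text: str, current: Policy, previous: Policy, incidents: List[str]
) -> str:
    """Try the current policy, fall back to the previous one, then to a safe default."""
    try:
        return current(text)
    except Exception as exc:  # broad on purpose: any failure triggers rollback
        incidents.append(f"current policy failed: {exc!r}")
    try:
        return previous(text)
    except Exception as exc:
        incidents.append(f"previous policy failed: {exc!r}")
    return SAFE_DEFAULT
```

Every failure is appended to `incidents`, so the caller can preserve the incident log and notify the orchestrator. A real system would likely persist these records rather than keep them in a list.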

Agent Handoff and Collaboration Rules

Planning agents hand off to implementers once a policy is drafted; implementers hand off to reviewers for validation; researchers verify sources; domain specialists align decisions with guidelines; the orchestrator coordinates all handoffs and maintains logs.
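
One way to make the handoff order enforceable is a small routing table that the orchestrator checks before passing work along. The sketch below is illustrative only: the `HANDOFFS` table, the `Orchestrator` class, and the role names are assumptions chosen to mirror the order described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical routing table: each sender has exactly one permitted receiver.
HANDOFFS: Dict[str, str] = {
    "planner": "implementer",
    "implementer": "reviewer",
}


@dataclass
class Orchestrator:
    """Checks each handoff against the routing table and logs the ones it accepts."""

    log: List[str] = field(default_factory=list)

    def hand_off(self, sender: str, receiver: str, item_id: str) -> None:
        if HANDOFFS.get(sender) != receiver:
            raise ValueError(f"handoff {sender} -> {receiver} is not allowed")
        self.log.append(f"{item_id}: {sender} -> {receiver}")
```

Calling `Orchestrator().hand_off("planner", "reviewer", "item-1")` raises `ValueError`, because the table only permits the planner to hand off to the implementer.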

Tool Governance and Permission Rules

All tool calls are gated by roles. Secrets must be retrieved from a secure vault. Production actions require approved change management and audit logs. External services require approvals and rate limits.
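
Role gating can be sketched as a lookup that runs before any tool call and writes an audit entry whether or not the call is allowed. The role names, tool identifiers, and the `authorize` helper below are hypothetical; a real deployment would load the grants from its own policy store.

```python
from typing import Dict, List, Set

# Hypothetical role-to-tool grants, following least privilege.
ROLE_TOOLS: Dict[str, Set[str]] = {
    "ingest_agent": {"content_db.read"},
    "moderation_agent": {"content_db.read", "policy_store.read", "moderation_api.call"},
    "review_agent": {"content_db.read", "policy_store.read", "audit_log.read"},
}


class PermissionDenied(Exception):
    """Raised when a role requests a tool it has not been granted."""


def authorize(role: str, tool: str, audit_log: List[dict]) -> None:
    """Record the attempt, then raise if the role lacks the grant."""
    allowed = tool in ROLE_TOOLS.get(role, set())
    audit_log.append({"role": role, "tool": tool, "allowed": allowed})
    if not allowed:
        raise PermissionDenied(f"{role} may not use {tool}")
```

Unknown roles get an empty grant set, so the default is to deny. Denied attempts are logged before the exception is raised, which keeps them auditable.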

Code Construction Rules

Code must be modular and tested, and should prefer immutable data structures. Avoid global state, ensure operations are idempotent, and document the inputs and outputs of each agent. Use explicit contracts and versioned policies. Do not bypass safety checks or skip validation.
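
As a rough illustration of these rules, the sketch below uses frozen dataclasses as explicit input and output contracts and derives an idempotency key from the content ID and policy version. The type names, the `decide` function, and the placeholder "spam" rule are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModerationRequest:
    """Input contract (hypothetical field names)."""
    content_id: str
    text: str
    policy_version: str


@dataclass(frozen=True)
class ModerationDecision:
    """Output contract (hypothetical field names)."""
    content_id: str
    policy_version: str
    action: str
    reason: str


def idempotency_key(req: ModerationRequest) -> str:
    # Same content under the same policy version always maps to the same key.
    raw = f"{req.content_id}:{req.policy_version}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def decide(req: ModerationRequest, store: Dict[str, ModerationDecision]) -> ModerationDecision:
    """Return the stored decision if one exists; otherwise compute and store it."""
    key = idempotency_key(req)
    if key in store:
        return store[key]
    flagged = "spam" in req.text.lower()  # placeholder rule, not a real policy
    decision = ModerationDecision(
        content_id=req.content_id,
        policy_version=req.policy_version,
        action="remove" if flagged else "allow",
        reason="matched placeholder spam rule" if flagged else "no rule matched",
    )
    store[key] = decision
    return decision
```

The decision store is passed in rather than held as a module-level variable, which keeps the agent itself stateless. Replaying the same request returns the stored decision instead of creating a second one.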

Security and Production Rules

Protect content and user data, enforce strong access controls, monitor for anomalies, and ensure safe rollbacks and incident response. Production systems require zero-downtime deploys and auditable change logs.
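
One common way to keep raw content out of logs is to record a digest and the content length instead of the text itself. The sketch below assumes a JSON audit record; the `audit_record` helper and its field names are hypothetical.

```python
import hashlib
import json


def audit_record(content_id: str, raw_text: str, action: str) -> str:
    """Build a JSON audit line that references content by hash, never by value."""
    digest = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    return json.dumps(
        {
            "content_id": content_id,
            "content_sha256": digest,
            "content_length": len(raw_text),
            "action": action,
        },
        sort_keys=True,
    )
```

A plain hash lets reviewers confirm which content a decision refers to, but short or guessable content can still be recovered by brute force. Treat this as log hygiene, not as anonymization.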

Testing Checklist

Unit tests for each agent behavior
Integration tests for end-to-end moderation flow
Security and data flow tests
Canary and rollback tests

Common Mistakes to Avoid

Assuming AI outputs are always correct without human review
Overlapping responsibilities among agents
Unbounded memory growth or drift in policy alignment

FAQ

How do I start using this AGENTS.md Template for Content Moderation?

Clone the AGENTS.md Template and tailor the policy, memory, and handoff rules to your moderation domain. Use it as the project-level operating context for single-agent and multi-agent workflows.

What is the recommended agent roster for content moderation?

A policy writer or planner, an ingest or data agent, a moderation decision agent, a reviewer, a researcher, and a domain specialist.

How should agent handoffs be orchestrated?

Handoffs occur at decision boundaries: planner to implementer, implementer to reviewer, reviewer to domain specialist, and finally to production or human review as required.

How are memory and source-of-truth managed?

Memory stores are versioned and scoped to moderation events and policy updates. The source of truth comprises the policy docs, the moderation queue, and the audit logs.
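
A versioned, scoped memory store could look roughly like the following append-only sketch. The `VersionedMemory` class and its 1-based version numbers are assumptions for illustration; archiving and persistence are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VersionedMemory:
    """Append-only store: each write adds a new version and never overwrites."""

    scope: str  # e.g. "moderation"; hypothetical scope label
    _history: Dict[str, List[Any]] = field(default_factory=dict)

    def write(self, key: str, value: Any) -> int:
        versions = self._history.setdefault(key, [])
        versions.append(value)
        return len(versions)  # 1-based version number of the new entry

    def read(self, key: str, version: Optional[int] = None) -> Any:
        versions = self._history[key]
        if version is None:
            return versions[-1]  # latest version
        if not 1 <= version <= len(versions):
            raise KeyError(f"{key} has no version {version}")
        return versions[version - 1]
```

Because earlier versions are retained, a decision can be traced back to the exact policy text that was in force when it was made.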

How do you validate moderation outputs and handle escalations?

Outputs are validated against policy constraints with a confidence score. Escalations to human reviewers include auditable logs and a rollback path.
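
The validation and escalation rule can be sketched as a routing function. The confidence threshold, the high-risk category names, and the `Decision` fields below are hypothetical values chosen for the example; real values would come from your policy.

```python
from dataclasses import dataclass

# Hypothetical tuning values; set them from your own policy and risk appetite.
CONFIDENCE_THRESHOLD = 0.8
HIGH_RISK_CATEGORIES = {"self_harm", "child_safety"}


@dataclass(frozen=True)
class Decision:
    content_id: str
    category: str
    action: str
    confidence: float  # expected range: 0.0 to 1.0
    reason: str
    policy_rule: str   # identifier of the rule the decision traces back to


def validate(decision: Decision) -> None:
    """Reject decisions that lack a valid confidence, a reason, or a policy rule."""
    if not 0.0 <= decision.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if not decision.reason or not decision.policy_rule:
        raise ValueError("decision must cite a reason and a policy rule")


def route(decision: Decision) -> str:
    """Send high-risk or low-confidence decisions to human review."""
    validate(decision)
    if decision.category in HIGH_RISK_CATEGORIES:
        return "human_review"
    if decision.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"
    return "auto_apply"
```

High-risk categories escalate regardless of confidence, so a confident model cannot bypass human review on the cases that matter most.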

Target User

Use Cases