
CLAUDE.md Template for Production LlamaIndex & Advanced RAG

A state-of-the-art CLAUDE.md template optimized for enterprise-grade LlamaIndex implementations, structured data extraction, knowledge graphs, and advanced RAG pipelines.

Tags: CLAUDE.md, LlamaIndex, RAG, Knowledge Graphs, Vector Database, Python AI, Data Ingestion, AI Coding Assistant

Target User

AI architects, search engineers, enterprise data scientists, and developers using LlamaIndex to build complex search, retrieval, and knowledge discovery engines

Use Cases

  • Structuring advanced hierarchical node parsing and data ingestion pipelines
  • Implementing strict multi-tenant isolation within vector store indices
  • Designing hybrid search indices combining vector embeddings and keyword BM25 retrieval
  • Constructing programmatic Property Graphs and Knowledge Graph indexes
  • Configuring custom node post-processors, rerankers, and response synthesizers

Markdown Template


# CLAUDE.md: Production LlamaIndex & Data Ingestion Engineering Guide

You are operating as an Expert AI Research Engineer specializing in advanced RAG, structural data ingestion pipelines, and semantic retrieval systems built on modern LlamaIndex.

Your core goal is to build highly performant, precise, and deterministic search and information extraction infrastructures.

## Core Implementation Principles

- **Modern Configuration Standards**: Always use the global `Settings` object (`llama_index.core.Settings`) or explicit local configurations. Never generate legacy, deprecated `ServiceContext` configurations.
- **Decoupled Ingestion Layers**: Isolate data loading and structural node generation completely from retrieval and query orchestrations. Use explicit `IngestionPipeline` constructs.
- **Explicit Document & Node Parsing**: Avoid using generic auto-chunking. Always select specialized text splitters, node parsers (e.g., `SentenceSplitter`, `HierarchicalNodeParser`), and explicitly define metadata extractors.
- **Asynchronous Operations**: Maximize parallel execution using async methods (`aquery`, `aretrieve`, `IngestionPipeline.arun`) when handling multi-document indexing or high-concurrency requests.

## Code Construction Rules

### 1. Ingestion & Transformation Foundations
- Define data structures cleanly. When parsing documents via custom loaders, map explicit, immutable values directly into the `metadata` dictionary of the `Document` object.
- Always preserve critical origin-tracing fields, such as page numbers, access permissions, source paths, and last-modified timestamps, inside node metadata.
- Enforce the use of a persistent metadata storage cache (`IngestionCache`) to prevent costly redundant re-embedding steps during continuous document synchronization loops.

### 2. Index Management & Storage Layers
- When integrating with vector engines (e.g., Qdrant, Milvus, Postgres/pgvector), instantiate explicit `StorageContext` components with clear collections or namespace designations.
- Ensure strict multi-tenant boundaries by attaching explicit metadata filters (`MetadataFilters` with `ExactMatchFilter`) at the vector query level, rather than relying on application-level filtering after retrieval.

### 3. Query Pipelines & Reranking Architecture
- For non-trivial retrieval pipelines, construct explicit execution flows utilizing LlamaIndex `QueryPipeline` components.
- Always incorporate a definitive reranking layer (e.g., `CohereRerank`, `LLMRerank`, or cross-encoders) to minimize the context window size required by the synthesis LLM and optimize retrieval accuracy.
- Define structured outputs clearly by passing a strongly typed Pydantic class to the `output_cls` parameter of your query engine or Pydantic program.

## Error Handling, Optimization, & Performance Boundaries
- Explicitly validate token budgets in prompt-building helpers to prevent runtime exceptions when dealing with large context windows.
- Catch connectivity errors from embedding APIs and third-party vector stores cleanly; never let vector store failures bubble up as unhandled crashes.
- Maintain visibility into retrieval latency by emitting explicit logger traces around ingestion transformations and generation cycles.

What is this CLAUDE.md template for?

This CLAUDE.md template configures your AI coding assistant to operate as a high-caliber data and search architect focused exclusively on LlamaIndex best practices. LlamaIndex undergoes frequent architectural enhancements; this template prevents the AI from mixing legacy ServiceContext primitives into newer workflows, demanding the use of modern Settings objects, explicit Node Parsers, and unified ingestion pipelines instead.

It provides clear, unyielding guidance on handling production vector lookups, multi-tenant index metadata routing, complex hierarchical data parsing, and cost-efficient structured prediction layers.

When to use this template

Use this template when building enterprise search engines, parsing complex PDFs/CAD data patterns, connecting deep multi-source data ingestion routines, configuring managed Vector Stores (like Qdrant, Pinecone, or pgvector), or constructing hybrid Property Graph pipelines with advanced reranking layers.

Recommended project structure

project-root/
  app/
    engine/
      index.py
      ingest.py
      node_processors.py
    loaders/
      custom_readers.py
    pipelines/
      query_pipeline.py
    core/
      settings.py
    main.py
  data/
  tests/
  CLAUDE.md
  requirements.txt


Why this template matters

LlamaIndex applications frequently fail in production due to unstructured chunking, poor indexing boundaries, or mixing outdated framework components. AI models often generate code using deprecated abstractions, leading to immediate import errors or poor retrieval relevance.

This template locks your workspace down to modern IngestionPipeline configurations, ensures clean structural outputs, dictates explicit metadata isolation, and structures retrieval via advanced rerankers for dependable enterprise execution paths.

Recommended additions

  • Integrate a specific programmatic blueprint for building hierarchical parent-child node retrieval pipelines.
  • Add pre-configured property graph extraction schemas for specialized entity-relationship modeling.
  • Define automated regression testing tasks utilizing mock embedding engines to verify node parser changes without API cost overhead.
  • Include specialized operational blueprints for structuring context data when handling multilingual corpora.

FAQ

How does this template handle historical LlamaIndex deprecations?

It explicitly restricts the AI from utilizing legacy constructs like ServiceContext, forcing it to apply modern, up-to-date global Settings and local pipeline configurations instead.

Can this template be used to build Property Graphs or Knowledge Graphs?

Yes. The rules require explicit schema configuration and modular node processing blocks, making it highly effective for organizing structured entity-relationship knowledge bases.

How does this template optimize API usage and embedding costs?

By dictating explicit IngestionPipeline caching setups and requiring advanced post-retrieval reranking blocks, it limits costly re-embeddings and narrows down final prompt token volumes dramatically.

Is it suitable for multi-tenant enterprise data search architectures?

Yes, the code construction rules mandate strict multi-tenant boundary checking through explicit MetadataFilters applied directly to vector queries, entirely preventing cross-tenant information leaks.

About the author

Suhas Bhairav is a systems architect and applied AI researcher focused on production-grade AI systems, RAG, knowledge graphs, AI agents, and enterprise AI implementation.