
Cursor Rules Template: ClickHouse Analytics Ingestion Pipeline

Cursor Rules Template for a ClickHouse analytics ingestion pipeline using Cursor AI. Provides a copyable .cursorrules configuration to guide a Python-based ingestion stack with security, testing, and production readiness.

Tags: clickhouse, analytics, ingestion, cursor-rules, cursorrules, python, etl, data-pipeline, security, testing, ci-cd

Target User

Developers building a Python-based ingestion pipeline to ClickHouse for analytics using Cursor AI

Use Cases

  • Ingest streaming data from Kafka to ClickHouse
  • Enforce schema and data quality checks
  • Run within a Python-based ETL stack with Cursor AI rules
  • Integrate with CI/CD pipelines and tests


Overview

This Cursor Rules Template provides a copyable .cursorrules configuration for building a Python-based ClickHouse analytics ingestion pipeline with Cursor AI. It defines role, context, architecture, security, tests, and anti-patterns to ensure safe, reproducible development.

This Cursor rules configuration targets a ClickHouse analytics ingestion stack, implemented in Python, with Kafka as a streaming source in production. It emphasizes data integrity, secure handling of credentials, and testable CI/CD workflows.

When to Use These Cursor Rules

  • Starting a new Python-based ingestion pipeline that writes to ClickHouse for analytics.
  • Enforcing a consistent project layout, security posture, and test strategy across ingestion services.
  • Ensuring data validation, batching, and safe SQL usage in the ingestion path.
  • Setting up CI/CD checks for linting, type checks, and tests before deployment.

Copyable .cursorrules Configuration

framework: python
version: 3.11
frameworkRoleAndContext: "You are Cursor AI, guiding the implementation of a Python-based ingestion pipeline that writes to ClickHouse for analytics. Prioritize data integrity, security, and reliable CI/CD."
codeStyleAndGuides: >
  Use Black for formatting; isort for imports; flake8/ruff for linting. Follow a crisp PEP8-compliant style.
architectureAndDirectoryRules: |
  The project root must include src/, tests/, config/, and docker/.
  Under src/: ingestion/, core/, utils/, models/.
  ingestion/ must include kafka_consumer.py and clickhouse_writer.py.
authenticationAndSecurityRules: >
  Secrets come from environment variables or a secret store. Do not log credentials. Enforce TLS for all connections and enable certificate verification. Use a dedicated ClickHouse user with least privilege.
databaseAndORMPatterns: >
  Use the official ClickHouse Python driver (e.g., clickhouse-connect) with parameterized queries. Do not use an ORM in the ingestion path. Map inputs to ClickHouse data types explicitly.
testingAndLintingWorkflows: >
  Unit tests with pytest; type checks with mypy; lint with ruff. CI should run lint, tests, and type checks on push and PRs.
prohibitedActionsAndAntiPatternsForTheAI: >
  Do not embed credentials in code or configs. Do not perform ad-hoc schema migrations at runtime. Do not bypass TLS or expose credentials to logs. Do not hardcode hostnames.
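
The parameterized-query rule above can be sketched as follows using clickhouse-connect's server-side binding syntax (`{name:Type}`). The client is passed in so the query logic stays testable; the table and column names are illustrative, not part of the template.

```python
# Parameterized query sketch: values are bound server-side, never spliced
# into the SQL text via string concatenation.

def fetch_recent_events(client, source: str, limit: int = 100):
    """Query events for one source without building SQL from user input."""
    sql = (
        "SELECT event_id, payload FROM events "
        "WHERE source = {source:String} "
        "ORDER BY event_time DESC LIMIT {limit:UInt32}"
    )
    # clickhouse-connect substitutes these safely on the server side.
    return client.query(sql, parameters={"source": source, "limit": limit})
```

With a real connection this would be called as `fetch_recent_events(clickhouse_connect.get_client(...), "web")`.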

Recommended Project Structure

.
├── src/
│   ├── ingestion/
│   │   ├── kafka_consumer.py
│   │   └── clickhouse_writer.py
│   ├── core/
│   ├── models/
│   └── utils/
├── tests/
├── config/
│   └── settings.py
├── docker/
├── requirements.txt
└── docker-compose.yml
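
A settings module in this layout might look like the following minimal sketch: connection details come from the environment at startup, with non-secret defaults only. The environment variable names are assumptions for illustration.

```python
# Hypothetical settings module: secrets have no defaults and must come
# from the environment (or a secret store exporting into it).
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ClickHouseSettings:
    host: str
    port: int
    username: str
    password: str
    secure: bool = True

def load_settings(env=os.environ) -> ClickHouseSettings:
    return ClickHouseSettings(
        host=env.get("CLICKHOUSE_HOST", "localhost"),
        port=int(env.get("CLICKHOUSE_PORT", "8443")),
        username=env["CLICKHOUSE_USER"],      # no default: must be provided
        password=env["CLICKHOUSE_PASSWORD"],  # never hardcoded
        secure=env.get("CLICKHOUSE_SECURE", "true").lower() == "true",
    )
```

Passing `env` as a mapping keeps the loader testable without mutating the real process environment.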

Core Engineering Principles

  • Single responsibility and clear module boundaries.
  • Idempotent ingestion: at-least-once delivery plus idempotent or deduplicated writes yields effectively exactly-once results.
  • Explicit error handling and robust retry/backoff policies.
  • Type-safe code with static analysis (mypy).
  • Secure by default: credentials externalized, TLS, least privilege.
  • Observability: structured logs, metrics, tracing for ingestion pipeline.
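
The structured-logs principle can be sketched with the standard library alone: each record is serialized as one JSON line so a log pipeline can parse fields. The field names are illustrative.

```python
# Minimal structured-logging sketch: a Formatter that emits JSON lines and
# merges extra structured fields attached to the record.
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge fields passed via `extra={"fields": {...}}` at the call site.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)

# Usage: logger.info("batch flushed", extra={"fields": {"rows": 1000}})
```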

Code Construction Rules

  • Use parameterized queries for ClickHouse writes. Do not build SQL via string concatenation.
  • Batch records before writing to ClickHouse; define BATCH_SIZE with a sane default (e.g., 1000).
  • Separate config from code; read settings from environment variables, which override defaults supplied by config files.
  • Validate input schema at the boundary; enforce non-null constraints and typed data mapping.
  • Only use ClickHouse-native driver; no ORM usage in ingestion path.
  • Implement retry with exponential backoff; cap max retries to avoid infinite loops.
  • Document changes in migrations; avoid on-the-fly schema changes in production code.
  • Do not log sensitive data (PII) in info or debug logs.
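
The batching and retry rules above can be combined in one small sketch. The writer callable is injected (in production it would wrap a clickhouse-connect `client.insert`), which keeps the buffering and backoff logic testable in isolation; `BATCH_SIZE` and the delay values are illustrative defaults.

```python
# Batch buffer with capped exponential-backoff retry around each flush.
import time

BATCH_SIZE = 1000

class BatchWriter:
    def __init__(self, write_rows, batch_size=BATCH_SIZE,
                 max_retries=5, base_delay=0.5):
        self.write_rows = write_rows      # e.g. wraps client.insert(...)
        self.batch_size = batch_size
        self.max_retries = max_retries
        self.base_delay = base_delay
        self._buffer = []

    def add(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for attempt in range(self.max_retries):
            try:
                self.write_rows(batch)
                return
            except Exception:
                if attempt == self.max_retries - 1:
                    self._buffer = batch + self._buffer  # keep rows for the caller
                    raise  # capped retries: never loop forever
                time.sleep(self.base_delay * 2 ** attempt)  # exponential backoff
```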

Security and Production Rules

  • TLS for all connections; verify certificates; disable insecure ciphers.
  • Secrets management via environment or secret store; never in code.
  • Least privilege on ClickHouse user; restrict to required databases/tables.
  • Containerization: run as non-root, drop privileges, and use read-only volumes where possible.
  • CI/CD: run tests, lint, and security checks before deployment; pin dependency versions.
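
One way to make the TLS rules enforceable is a pure factory that assembles connection arguments with `secure` and `verify` pinned on. The resulting dict is intended to be splatted into `clickhouse_connect.get_client(**kwargs)`; the default port is the ClickHouse HTTPS port.

```python
# Secure-by-default connection kwargs: TLS and certificate verification are
# not configurable off. Credentials should come from settings loaded from the
# environment, never from literals in code.
def secure_client_kwargs(host: str, username: str, password: str,
                         port: int = 8443) -> dict:
    return {
        "host": host,
        "port": port,
        "username": username,
        "password": password,
        "secure": True,   # TLS is mandatory
        "verify": True,   # certificate verification stays on
    }

# client = clickhouse_connect.get_client(**secure_client_kwargs(...))
```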

Testing Checklist

  • Unit tests for individual components (kafka_consumer, writer, config).
  • Integration tests for end-to-end ingestion using a test ClickHouse instance.
  • Contract tests to verify data schema after transformation.
  • Static type checks with mypy; lint with ruff/flake8; format with Black.
  • CI pipeline builds container images and runs tests on push.
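
As a concrete example of the boundary-validation tests in this checklist, a hypothetical validator and one unit test might look like this (field names and types are illustrative):

```python
# Boundary validation: enforce non-null constraints and explicit types
# before anything reaches the write path.
def validate_event(event: dict) -> dict:
    if not isinstance(event.get("event_id"), str) or not event["event_id"]:
        raise ValueError("event_id must be a non-empty string")
    if not isinstance(event.get("ts"), int):
        raise ValueError("ts must be an integer epoch timestamp")
    return {"event_id": event["event_id"], "ts": event["ts"],
            "payload": event.get("payload", "")}

def test_validate_event_rejects_missing_id():
    try:
        validate_event({"ts": 1700000000})
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Under pytest the rejection case would normally use `pytest.raises(ValueError)`; the try/except form is shown to keep the sketch dependency-free.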

Common Mistakes to Avoid

  • Hardcoding credentials or hostnames; always use environment variables.
  • Writing to ClickHouse row by row instead of in batches, which hurts throughput and creates backpressure.
  • Ignoring data schema validation; leads to downstream errors.
  • Skipping TLS or secret rotation; weak security posture in production.

FAQ

Can I use this template with a different database?

The template is designed for ClickHouse ingestion; adapt the driver, connection details, and SQL to the target database while preserving secure handling and testing strategies. Ensure you update the config, SQL statements, and dependencies accordingly.

What stack does this Cursor Rules Template target?

It targets a Python-based ingestion pipeline emitting data into ClickHouse, orchestrated via Kafka (or similar streaming source). It enforces data validation, security, and testable CI/CD practices tailored to this stack.

How are credentials managed in production?

Credentials are loaded from environment variables or a secret store, never logged. Connections to ClickHouse and message brokers use TLS. Keys rotate periodically via your secret management system and CI/CD variables.

What should I test before deploying?

Run unit tests for config and ingestion components, integration tests against a test ClickHouse instance, and end-to-end tests that validate the full data flow. Ensure lints pass and type checks succeed in CI before deployment.

How can I extend this template for additional sources?

Introduce adapters for new sources in src/ingestion, reuse the same ClickHouse writer, and add schema validations. Update the .cursorrules to include new source-specific tests, configs, and security considerations.
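
The adapter approach described above can be sketched as a small protocol that every source implements, so the same downstream sink consumes any of them. The names here are illustrative, not part of the template.

```python
# Source adapters behind a common protocol; a KafkaSource would wrap a
# consumer, while InMemorySource serves as a test double.
from typing import Iterable, Protocol

class SourceAdapter(Protocol):
    def poll(self, max_records: int) -> Iterable[dict]:
        """Return up to max_records decoded events."""

class InMemorySource:
    def __init__(self, events):
        self._events = list(events)

    def poll(self, max_records: int):
        batch, self._events = self._events[:max_records], self._events[max_records:]
        return batch

def drain(source: SourceAdapter, sink, max_records: int = 100) -> int:
    """Move events from any adapter into a shared sink (e.g., a batch writer)."""
    total = 0
    while batch := list(source.poll(max_records)):
        for event in batch:
            sink(event)
            total += 1
    return total
```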