Cursor Rules Template: Financial Document Analyzer (Python)

Overview

This Cursor Rules Template provides a Python-based framework for extracting key metrics from financial documents (PDFs, spreadsheets, and bank statements) with Cursor AI. It defines a deterministic workflow, validation rules, and audit logging, so you can paste the included .cursorrules block into your project root and get consistent extractions.

The template targets Python 3.11, with PDF parsing via pdfminer.six, Excel extraction via openpyxl, optional OCR via pytesseract for scanned pages, and data processing with pandas. It enforces data quality, avoids exposing PII, and writes outputs to Parquet/CSV for reliable downstream use.

When to Use These Cursor Rules

When you need consistent extraction of financial metrics from PDFs, spreadsheets, and bank statements across diverse formats.
When you require audit-friendly outputs with a clear data lineage and validation checks.
When you want a reproducible project structure that Cursor AI can guide for maintenance and sharing.

Copyable .cursorrules Configuration

// Cursor Rules Template for Python Financial Document Analyzer
Framework Role & Context:
- You are a Cursor AI assistant tasked with generating a Python-based financial document analyzer.
- The tool must extract metrics from PDFs, Excel files, and bank statements, including revenue, net income, margins, cash flows, and balances.
- Ensure deterministic outputs, data validation, and audit logging. Do not leak PII in outputs.

Code Style and Style Guides:
- Language: Python 3.11
- Formatting: Black; type hints where appropriate; lint with Flake8
- Naming: snake_case for modules, PascalCase for classes
- Use pandas for data shapes and Parquet/CSV for artifacts

Architecture & Directory Rules:
- Project layout
  src/extractors/pdf_extractor.py
  src/extractors/xlsx_extractor.py
  src/extractors/ocr_extractor.py
  src/processors/metrics.py
- Data inputs: data/input/
- Outputs: data/output/
- Tests: tests/
- Cursorrules location: cursorrules/financial-docs-cursor-rules-template.cursorrules

Authentication & Security Rules:
- Read secrets from environment variables; do not embed secrets
- Do not log PII or raw documents
- Ensure artifacts are stored securely

Database and ORM patterns:
- Do not rely on an ORM; store outputs in Parquet/CSV via pandas
- Optional audit logs to a PostgreSQL database via SQLAlchemy if needed in tests

Testing & Linting Workflows:
- pytest for unit tests; pytest-bdd optional for integration tests
- Static checks via mypy and flake8
- CI should run tests on PRs and on push to main

Prohibited Actions and Anti-patterns for the AI:
- Do not assume fixed document templates
- Do not skip data validation
- Do not produce outputs without explicit schema mapping
- Do not expose secrets in logs

Recommended Project Structure

project-root/
├── data/
│   ├── input/
│   └── output/
├── src/
│   ├── extractors/
│   │   ├── pdf_extractor.py
│   │   ├── xlsx_extractor.py
│   │   └── ocr_extractor.py
│   └── processors/
│       └── metrics.py
├── tests/
└── cursorrules/
    └── financial-docs-cursor-rules-template.cursorrules

Core Engineering Principles

Explicit data contracts: define input schemas and output metrics clearly.
Defensive parsing: validate types, handle missing fields gracefully, and log anomalies.
Deterministic processing: seed any RNGs, keep data ordering stable, and make results reproducible from the same inputs.
Security by default: hide PII, encrypt artifacts at rest, and use env vars for secrets.
Test-driven workflow: unit tests accompany each extractor; integration tests cover end-to-end.
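As a minimal sketch of the first two principles, the snippet below checks a raw metrics dict against an explicit contract and collects anomalies instead of raising. The `METRIC_SCHEMA` mapping, the `ValidationResult` class, and `validate_metrics` are hypothetical names chosen for illustration; the template itself does not prescribe them.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any

# Hypothetical contract: metric name -> expected Python type.
METRIC_SCHEMA: dict[str, type] = {
    "total_revenue": Decimal,
    "net_income": Decimal,
    "transaction_count": int,
}


@dataclass(frozen=True)
class ValidationResult:
    valid: dict[str, Any]
    anomalies: list[str]


def validate_metrics(raw: dict[str, Any]) -> ValidationResult:
    """Check a raw metrics dict against METRIC_SCHEMA without raising."""
    valid: dict[str, Any] = {}
    anomalies: list[str] = []
    # Sorted iteration keeps the anomaly list stable between runs.
    for name in sorted(METRIC_SCHEMA):
        expected = METRIC_SCHEMA[name]
        value = raw.get(name)
        if value is None:
            anomalies.append(f"missing: {name}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is a subclass of int, so it is rejected explicitly.
            anomalies.append(f"wrong type: {name} (expected {expected.__name__})")
        else:
            valid[name] = value
    for name in sorted(set(raw) - set(METRIC_SCHEMA)):
        anomalies.append(f"unexpected: {name}")
    return ValidationResult(valid=valid, anomalies=anomalies)
```

Returning anomalies as data, rather than raising on the first problem, lets the caller log every issue for a document in one pass and decide separately whether to reject it.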

Code Construction Rules

Keep extractors small and single-responsibility; compose metrics in a dedicated processor.
Value normalization: cast numbers to decimal, normalize currency units, and validate date formats.
Use Parquet for outputs to support schema evolution and fast reads.
Avoid hard-coded paths; rely on environment/config to locate data roots.
Document every metric with a defined data type and precision.
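The value-normalization rule could look roughly like the sketch below, which uses only the standard library. The accepted input shapes (a `$` prefix, comma thousands separators, parentheses for negatives) and the two date formats are assumptions about the source documents, not something the template specifies; `parse_amount` and `parse_date` are hypothetical helper names.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

# Assumed date conventions; note that day-first vs. month-first is ambiguous
# in general and should be fixed per document source.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_amount(text: str) -> Decimal:
    """Convert strings such as '$1,234.50' or '(1,234.50)' to a 2-decimal Decimal."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    # Accounting notation: parentheses mean a negative amount.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    try:
        value = Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a monetary amount: {text!r}") from exc
    if not value.is_finite():
        raise ValueError(f"not a finite amount: {text!r}")
    value = value.quantize(Decimal("0.01"))
    return -value if negative else value


def parse_date(text: str) -> date:
    """Try each known format in order; fail loudly if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")
```

Using `Decimal` rather than `float` avoids binary rounding error in monetary sums, and quantizing to a fixed precision makes the documented precision of each metric enforceable.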

Security and Production Rules

PII must never be logged; redact it from logs and reports.
Secrets reside in environment variables; never embed them in code or config files.
Store artifacts locally or in a secured bucket with strict access control.
Validate incoming documents and quarantine corrupted ones to prevent injection or parsing errors.
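One way to keep PII out of logs is a `logging.Filter` that rewrites each record before it is emitted. The sketch below is illustrative only: the three regular expressions are assumed patterns (US-style SSNs, long digit runs treated as account numbers, and email addresses) and would need review and extension for real data. `redact` and `RedactingFilter` are hypothetical names.

```python
import logging
import re

# Illustrative patterns only; real PII detection needs a reviewed, broader set.
# The account pattern will also redact any other 8-17 digit number.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\b\d{8,17}\b"), "[REDACTED-ACCOUNT]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED-EMAIL]"),
]


def redact(text: str) -> str:
    """Replace every match of every PII pattern with its placeholder."""
    for pattern, replacement in PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Redact PII from the fully formatted message of each log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so PII passed via args is also covered.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter to a handler, for example `handler.addFilter(RedactingFilter())`, applies it to every record that handler emits, including records from third-party loggers that propagate to it.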

Testing Checklist

Unit tests for pdf_extractor, xlsx_extractor, ocr_extractor to verify field extractions.
Integration tests for end-to-end metric calculation from sample inputs.
Linting and type checks with mypy/flake8.
Performance checks on large PDFs and spreadsheets to ensure scalability.
Validation tests to ensure no PII leakage in outputs or logs.

Common Mistakes to Avoid

Relying on fixed document layouts; implement fallback extraction strategies.
Ignoring data type fidelity, especially numbers, currencies, and dates.
Skipping tests, or logging sensitive data in artifact metadata.
Overfitting to sample inputs; ensure generalization across document variants.

FAQ

What is this Cursor Rules Template for?

This template provides a complete Python-based Cursor AI workflow to extract key financial metrics from PDFs, Excel files, and bank statements, with a copyable .cursorrules block to guide implementation and testing.

Which stack does this template target?

The template targets Python 3.11 with pdfminer.six, openpyxl, pandas, and optional OCR (pytesseract) for scanned docs, designed to run as a Cursor AI workflow.

How do I run this locally?

Install the dependencies, place sample documents under data/input/, and open the project in the Cursor editor with the included .cursorrules file so that the generated code extracts metrics and writes Parquet/CSV outputs to data/output/.

What metrics are extracted?

Key metrics include total_revenue, net_income, gross_margin, operating_cash_flow, account_balances, transaction_count, average_transaction_value, and monthly trends.

How is data secured?

Secrets come from environment variables, PII is never logged, and artifacts are kept in local storage or in a secured bucket with access controls. Define and enforce a data retention policy for stored artifacts.

Can I customize metrics and sources?

Yes. The template supports extending extractors for PDFs, Excel files, and OCR sources; you can add new metrics mapping and validation rules with minimal changes.
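One possible extension mechanism is a small registry that maps file suffixes to extractor functions, so a new source type only needs one decorated function. Everything in this sketch (`register`, `extractor_for`, and the placeholder `extract_csv`) is hypothetical and stands in for whatever structure the generated project actually uses.

```python
from collections.abc import Callable
from pathlib import Path

Extractor = Callable[[Path], dict[str, str]]
_REGISTRY: dict[str, Extractor] = {}


def register(*suffixes: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for one or more file suffixes."""
    def decorator(func: Extractor) -> Extractor:
        for suffix in suffixes:
            _REGISTRY[suffix.lower()] = func
        return func
    return decorator


@register(".csv")
def extract_csv(path: Path) -> dict[str, str]:
    # Placeholder body: a real extractor would parse the file contents.
    return {"source": path.name, "kind": "csv"}


def extractor_for(path: Path) -> Extractor:
    """Look up the extractor for a path, matching the suffix case-insensitively."""
    try:
        return _REGISTRY[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no extractor registered for {path.suffix!r}") from None
```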
