Cursor Rules Template: PyTorch CUDA Training Pipeline
A Cursor Rules Template for PyTorch CUDA training pipelines. Includes a copyable .cursorrules configuration, project structure, and stack-specific guidance for Cursor AI to optimize GPU-accelerated training.
Target User
AI/ML engineers, DevOps for ML, MLOps teams
Use Cases
- Define a reusable Cursor AI policy for PyTorch CUDA training pipelines
- Enforce coding standards and GPU-safe practices in training loops
- Automate testing, linting, and reproducibility in CUDA-accelerated experiments
- Integrate Cursor AI rules into CI/CD for GPU-based ML workloads
Overview
This Cursor Rules Template provides a copyable PyTorch CUDA training pipeline policy for Cursor AI to follow. It clarifies the stack scope (Python, PyTorch, CUDA, GPU-accelerated training, and experiment tracking) and defines how Cursor AI should assist engineers in building reproducible, secure, and production-ready models. It includes a concrete .cursorrules block you can paste into your project root to enforce best practices across the entire training workflow.
In short: use this template to establish a standard for PyTorch CUDA optimization under Cursor AI guidance, ensuring deterministic training where required, safe GPU memory usage, and consistent experiment metadata across CI/CD pipelines.
When to Use These Cursor Rules
- Starting a new PyTorch training project that leverages CUDA for GPU acceleration.
- Enforcing a repeatable, audit-friendly training pipeline with Cursor AI for governance and reproducibility.
- Integrating with CI/CD to validate CUDA-enabled runs and report metrics automatically.
- Ensuring secure handling of credentials, secrets, and dataset access in the training workflow.
- Guiding architecture decisions and directory layout for scalable ML projects.
Copyable .cursorrules Configuration
cursor_rules {
  framework: 'pytorch'
  stack: 'gpu-accelerated training, cuda-optimization, training-pipeline'
  context: 'Cursor AI acts as an assistant for PyTorch CUDA training pipelines. Provide guidance on code structure, GPU usage, and reproducible experiments.'
  roles: [
    { name: 'Framework Engineer', context: 'Develop PyTorch CUDA training workflows' },
    { name: 'MLOps Engineer', context: 'Maintain CI/CD, experiment tracking, and reproducibility' }
  ]
  style: {
    language: 'python'
    lints: ['flake8', 'black', 'isort']
    docstrings: 'google'
  }
  arch: {
    dirs: [
      'src/training',
      'src/models',
      'src/datasets',
      'configs',
      'experiments',
      'tests'
    ]
    naming: 'snake_case'
    cuda: true
  }
  auth: {
    secrets: ['ENV', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY']
    usage: 'do not embed secrets in code; use env vars or a secret manager'
  }
  db: {
    orm: 'no ORM for training metadata; use a lightweight store'
    patterns: ['ExperimentRun', 'ModelCheckpoint', 'DatasetVersion']
  }
  test: {
    lint: ['flake8', 'black --check']
    unit: true
    integration: true
    ci: ['GitHub Actions: CUDA and CPU matrix']
  }
  prohibited: [
    'hard-coded credentials',
    'loading entire datasets into memory',
    'disabling determinism without justification',
    'unsafe CUDA memory hacks without validation'
  ]
}
Recommended Project Structure
project-root/
  src/
    training/
      pipelines/
        cuda_opt.py
      trainers/
      models/
      datasets/
      utils/
    models/
    datasets/
  configs/
  experiments/
  tests/
  scripts/
Core Engineering Principles
- Reproducibility and determinism across runs and environments.
- GPU-efficiency with safe memory management and proper batch sizing.
- Explicit, auditable experiment tracking and metadata hygiene.
- Security-first handling of secrets and dataset access.
- Modularity and clean separation of training, validation, and utilities.
Code Construction Rules
- Seed all RNGs and set deterministic flags where required; document when nondeterminism is acceptable (see the seeding sketch after this list).
- Use CUDA-aware device detection and fallbacks; avoid hard-coded device indices (see the training-step sketch after this list).
- Prefer torch.cuda.amp.autocast and torch.cuda.amp.GradScaler for mixed precision (newer PyTorch releases expose the same tools under torch.amp).
- Data loading: set pin_memory=True, choose num_workers to match available CPU cores and I/O, and shard the dataset properly for distributed training (see the loader sketch after this list).
- Save checkpoints as state_dicts; avoid serializing entire model objects (see the checkpoint sketch after this list).
- Log only essential tensors and metrics; avoid dumping large raw datasets in logs.
- Validate shapes and dtypes at every training step; guard against NaNs and infs.
- Organize configuration via configs with explicit hyperparameters and defaults.
- Use in-place operations only when tensor ownership is clear to autograd; in-place writes can silently corrupt activations saved for the backward pass.
- Document code paths with docstrings following Google style; align with project-wide linters.
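A minimal sketch of the seeding rule above: the helper name seed_everything, the default seed, and the decision to enable full determinism are illustrative choices, not part of the template.

import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42, deterministic: bool = True) -> None:
    """Seed all common RNGs and optionally force deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds every CUDA device
    if deterministic:
        # Some deterministic cuBLAS kernels require this environment variable.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.benchmark = False

Call sites that pass deterministic=False should say why in a comment, which satisfies the "document when nondeterminism is acceptable" rule.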
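The device-detection, mixed-precision, and NaN-guard rules combined into one device-agnostic step; the linear model, random data, and hyperparameters below are toy stand-ins so the sketch runs unchanged on CPU or GPU.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(16, 4).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,))),
    batch_size=8,
)
# Scaler and autocast disable themselves on CPU, so no branching is needed.
scaler = torch.cuda.amp.GradScaler(enabled=device.type == "cuda")

for inputs, targets in loader:
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=device.type == "cuda"):
        loss = criterion(model(inputs), targets)
    if not torch.isfinite(loss):  # guard against NaN/inf before stepping
        raise FloatingPointError(f"non-finite loss: {loss.item()}")
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()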
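The data-loading rule as a sketch; the worker count and toy dataset are illustrative, and the commented distributed variant assumes torch.distributed.init_process_group has already been called.

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 4, (1024,)))

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,  # tune to CPU cores and I/O; 0 disables worker processes
    pin_memory=torch.cuda.is_available(),  # faster host-to-device copies
    persistent_workers=True,  # keep workers alive across epochs
)

# Distributed variant: the sampler shards the dataset per rank, and
# set_epoch() must be called each epoch so shuffling varies across epochs.
# sampler = DistributedSampler(dataset)
# loader = DataLoader(dataset, batch_size=32, sampler=sampler,
#                     num_workers=4, pin_memory=True)
# sampler.set_epoch(epoch)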
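Checkpointing as state_dicts, continuing from the training-step sketch above; the output path and metadata keys are examples rather than a required schema.

import torch

checkpoint = {
    "epoch": 0,  # whatever loop counter the trainer maintains
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "scaler_state": scaler.state_dict(),  # AMP state, needed for exact resume
}
torch.save(checkpoint, "experiments/run_001/checkpoint.pt")

# Restoring: construct the objects first, then load weights into them.
state = torch.load("experiments/run_001/checkpoint.pt", map_location="cpu")
model.load_state_dict(state["model_state"])
optimizer.load_state_dict(state["optimizer_state"])
scaler.load_state_dict(state["scaler_state"])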
Security and Production Rules
- Store credentials, API keys, and dataset access tokens in environment variables or a secret manager; never commit them (see the sketch after this list).
- Log provenance metadata for reproducibility but redact sensitive inputs when needed.
- Use reproducible seeds, versioned datasets, and tracked hyperparameters for auditability.
- Containerize training code with explicit CUDA driver requirements; pin CUDA toolkit versions.
- Enable deterministic execution on critical paths; avoid nondeterministic kernels and algorithms in production validation runs.
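A minimal fail-fast pattern for the secrets rule above; require_env is a hypothetical helper, and the variable names mirror the template's auth.secrets list.

import os


def require_env(name: str) -> str:
    """Fail fast with a clear error instead of falling back to a default."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value


aws_access_key_id = require_env("AWS_ACCESS_KEY_ID")
aws_secret_access_key = require_env("AWS_SECRET_ACCESS_KEY")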
Testing Checklist
- Unit tests for data preprocessing, model forward passes, and loss computations.
- Integration tests validating that a single training step produces the expected tensor shapes (see the sketch after this list).
- End-to-end tests with a tiny dataset to exercise the full training loop and checkpointing.
- Lint and type checks in CI; run tests on GPU-enabled runners where possible.
- Performance tests to ensure CUDA kernels and autocast paths meet baseline speed.
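The single-step integration test from the checklist, sketched with a toy linear model standing in for real components; pytest picks it up via the test_ prefix.

import torch
from torch import nn


def test_training_step_shapes():
    model = nn.Linear(16, 4)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    inputs = torch.randn(8, 16)
    targets = torch.randint(0, 4, (8,))

    outputs = model(inputs)
    assert outputs.shape == (8, 4)  # batch size preserved, 4 logits per item

    loss = nn.functional.cross_entropy(outputs, targets)
    assert torch.isfinite(loss)

    loss.backward()
    optimizer.step()
    assert all(p.grad is not None for p in model.parameters())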
Common Mistakes to Avoid
- Skipping RNG seeding leading to non-reproducible experiments.
- Overlooking proper device placement or mixing CPU/GPU tensors.
- Disabling deterministic options without acknowledging the impact on results.
- Storing large raw tensors in logs or artifacts, causing storage bloat.
- Hard-coding secrets or credentials into code or configuration files.
FAQ
What is the purpose of this Cursor Rules Template for PyTorch CUDA?
This template codifies Cursor AI expectations for a PyTorch CUDA training pipeline, guiding you to structure code, enforce safety, and align with MLOps practices for reproducible experiments.
How do I use the copyable .cursorrules block in my project?
Copy the block from the Copyable .cursorrules Configuration section and paste it into a new file named .cursorrules at the root of your PyTorch project. Cursor AI will reference this file to enforce stack-specific rules, directory structure, and security practices during development and CI runs.
Does this template enforce CUDA-specific best practices?
Yes. It prescribes using mixed precision with autocast, GradScaler, proper memory management, deterministic options when needed, and GPU-aware data loading to optimize training throughput on CUDA-enabled hardware.
Is this template suitable for experiment tracking and reproducibility?
Absolutely. The rules include metadata patterns for experiments, model checkpoints, and dataset versions, supporting reproducible results and auditable training histories across environments and runs.
How does Cursor AI help with production readiness?
Cursor AI enforces structure, lints, and security checks that align with CI/CD for ML. It guides you to prevent leaks, ensure deterministic behavior when required, and maintain a maintainable, testable training pipeline suitable for production.