100 Best ChatGPT Prompts for Python Data Analysis
A practical prompt library of 100 ChatGPT prompts for Python Data Analysis to accelerate data work using Python libraries like pandas, NumPy, and visualization tools.
Best For
Data analysts, data scientists, Python practitioners
Prompt Use Cases
- Data cleaning
- Exploratory data analysis
- Feature engineering
- Time series analysis
- Visualization automation
- Reproducible research
- Pandas workflow optimization
Introduction
This page is a practical ChatGPT prompt library for Python Data Analysis. It targets data analysts, data scientists, and Python practitioners who want ready-to-use prompts to accelerate data work using pandas, NumPy, and visualization tools.
Each prompt below is copyable and self-contained. Every prompt states a role, a task, and an output format with placeholders to fill in, and many add context and constraints, to minimize setup time.
Direct Answer
The best ChatGPT prompts for Python Data Analysis are a carefully curated set of role-based tasks that cover data loading, cleaning, exploration, modeling prep, and visualization. Use them as starting points, customize placeholders, and enforce output formats to ensure consistent, actionable results.
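One way to enforce an output format is to check the reply programmatically before using it. The sketch below assumes the model's reply arrives as a plain JSON string and checks it against the keys requested by the "Load and Inspect Dataset Summary" prompt. The function name `check_reply_keys` and the sample reply are made up for illustration.

```python
import json

# Keys requested by the "Load and Inspect Dataset Summary" prompt.
EXPECTED_KEYS = {
    "total_rows",
    "column_types",
    "missing_values_per_column",
    "sample_head",
}


def check_reply_keys(reply_text, expected_keys):
    """Return the sorted list of expected keys that are missing from a JSON reply."""
    # json.loads raises json.JSONDecodeError (a ValueError) if the reply is not JSON.
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(set(expected_keys) - set(data))


# Hypothetical reply, invented for this example (not real model output).
sample_reply = '{"total_rows": 3, "column_types": {"a": "int64"}, "sample_head": []}'
missing = check_reply_keys(sample_reply, EXPECTED_KEYS)
print(missing)  # ['missing_values_per_column']
```

An empty list means the reply has every requested key. A non-empty list is a signal to re-run the prompt with a stricter output instruction.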
How to Use These ChatGPT Prompts
- Replace placeholders like [dataset_path], [columns], [group_col], [target_column] with your actual data and column names.
- Add constraints such as memory limits, time bounds, and required libraries (pandas, NumPy, SciPy, seaborn, plotly).
- Request outputs in a specific format (JSON, Python code blocks, DataFrame previews, or charts) and specify how results should be delivered.
- Verify outputs by running the generated code on a sample dataset, checking shapes, types, and basic sanity checks.
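The placeholder step above can be scripted so that no `[placeholder]` is left unfilled by accident. This is a minimal sketch using only the standard library. The helper name `fill_prompt`, the file path, and the column names are hypothetical, and it assumes placeholders are written as a single identifier in square brackets.

```python
import re

# Matches placeholders such as [dataset_path] or [group_col].
PLACEHOLDER = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def fill_prompt(template, values):
    """Replace [name] placeholders with values; raise if any placeholder has no value."""
    names = set(PLACEHOLDER.findall(template))
    unfilled = sorted(names - set(values))
    if unfilled:
        raise KeyError(f"no value for placeholders: {unfilled}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


template = (
    "Task: Produce aggregations by group for [dataset_path] "
    "based on column [group_col] and metrics [metrics]."
)
prompt = fill_prompt(
    template,
    {
        "dataset_path": "data/sales.csv",  # hypothetical path
        "group_col": "region",             # hypothetical column name
        "metrics": "revenue, units",       # hypothetical metric columns
    },
)
print(prompt)
```

Raising on an unfilled placeholder is a deliberate choice here: a prompt sent with a literal `[group_col]` tends to produce generic code that then fails on the real data.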
100 Best ChatGPT Prompts for Python Data Analysis
- Load and Inspect Dataset Summary: Role: You are a Python data analyst. Task: Load the dataset from [dataset_path] and generate a concise summary of its structure and basic statistics. Context: Dataset contains columns [columns]. Output: A JSON object with keys: total_rows, column_types, missing_values_per_column, sample_head (first 5 rows). Constraints: Use pandas; avoid altering the dataset; handle non-numeric values gracefully; return results in the defined JSON structure.
- Compute Basic Descriptive Statistics: Role: You are a Python data analyst. Task: Compute descriptive statistics for numeric columns in [dataset_path]. Context: Use pandas to calculate mean, median, std, min, max, and IQR per numeric column. Output: A compact JSON table with columns: column, mean, median, std, min, max, IQR. Constraints: Exclude non-numeric columns; handle NaNs by ignoring them in calculations.
- Identify Missing Values by Column: Role: You are a Python data analyst. Task: Identify and report missing values by column for [dataset_path]. Context: Provide counts and missing value percentages. Output: CSV snippet or JSON with columns: column, missing_count, missing_percent. Constraints: Include only columns with any missing values.
- Handle Outliers with IQR Rule: Role: You are a Python data analyst. Task: Detect and cap outliers in numeric columns of [dataset_path] using the IQR rule (1.5 * IQR). Context: Provide a transformation plan and a sample before/after snapshot. Output: A Python code snippet that applies the capping and a JSON summary of values capped per column.
- Normalize or Scale Features for Analysis: Role: You are a Python data analyst. Task: Normalize numeric features in [dataset_path] using Min-Max scaling or StandardScaler. Context: Choose scaling per feature type with justification. Output: A Python function snippet and a JSON summary of scaled feature ranges or means/stds. Constraints: Output should not modify non-numeric columns.
- Correlation Matrix and Heatmap Guide: Role: You are a Python data analyst. Task: Compute a correlation matrix for numeric features in [dataset_path] and prepare a heatmap-ready data table. Context: Include strong and weak correlations, highlight potential multicollinearity. Output: Python code to generate a heatmap (Seaborn/Matplotlib) and a JSON snippet of top 5 correlated pairs.
- GroupBy Aggregation Report: Role: You are a Python data analyst. Task: Produce aggregations by group for [dataset_path] based on column [group_col] and metrics [metrics]. Context: Include count, mean, median, max, min per group. Output: A Pandas DataFrame and a JSON summary of results. Constraints: Support multiple groupings if provided.
- Time Series Resampling and Rolling Stats: Role: You are a Python data analyst. Task: Resample a time series in [dataset_path] to [resample_freq] and compute rolling statistics (window=[window_size]). Context: Ensure datetime parsing and timezone handling. Output: A Python snippet to resample and a table of rolling metrics. Constraints: If timestamps are missing, report the gaps.
- Detect Seasonal Trends in Time Series: Role: You are a Python data analyst. Task: Analyze seasonality in [timeseries_column] of [dataset_path] using decomposition (classical or STL). Context: Provide seasonal, trend, residual components and a plot option. Output: Python code and a JSON summary of seasonal strength by period.
- Visualize Distribution with Histograms and KDE: Role: You are a Python data analyst. Task: Create distribution visualizations for numeric features in [dataset_path] using histograms and KDE plots. Context: Include x-axis labels, titles, and A/B comparison if [compare_feature] is provided. Output: Python plotting code and a JSON with feature names and distribution metrics (mean, median, skew).
- Categorical Encoding Strategy Evaluation: Role: You are a Python data analyst. Task: Compare encoding strategies for categorical features in [dataset_path] (one-hot, target encoding, ordinal). Context: Provide recommendations based on prediction task and data size. Output: A decision table in JSON and a short justification.
- Feature Engineering Plan for Predictive Modeling: Role: You are a Python data analyst. Task: Propose feature engineering steps for [dataset_path] to improve a model predicting [target_column]. Context: Include interactions, aggregations, and handling of missing values. Output: A Python function outline and a feature dictionary with rationale.
- Hypothesis Testing Setup and Execution: Role: You are a Python data analyst. Task: Formulate and run a hypothesis test (e.g., t-test or Mann-Whitney) on [dataset_path] comparing groups [group_col] by [target_col]. Output: Test statistic, p-value, assumptions check, and an interpretation statement.
- P-Value Interpretation for Data Analysts: Role: You are a Python data analyst. Task: Explain p-values and confidence intervals for results from [dataset_path] in plain language. Context: Provide examples with [sample_size], [effect_size], and [significance_level]. Output: A concise interpretation and a short illustrative code snippet.
- Bootstrapping for Confidence Intervals: Role: You are a Python data analyst. Task: Implement bootstrap resampling on [dataset_path] to estimate confidence intervals for [metrics]. Context: Use [bootstrap_samples] samples and report percentile CIs. Output: Python function and a JSON CI summary.
- Create Reproducible Data Analysis Notebook Template: Role: You are a Python data analyst. Task: Generate a reusable Jupyter notebook template for [topic] analyses on [dataset_path]. Context: Include sections for data loading, cleaning, EDA, modeling, and validation. Output: Notebook skeleton in Python code cells and Markdown cells.
- SQL Query to Pandas DataFrame Bridge: Role: You are a Python data analyst. Task: Write a SQL query that extracts data from [tables] in [database] and read the result into a pandas DataFrame. Context: Include joins, WHERE conditions, and groupings. Output: SQL snippet and pandas read_sql code block.
- Data Cleaning Pipeline with Pandas: Role: You are a Python data analyst. Task: Build a clean-and-transform pipeline for [dataset_path], handling type coercion, missing values, and outliers. Output: A single Python function that returns a cleaned DataFrame and a JSON summary of cleaning steps.
- Data QA Checks and Validation Rules: Role: You are a Python data analyst. Task: Define data quality checks for [dataset_path], including schema conformance, range checks, and uniqueness tests. Output: A Python test suite (pytest) and a JSON checklist.
- Anomaly Detection in Data: Role: You are a Python data analyst. Task: Implement a lightweight anomaly detection on [dataset_path] using z-scores or isolation forest. Context: Flag anomalies per column and provide remediation suggestions. Output: Python code and a JSON report of anomalies.
- Dimensionality Reduction Overview with PCA: Role: You are a Python data analyst. Task: Apply PCA to [dataset_path] for features [features] to reduce to [n_components] components. Output: Scree plot code, explained variance JSON, and a reduced DataFrame.
- Feature Correlation vs Causation Considerations: Role: You are a Python data analyst. Task: Analyze correlation results in [dataset_path] and provide cautions about misinterpreting causation. Output: A short explainer and a JSON summary of correlated feature pairs with notes.
- Missing Value Imputation Strategy: Role: You are a Python data analyst. Task: Propose and implement an imputation strategy for [dataset_path] addressing both numerical and categorical columns. Output: Imputed DataFrame and a JSON justification.
- Data Type Casting and Precision Control: Role: You are a Python data analyst. Task: Normalize data types in [dataset_path], converting columns to appropriate dtypes and controlling precision. Output: A Python function and a summary of dtype changes.
- Timezone-Aware Timestamp Handling: Role: You are a Python data analyst. Task: Normalize and convert timestamp columns in [dataset_path] to a consistent timezone. Output: Python snippet and a JSON summary of converted columns.
- Data Visualization Storyboard for Stakeholders: Role: You are a Python data analyst. Task: Create a storyboard of visualizations for [dataset_path] to tell a data-driven story about [business_goal]. Output: A set of chart specs and a quick narrative.
- Lightweight Data Profiling Report: Role: You are a Python data analyst. Task: Generate a compact profiling report for [dataset_path], including top features by variance, missingness, and cardinality. Output: A one-page JSON report and a summary table.
- Aggregate Functions Optimization with Pandas: Role: You are a Python data analyst. Task: Optimize a set of aggregate operations in [dataset_path] by selecting appropriate groupby aggregates and avoiding chained assignments. Output: Refactored code and a benchmark summary.
- Efficiency Tips for Large Datasets: Role: You are a Python data analyst. Task: Propose efficiency improvements for working with large datasets in [dataset_path], including chunking, lazy loading, and vectorized ops. Output: A plan with code snippets.
- Memory-Efficient Data Frames with Dtypes: Role: You are a Python data analyst. Task: Optimize memory usage of DataFrames in [dataset_path] by downcasting numeric dtypes. Output: A memory impact report and a Python function.
- Data Alignment Across Multiple Sources: Role: You are a Python data analyst. Task: Align and merge data from sources A and B (paths [sourceA], [sourceB]) on key [join_key]. Output: A merged DataFrame and a JSON reconciliation summary.
- Merge and Join Strategy Evaluation: Role: You are a Python data analyst. Task: Evaluate merge strategies (inner/outer/left/right) for [dataset_path] with [reference]. Output: A comparison table and Python code to perform the chosen merge.
- Map and Apply vs Vectorize Performance: Role: You are a Python data analyst. Task: Benchmark map/apply vs vectorized operations on [dataset_path] for [operation]. Output: A performance chart and a short conclusion with recommended approach.
- Data Pipeline Debugging Checklist: Role: You are a Python data analyst. Task: Create a debugging checklist for a data pipeline on [dataset_path], including common failure modes and fixes. Output: A checklist table in JSON and Markdown.
- Statistical Summary by Group: Role: You are a Python data analyst. Task: Generate a statistical summary by group for [dataset_path] using [group_col] and [metrics]. Output: A grouped JSON summary and DataFrame snippet.
- Outlier Robustness Checks: Role: You are a Python data analyst. Task: Evaluate robustness of your analysis to outliers in [dataset_path] by comparing results with and without outlier removal. Output: A report with effect sizes and a Python snippet.
- Compute Feature Importances from a Model: Role: You are a Python data analyst. Task: Given a trained model on [dataset_path], compute SHAP values or feature importances for [features]. Output: A JSON summary and a small visualization script.
- Baseline Model Evaluation with Train/Test Split: Role: You are a Python data analyst. Task: Create a baseline model evaluation pipeline on [dataset_path], using a train/test split and a baseline algorithm. Output: Performance metrics in JSON and code.
- Cross-Validation Setup in Python: Role: You are a Python data analyst. Task: Set up K-Fold cross-validation on [dataset_path] for the target [target_column]. Output: A cross-validation plan, code snippet, and expected metrics JSON.
- Hyperparameter Tuning Plan for a Dataset: Role: You are a Python data analyst. Task: Design a hyperparameter search plan for a model trained on [dataset_path], including parameter grid and scoring metric. Output: A tuning script and results dictionary.
- Data Visualization with Seaborn Themes: Role: You are a Python data analyst. Task: Create a set of Seaborn visuals for [dataset_path] applying theme [theme_name] to improve readability. Output: Plotting code and a JSON summary of visuals.
- Plotly Interactive Dashboard Script: Role: You are a Python data analyst. Task: Build an interactive Plotly dashboard for [dataset_path] showcasing key metrics [metrics]. Output: A Python script and a JSON outline of widgets and callbacks.
- Time Series Forecasting Preparation: Role: You are a Python data analyst. Task: Prepare data and features for time series forecasting on [dataset_path], including train/validation split and feature engineering (lags, rolling stats). Output: A documentation snippet and code.
- Seasonality Decomposition of a Series: Role: You are a Python data analyst. Task: Decompose [series_name] in [dataset_path] into trend, seasonal, and residual components. Output: Decomposition plots and a JSON summary.
- ARIMA vs Prophet Quick Comparison: Role: You are a Python data analyst. Task: Compare ARIMA and Prophet approaches for [dataset_path] on [series_name]. Output: Side-by-side metrics, plotting code, and a recommendation.
- Granger Causality and Time Series Causality: Role: You are a Python data analyst. Task: Assess Granger causality between [seriesA] and [seriesB] in [dataset_path]. Output: Test results and interpretation.
- Smoothing Techniques for Noise Reduction: Role: You are a Python data analyst. Task: Apply smoothing (Moving Average, Savitzky-Golay) to [series_name] in [dataset_path]. Output: Smoothed series and a comparison chart.
- Rolling Window Correlation Analysis: Role: You are a Python data analyst. Task: Compute rolling correlation between [featureA] and [featureB] in [dataset_path] with window size [window]. Output: A DataFrame and a JSON summary.
- Data Sampling Techniques for Big Data: Role: You are a Python data analyst. Task: Propose sampling strategies for large datasets ([dataset_path]) to enable quick exploratory analysis. Output: A sampling plan and Python code.
- Handling Imbalanced Classes in Data: Role: You are a Python data analyst. Task: Address class imbalance in [dataset_path] using resampling (SMOTE, undersampling) or algorithm adjustments. Output: Implementation code and evaluation metrics.
- Text Data Cleaning for Analysis: Role: You are a Python data analyst. Task: Preprocess text data in [dataset_path], including normalization, tokenization, and stopword removal. Output: Cleaned text column and a summary of changes.
- NLP Feature Extraction with TF-IDF: Role: You are a Python data analyst. Task: Extract TF-IDF features from text data in [dataset_path] for [text_column]. Output: Feature matrix shape, vocabulary, and a sample vector.
- Clustering with K-Means and Evaluation: Role: You are a Python data analyst. Task: Apply K-Means to [dataset_path] on features [features] and evaluate using silhouette score. Output: Cluster assignments, centroids, and evaluation JSON.
- DBSCAN Density-Based Clustering Guide: Role: You are a Python data analyst. Task: Perform DBSCAN clustering on [dataset_path] with parameters [eps], [min_samples]. Output: Cluster labels and a JSON summary of results.
- Dimensionality Reduction with t-SNE: Role: You are a Python data analyst. Task: Apply t-SNE to reduce dimensions of [dataset_path] with [perplexity] and [n_components]. Output: 2D coordinates and a plot. Constraints: Report interpretation cautions.
- Data Quality Scorecard Generation: Role: You are a Python data analyst. Task: Generate a data quality scorecard for [dataset_path] including completeness, consistency, accuracy, and timeliness. Output: JSON scorecard and a markdown summary.
- Unit Tests for Data Analysis Functions: Role: You are a Python data analyst. Task: Write unit tests for common data analysis functions used on [dataset_path]. Output: Pytest tests and a test report.
- Documenting Data Transformations: Role: You are a Python data analyst. Task: Generate documentation for data transformations applied to [dataset_path], including inputs, outputs, and rationale. Output: Markdown doc and JSON summary.
- Reproducibility with Random Seeds: Role: You are a Python data analyst. Task: Ensure reproducibility across analyses on [dataset_path] by setting seeds and documenting environments. Output: A reproducibility plan and code snippet.
- Data Versioning Strategy: Role: You are a Python data analyst. Task: Propose a data versioning strategy for [dataset_path], including storage, metadata, and lineage. Output: Plan with example commands and a JSON audit log.
- Notebook Organization and Naming Conventions: Role: You are a Python data analyst. Task: Define best practices for organizing notebooks for analyses on [dataset_path]. Output: A naming convention guide and a sample notebook outline.
- Virtual Environment and Dependency Management: Role: You are a Python data analyst. Task: Create a clean environment plan for reproducible analyses on [dataset_path], including dependencies and version pinning. Output: A requirements.txt and conda environment YAML.
- Data Analysis Project Kickoff Plan: Role: You are a Python data analyst. Task: Outline a kickoff plan for a new analysis on [dataset_path], including goals, stakeholders, milestones, and risk. Output: A project plan in JSON and a brief executive summary.
- Benchmarking Data Processing Pipelines: Role: You are a Python data analyst. Task: Benchmark data loading, cleaning, and analysis steps on [dataset_path] and compare performance across approaches. Output: A benchmark table and Python scripts.
- Profiling Python Code with cProfile: Role: You are a Python data analyst. Task: Profile the data analysis script at [script_path] to identify bottlenecks when it processes [dataset_path]. Output: cProfile results and a short optimization plan.
- Memory Profiling Utilities for Data Analysis: Role: You are a Python data analyst. Task: Profile memory usage for a data analysis workflow on [dataset_path] and propose reductions. Output: Memory stats and optimization tips.
- Parallelization with Multiprocessing and Dask: Role: You are a Python data analyst. Task: Parallelize a data processing task on [dataset_path] using multiprocessing or Dask. Output: Parallel code, performance comparison, and a JSON summary.
- Vectorized Operations vs Loops in Pandas: Role: You are a Python data analyst. Task: Compare vectorized operations vs loop-based approaches for a given [operation] on [dataset_path]. Output: Benchmark results and recommended approach.
- Grouped Custom Aggregations with agg: Role: You are a Python data analyst. Task: Implement custom aggregations using .agg on [dataset_path] grouped by [group_col]. Output: Code snippet and a JSON summary of results.
- Pivot Tables for Multi-Dimensional Analysis: Role: You are a Python data analyst. Task: Create pivot tables from [dataset_path] to analyze [dimensions] with measures [metrics]. Output: Pivot code, and a JSON summary of results.
- Data Pivot Table vs SQL Pivot: Role: You are a Python data analyst. Task: Compare Python pivot table techniques with SQL pivots for [dataset_path]. Output: Pros/cons table and example queries/transformations.
- Data Storytelling and Executive Summary: Role: You are a Python data analyst. Task: Produce an executive summary and storytelling narrative for [dataset_path] focusing on [business_goal]. Output: A 1-page narrative and a visual summary.
- KPI Extraction from Data: Role: You are a Python data analyst. Task: Identify and compute key performance indicators for [dataset_path] aligned with [business_goals]. Output: KPI table and a JSON rationale.
- Automated Report Generation in Python: Role: You are a Python data analyst. Task: Generate an automated data report for [dataset_path], including charts, tables, and insights. Output: A Jupyter export or PDF-ready report content.
- Excel Data Import and Cleanup Automation: Role: You are a Python data analyst. Task: Import data from [excel_path], clean and transform it for analysis on [dataset_path]. Output: Cleaned DataFrame and an import script.
- CSV Validation and Schema Enforcement: Role: You are a Python data analyst. Task: Validate a CSV against a schema with columns [columns] and types [types] for [dataset_path]. Output: Validation report and a corrected CSV suggestion.
- Data Serialization Formats and Trade-offs: Role: You are a Python data analyst. Task: Compare data serialization formats (CSV, Parquet, Feather) for a workflow on [dataset_path]. Output: A decision guide and sample I/O code.
- Data Anonymization for Privacy: Role: You are a Python data analyst. Task: Apply anonymization techniques to sensitive columns in [dataset_path] while preserving analytics usefulness. Output: Anonymized DataFrame and a privacy impact JSON.
- Data Governance and Access Controls: Role: You are a Python data analyst. Task: Propose governance controls for datasets used in [project], including access rules, versioning, and auditing. Output: Governance plan and a policy outline.
- Feature Scaling Impact on Models: Role: You are a Python data analyst. Task: Analyze how scaling affects model performance on [dataset_path] and [target_column]. Output: Comparative metrics and plotting code.
- Auto-Documentation of Data Analysis Steps: Role: You are a Python data analyst. Task: Generate auto-documentation for a data analysis workflow on [dataset_path], including inputs, outputs, and decisions. Output: Documentation in Markdown and JSON metadata.
- Exploratory Data Analysis Checklist: Role: You are a Python data analyst. Task: Create an EDA checklist for [dataset_path], covering data quality, distributions, and assumptions. Output: Checklist and a starter notebook template.
- Handling Nulls in Numerical Features: Role: You are a Python data analyst. Task: Decide and implement null handling for numerical features in [dataset_path] with justification. Output: Code snippet and a JSON summary of effects on statistics.
- Handling Nulls in Categorical Features: Role: You are a Python data analyst. Task: Decide and implement null handling for categorical features in [dataset_path] with justification. Output: Code snippet and a JSON summary.
- Time Series Cross-Validation: Role: You are a Python data analyst. Task: Set up time-series cross-validation for [dataset_path] on [series_column] with [cv]. Output: A cross-validation plan and code snippet.
- Forecast Accuracy Metrics: Role: You are a Python data analyst. Task: Compute forecast accuracy metrics (MAPE, RMSE, MAE) for [dataset_path] forecasts vs actuals. Output: Metrics JSON and a plotting snippet.
- Visual Regression Testing for Dashboards: Role: You are a Python data analyst. Task: Implement visual regression checks for dashboards built from [dataset_path]. Output: Test plan, code, and summary report.
- Data Analysis with Jupyter Widgets: Role: You are a Python data analyst. Task: Build interactive widgets (sliders, selectors) to explore [dataset_path] dynamically. Output: Widget-enabled notebook snippet and a brief usage guide.
- Reusable Data Analysis Functions Library: Role: You are a Python data analyst. Task: Create a small library of reusable data analysis functions for common tasks on [dataset_path]. Output: Python module with tests and a usage example.
- Data Analysis Script Refactoring: Role: You are a Python data analyst. Task: Refactor an analysis script on [dataset_path] to improve readability and performance. Output: Refactored script and a before/after benchmark.
- Data Analysis with Spark via PySpark: Role: You are a Python data analyst. Task: Analyze a dataset in Spark via PySpark for [dataset_path], including simple aggregations. Output: PySpark script and a results JSON.
- GPU-Accelerated Data Analysis with CuPy: Role: You are a Python data analyst. Task: Leverage CuPy for a compute-heavy operation on [dataset_path]. Output: CuPy-based code and performance summary.
- Data Analysis with NumPy Beyond Basics: Role: You are a Python data analyst. Task: Apply advanced NumPy operations to [dataset_path], including broadcasting and vectorization. Output: Code snippet and a JSON result summary.
- Naive Bayes/Classification Prompts: Role: You are a Python data analyst. Task: Build a simple classifier on [dataset_path] using Naive Bayes and evaluate. Output: Model code, confusion matrix, and accuracy.
- Regression Analysis Prompts: Role: You are a Python data analyst. Task: Perform a regression analysis on [dataset_path] with features [features] and target [target]. Output: Coefficients, R^2, diagnostics, and a plot.
- Multicollinearity Detection and Remedies: Role: You are a Python data analyst. Task: Detect multicollinearity in [dataset_path] using VIF and propose remedies. Output: VIF values and remedial actions.
- Plotting Best Practices for Publication-Ready Figures: Role: You are a Python data analyst. Task: Create publication-ready figures from [dataset_path], with clear labeling and accessibility considerations. Output: Plotting code and a visual quality checklist.
- Automating Export of Figures to PNG/PDF: Role: You are a Python data analyst. Task: Automate export of all visuals from a notebook on [dataset_path] to PNG/PDF with consistent naming. Output: A script and a folder structure plan.
- Data Quality Issue Tracking and Resolution: Role: You are a Python data analyst. Task: Track and resolve data quality issues found in [dataset_path], including root cause and fixes. Output: An issues log and a remediation plan.
- Ethical Considerations in Data Analysis Prompts: Role: You are a Python data analyst. Task: Outline ethical considerations for analyses on [dataset_path], including bias, fairness, and privacy. Output: A short guideline and a compliance checklist.
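When verifying generated code, it helps to have a small reference implementation to compare against. The sketch below is one possible reading of the "Handle Outliers with IQR Rule" prompt, assuming pandas is installed. It caps numeric columns at the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR and counts how many values were capped. The function name `cap_outliers_iqr` and the sample data are illustrative, not part of the prompt library.

```python
import pandas as pd


def cap_outliers_iqr(df, k=1.5):
    """Cap numeric columns at [Q1 - k*IQR, Q3 + k*IQR].

    Returns a capped copy of the frame and a dict of capped-value counts per column.
    The input frame is left unchanged.
    """
    capped = df.copy()
    summary = {}
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # NaNs compare as False, so they are neither counted nor capped.
        outside = (df[col] < lower) | (df[col] > upper)
        summary[col] = int(outside.sum())
        capped[col] = df[col].clip(lower=lower, upper=upper)
    return capped, summary


# Made-up sample data: one numeric column and one non-numeric column.
sample = pd.DataFrame({"value": [10, 11, 12, 13, 100], "label": list("abcde")})
capped, summary = cap_outliers_iqr(sample)
print(summary)  # {'value': 1}
print(capped["value"].tolist())
```

On this sample the quartiles are 11 and 13, so the fences are 8 and 16. Only the value 100 falls outside them and is capped to 16, while the `label` column is left as it was.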
Markdown Template
# 100 Best ChatGPT Prompts for Python Data Analysis
**Load and Inspect Dataset Summary**: Role: You are a Python data analyst. Task: Load the dataset from [dataset_path] and generate a concise summary of its structure and basic statistics. Context: Dataset contains columns [columns]. Output: A JSON object with keys: total_rows, column_types, missing_values_per_column, sample_head (first 5 rows). Constraints: Use pandas; avoid altering the dataset; handle non-numeric values gracefully; return results in the defined JSON structure.
**Compute Basic Descriptive Statistics**: Role: You are a Python data analyst. Task: Compute descriptive statistics for numeric columns in [dataset_path]. Context: Use pandas to calculate mean, median, std, min, max, and IQR per numeric column. Output: A compact JSON table with columns: column, mean, median, std, min, max, IQR. Constraints: Exclude non-numeric columns; handle NaNs by ignoring them in calculations.
**Identify Missing Values by Column**: Role: You are a Python data analyst. Task: Identify and report missing values by column for [dataset_path]. Context: Provide counts and missing value percentages. Output: CSV snippet or JSON with columns: column, missing_count, missing_percent. Constraints: Include only columns with any missing values.
**Handle Outliers with IQR Rule**: Role: You are a Python data analyst. Task: Detect and cap outliers in numeric columns of [dataset_path] using the IQR rule (1.5 * IQR). Context: Provide a transformation plan and a sample before/after snapshot. Output: A Python code snippet that applies the capping and a JSON summary of values capped per column.
**Normalize or Scale Features for Analysis**: Role: You are a Python data analyst. Task: Normalize numeric features in [dataset_path] using Min-Max scaling or StandardScaler. Context: Choose scaling per feature type with justification. Output: A Python function snippet and a JSON summary of scaled feature ranges or means/stds. Constraints: Output should not modify non-numeric columns.
**Correlation Matrix and Heatmap Guide**: Role: You are a Python data analyst. Task: Compute a correlation matrix for numeric features in [dataset_path] and prepare a heatmap-ready data table. Context: Include strong and weak correlations, highlight potential multicollinearity. Output: Python code to generate a heatmap (Seaborn/Matplotlib) and a JSON snippet of top 5 correlated pairs.
**GroupBy Aggregation Report**: Role: You are a Python data analyst. Task: Produce aggregations by group for [dataset_path] based on column [group_col] and metrics [metrics]. Context: Include count, mean, median, max, min per group. Output: A Pandas DataFrame and a JSON summary of results. Constraints: Support multiple groupings if provided.
**Time Series Resampling and Rolling Stats**: Role: You are a Python data analyst. Task: Resample a time series in [dataset_path] to [resample_freq] and compute rolling statistics (window=[window_size]). Context: Ensure datetime parsing and timezone handling. Output: A Python snippet to resample and a table of rolling metrics. Constraints: If timestamps are missing, report the gaps.
**Detect Seasonal Trends in Time Series**: Role: You are a Python data analyst. Task: Analyze seasonality in [timeseries_column] of [dataset_path] using decomposition (classical or STL). Context: Provide seasonal, trend, residual components and a plot option. Output: Python code and a JSON summary of seasonal strength by period.
**Visualize Distribution with Histograms and KDE**: Role: You are a Python data analyst. Task: Create distribution visualizations for numeric features in [dataset_path] using histograms and KDE plots. Context: Include x-axis labels, titles, and A/B comparison if [compare_feature] is provided. Output: Python plotting code and a JSON with feature names and distribution metrics (mean, median, skew).
**Categorical Encoding Strategy Evaluation**: Role: You are a Python data analyst. Task: Compare encoding strategies for categorical features in [dataset_path] (one-hot, target encoding, ordinal). Context: Provide recommendations based on prediction task and data size. Output: A decision table in JSON and a short justification.
**Feature Engineering Plan for Predictive Modeling**: Role: You are a Python data analyst. Task: Propose feature engineering steps for [dataset_path] to improve a model predicting [target_column]. Context: Include interactions, aggregations, and handling of missing values. Output: A Python function outline and a feature dictionary with rationale.
**Hypothesis Testing Setup and Execution**: Role: You are a Python data analyst. Task: Formulate and run a hypothesis test (e.g., t-test or Mann-Whitney) on [dataset_path] comparing groups [group_col] by [target_col]. Output: Test statistic, p-value, assumptions check, and an interpretation statement.
**P-Value Interpretation for Data Analysts**: Role: You are a Python data analyst. Task: Explain p-values and confidence intervals for results from [dataset_path] in plain language. Context: Provide examples with [sample_size], [effect_size], and [significance_level]. Output: A concise interpretation and a short illustrative code snippet.
**Bootstrapping for Confidence Intervals**: Role: You are a Python data analyst. Task: Implement bootstrap resampling on [dataset_path] to estimate confidence intervals for [metrics]. Context: Use [bootstrap_samples] samples and report percentile CIs. Output: Python function and a JSON CI summary.
**Create Reproducible Data Analysis Notebook Template**: Role: You are a Python data analyst. Task: Generate a reusable Jupyter notebook template for [topic] analyses on [dataset_path]. Context: Include sections for data loading, cleaning, EDA, modeling, and validation. Output: Notebook skeleton in Python code cells and Markdown cells.
**SQL Query to Pandas DataFrame Bridge**: Role: You are a Python data analyst. Task: Write an SQL query that extracts data from [tables] in [database] and load the result into a pandas DataFrame. Context: Include joins, WHERE conditions, and groupings. Output: SQL snippet and pandas read_sql code block.
**Data Cleaning Pipeline with Pandas**: Role: You are a Python data analyst. Task: Build a clean-and-transform pipeline for [dataset_path], handling type coercion, missing values, and outliers. Output: A single Python function that returns a cleaned DataFrame and a JSON summary of cleaning steps.
**Data QA Checks and Validation Rules**: Role: You are a Python data analyst. Task: Define data quality checks for [dataset_path], including schema conformance, range checks, and uniqueness tests. Output: A Python test suite (pytest) and a JSON checklist.
**Anomaly Detection in Data**: Role: You are a Python data analyst. Task: Implement a lightweight anomaly detection on [dataset_path] using z-scores or isolation forest. Context: Flag anomalies per column and provide remediation suggestions. Output: Python code and a JSON report of anomalies.
**Dimensionality Reduction Overview with PCA**: Role: You are a Python data analyst. Task: Apply PCA to [dataset_path] for features [features] to reduce to [n_components] components. Output: Scree plot code, explained variance JSON, and a reduced DataFrame.
**Feature Correlation vs Causation Considerations**: Role: You are a Python data analyst. Task: Analyze correlation results in [dataset_path] and provide cautions about misinterpreting causation. Output: A short explainer and a JSON summary of correlated feature pairs with notes.
**Missing Value Imputation Strategy**: Role: You are a Python data analyst. Task: Propose and implement an imputation strategy for [dataset_path] addressing both numerical and categorical columns. Output: Imputed DataFrame and a JSON justification.
**Data Type Casting and Precision Control**: Role: You are a Python data analyst. Task: Normalize data types in [dataset_path], converting columns to appropriate dtypes and controlling precision. Output: A Python function and a summary of dtype changes.
**Timezone-Aware Timestamp Handling**: Role: You are a Python data analyst. Task: Normalize and convert timestamp columns in [dataset_path] to a consistent timezone. Output: Python snippet and a JSON summary of converted columns.
**Data Visualization Storyboard for Stakeholders**: Role: You are a Python data analyst. Task: Create a storyboard of visualizations for [dataset_path] to tell a data-driven story about [business_goal]. Output: A set of chart specs and a quick narrative.
**Lightweight Data Profiling Report**: Role: You are a Python data analyst. Task: Generate a compact profiling report for [dataset_path], including top features by variance, missingness, and cardinality. Output: A one-page JSON report and a summary table.
**Aggregate Functions Optimization with Pandas**: Role: You are a Python data analyst. Task: Optimize a set of aggregate operations in [dataset_path] by selecting appropriate groupby aggregates and avoiding chained assignments. Output: Refactored code and a benchmark summary.
**Efficiency Tips for Large Datasets**: Role: You are a Python data analyst. Task: Propose efficiency improvements for working with large datasets in [dataset_path], including chunking, lazy loading, and vectorized ops. Output: A plan with code snippets.
**Memory-Efficient Data Frames with Dtypes**: Role: You are a Python data analyst. Task: Optimize memory usage of DataFrames in [dataset_path] by downcasting numeric dtypes. Output: A memory impact report and a Python function.
**Data Alignment Across Multiple Sources**: Role: You are a Python data analyst. Task: Align and merge data from sources A and B (paths [sourceA], [sourceB]) on key [join_key]. Output: A merged DataFrame and a JSON reconciliation summary.
**Merge and Join Strategy Evaluation**: Role: You are a Python data analyst. Task: Evaluate merge strategies (inner/outer/left/right) for [dataset_path] with [reference]. Output: A comparison table and Python code to perform the chosen merge.
**Map and Apply vs Vectorize Performance**: Role: You are a Python data analyst. Task: Benchmark map/apply vs vectorized operations on [dataset_path] for [operation]. Output: A performance chart and a short conclusion with recommended approach.
**Data Pipeline Debugging Checklist**: Role: You are a Python data analyst. Task: Create a debugging checklist for a data pipeline on [dataset_path], including common failure modes and fixes. Output: A checklist table in JSON and Markdown.
**Statistical Summary by Group**: Role: You are a Python data analyst. Task: Generate a statistical summary by group for [dataset_path] using [group_col] and [metrics]. Output: A grouped JSON summary and DataFrame snippet.
**Outlier Robustness Checks**: Role: You are a Python data analyst. Task: Evaluate robustness of your analysis to outliers in [dataset_path] by comparing results with and without outlier removal. Output: A report with effect sizes and a Python snippet.
**Compute Feature Importances from a Model**: Role: You are a Python data analyst. Task: Given a trained model on [dataset_path], compute SHAP values or feature importances for [features]. Output: A JSON summary and a small visualization script.
**Baseline Model Evaluation with Train/Test Split**: Role: You are a Python data analyst. Task: Create a baseline model evaluation pipeline on [dataset_path], using a train/test split and a baseline algorithm. Output: Performance metrics in JSON and code.
**Cross-Validation Setup in Python**: Role: You are a Python data analyst. Task: Set up K-Fold cross-validation on [dataset_path] for the target [target_column]. Output: A cross-validation plan, code snippet, and expected metrics JSON.
**Hyperparameter Tuning Plan for a Dataset**: Role: You are a Python data analyst. Task: Design a hyperparameter search plan for a model trained on [dataset_path], including parameter grid and scoring metric. Output: A tuning script and results dictionary.
**Data Visualization with Seaborn Themes**: Role: You are a Python data analyst. Task: Create a set of Seaborn visuals for [dataset_path] applying theme [theme_name] to improve readability. Output: Plotting code and a JSON summary of visuals.
**Plotly Interactive Dashboard Script**: Role: You are a Python data analyst. Task: Build an interactive Plotly dashboard for [dataset_path] showcasing key metrics [metrics]. Output: A Python script and a JSON outline of widgets and callbacks.
**Time Series Forecasting Preparation**: Role: You are a Python data analyst. Task: Prepare data and features for time series forecasting on [dataset_path], including train/validation split and feature engineering (lags, rolling stats). Output: A documentation snippet and code.
**Seasonality Decomposition of a Series**: Role: You are a Python data analyst. Task: Decompose [series_name] in [dataset_path] into trend, seasonal, and residual components. Output: Decomposition plots and a JSON summary.
**ARIMA vs Prophet Quick Comparison**: Role: You are a Python data analyst. Task: Compare ARIMA and Prophet approaches for [dataset_path] on [series_name]. Output: Side-by-side metrics, plotting code, and a recommendation.
**Granger Causality and Time Series Causality**: Role: You are a Python data analyst. Task: Assess Granger causality between [seriesA] and [seriesB] in [dataset_path]. Output: Test results and interpretation.
**Smoothing Techniques for Noise Reduction**: Role: You are a Python data analyst. Task: Apply smoothing (Moving Average, Savitzky-Golay) to [series_name] in [dataset_path]. Output: Smoothed series and a comparison chart.
**Rolling Window Correlation Analysis**: Role: You are a Python data analyst. Task: Compute rolling correlation between [featureA] and [featureB] in [dataset_path] with window size [window]. Output: A DataFrame and a JSON summary.
**Data Sampling Techniques for Big Data**: Role: You are a Python data analyst. Task: Propose sampling strategies for large datasets ([dataset_path]) to enable quick exploratory analysis. Output: A sampling plan and Python code.
**Handling Imbalanced Classes in Data**: Role: You are a Python data analyst. Task: Address class imbalance in [dataset_path] using resampling (SMOTE, undersampling) or algorithm adjustments. Output: Implementation code and evaluation metrics.
**Text Data Cleaning for Analysis**: Role: You are a Python data analyst. Task: Preprocess text data in [dataset_path], including normalization, tokenization, and stopword removal. Output: Cleaned text column and a summary of changes.
**NLP Feature Extraction with TF-IDF**: Role: You are a Python data analyst. Task: Extract TF-IDF features from text data in [dataset_path] for [text_column]. Output: Feature matrix shape, vocabulary, and a sample vector.
**Clustering with K-Means and Evaluation**: Role: You are a Python data analyst. Task: Apply K-Means to [dataset_path] on features [features] and evaluate using silhouette score. Output: Cluster assignments, centroids, and evaluation JSON.
**DBSCAN Density-Based Clustering Guide**: Role: You are a Python data analyst. Task: Perform DBSCAN clustering on [dataset_path] with parameters [eps], [min_samples]. Output: Cluster labels and a JSON summary of results.
**Dimensionality Reduction with t-SNE**: Role: You are a Python data analyst. Task: Apply t-SNE to reduce dimensions of [dataset_path] with [perplexity] and [n_components]. Output: 2D coordinates and a plot. Constraints: Report interpretation cautions.
**Data Quality Scorecard Generation**: Role: You are a Python data analyst. Task: Generate a data quality scorecard for [dataset_path] including completeness, consistency, accuracy, and timeliness. Output: JSON scorecard and a markdown summary.
**Unit Tests for Data Analysis Functions**: Role: You are a Python data analyst. Task: Write unit tests for common data analysis functions used on [dataset_path]. Output: Pytest tests and a test report.
**Documenting Data Transformations**: Role: You are a Python data analyst. Task: Generate documentation for data transformations applied to [dataset_path], including inputs, outputs, and rationale. Output: Markdown doc and JSON summary.
**Reproducibility with Random Seeds**: Role: You are a Python data analyst. Task: Ensure reproducibility across analyses on [dataset_path] by setting seeds and documenting environments. Output: A reproducibility plan and code snippet.
**Data Versioning Strategy**: Role: You are a Python data analyst. Task: Propose a data versioning strategy for [dataset_path], including storage, metadata, and lineage. Output: Plan with example commands and a JSON audit log.
**Notebook Organization and Naming Conventions**: Role: You are a Python data analyst. Task: Define best practices for organizing notebooks for analyses on [dataset_path]. Output: A naming convention guide and a sample notebook outline.
**Virtual Environment and Dependency Management**: Role: You are a Python data analyst. Task: Create a clean environment plan for reproducible analyses on [dataset_path], including dependencies and version pinning. Output: A requirements.txt and conda environment YAML.
**Data Analysis Project Kickoff Plan**: Role: You are a Python data analyst. Task: Outline a kickoff plan for a new analysis on [dataset_path], including goals, stakeholders, milestones, and risk. Output: A project plan in JSON and a brief executive summary.
**Benchmarking Data Processing Pipelines**: Role: You are a Python data analyst. Task: Benchmark data loading, cleaning, and analysis steps on [dataset_path] and compare performance across approaches. Output: A benchmark table and Python scripts.
**Profiling Python Code with cProfile**: Role: You are a Python data analyst. Task: Profile the data analysis script at [script_path] to identify bottlenecks when it processes [dataset_path]. Output: cProfile results and a short optimization plan.
**Memory Profiling Utilities for Data Analysis**: Role: You are a Python data analyst. Task: Profile memory usage for a data analysis workflow on [dataset_path] and propose reductions. Output: Memory stats and optimization tips.
**Parallelization with Multiprocessing and Dask**: Role: You are a Python data analyst. Task: Parallelize a data processing task on [dataset_path] using multiprocessing or Dask. Output: Parallel code, performance comparison, and a JSON summary.
**Vectorized Operations vs Loops in Pandas**: Role: You are a Python data analyst. Task: Compare vectorized operations vs loop-based approaches for a given [operation] on [dataset_path]. Output: Benchmark results and recommended approach.
**Grouped Custom Aggregations with agg**: Role: You are a Python data analyst. Task: Implement custom aggregations using .agg on [dataset_path] grouped by [group_col]. Output: Code snippet and a JSON summary of results.
**Pivot Tables for Multi-Dimensional Analysis**: Role: You are a Python data analyst. Task: Create pivot tables from [dataset_path] to analyze [dimensions] with measures [metrics]. Output: Pivot code, and a JSON summary of results.
**Data Pivot Table vs SQL Pivot**: Role: You are a Python data analyst. Task: Compare Python pivot table techniques with SQL pivots for [dataset_path]. Output: Pros/cons table and example queries/transformations.
**Data Storytelling and Executive Summary**: Role: You are a Python data analyst. Task: Produce an executive summary and storytelling narrative for [dataset_path] focusing on [business_goal]. Output: A 1-page narrative and a visual summary.
**KPI Extraction from Data**: Role: You are a Python data analyst. Task: Identify and compute key performance indicators for [dataset_path] aligned with [business_goals]. Output: KPI table and a JSON rationale.
**Automated Report Generation in Python**: Role: You are a Python data analyst. Task: Generate an automated data report for [dataset_path], including charts, tables, and insights. Output: A Jupyter export or PDF-ready report content.
**Excel Data Import and Cleanup Automation**: Role: You are a Python data analyst. Task: Import data from [excel_path], clean and transform it for analysis on [dataset_path]. Output: Cleaned DataFrame and an import script.
**CSV Validation and Schema Enforcement**: Role: You are a Python data analyst. Task: Validate the CSV at [dataset_path] against a schema with columns [columns] and types [types]. Output: Validation report and a corrected CSV suggestion.
**Data Serialization Formats and Trade-offs**: Role: You are a Python data analyst. Task: Compare data serialization formats (CSV, Parquet, Feather) for a workflow on [dataset_path]. Output: A decision guide and sample I/O code.
**Data Anonymization for Privacy**: Role: You are a Python data analyst. Task: Apply anonymization techniques to sensitive columns in [dataset_path] while preserving analytics usefulness. Output: Anonymized DataFrame and a privacy impact JSON.
**Data Governance and Access Controls**: Role: You are a Python data analyst. Task: Propose governance controls for datasets used in [project], including access rules, versioning, and auditing. Output: Governance plan and a policy outline.
**Feature Scaling Impact on Models**: Role: You are a Python data analyst. Task: Analyze how feature scaling affects the performance of a model predicting [target_column] on [dataset_path]. Output: Comparative metrics and plotting code.
**Auto-Documentation of Data Analysis Steps**: Role: You are a Python data analyst. Task: Generate auto-documentation for a data analysis workflow on [dataset_path], including inputs, outputs, and decisions. Output: Documentation in Markdown and JSON metadata.
**Exploratory Data Analysis Checklist**: Role: You are a Python data analyst. Task: Create an EDA checklist for [dataset_path], covering data quality, distributions, and assumptions. Output: Checklist and a starter notebook template.
**Handling Nulls in Numerical Features**: Role: You are a Python data analyst. Task: Decide and implement null handling for numerical features in [dataset_path] with justification. Output: Code snippet and a JSON summary of effects on statistics.
**Handling Nulls in Categorical Features**: Role: You are a Python data analyst. Task: Decide and implement null handling for categorical features in [dataset_path] with justification. Output: Code snippet and a JSON summary.
**Time Series Cross-Validation**: Role: You are a Python data analyst. Task: Set up time-series cross-validation for [dataset_path] on [series_column] with [cv]. Output: A cross-validation plan and code snippet.
**Forecast Accuracy Metrics**: Role: You are a Python data analyst. Task: Compute forecast accuracy metrics (MAPE, RMSE, MAE) for [dataset_path] forecasts vs actuals. Output: Metrics JSON and a plotting snippet.
**Visual Regression Testing for Dashboards**: Role: You are a Python data analyst. Task: Implement visual regression checks for dashboards built from [dataset_path]. Output: Test plan, code, and summary report.
**Data Analysis with Jupyter Widgets**: Role: You are a Python data analyst. Task: Build interactive widgets (sliders, selectors) to explore [dataset_path] dynamically. Output: Widget-enabled notebook snippet and a brief usage guide.
**Reusable Data Analysis Functions Library**: Role: You are a Python data analyst. Task: Create a small library of reusable data analysis functions for common tasks on [dataset_path]. Output: Python module with tests and a usage example.
**Data Analysis Script Refactoring**: Role: You are a Python data analyst. Task: Refactor an analysis script on [dataset_path] to improve readability and performance. Output: Refactored script and a before/after benchmark.
**Data Analysis with Spark via PySpark**: Role: You are a Python data analyst. Task: Analyze a dataset in Spark via PySpark for [dataset_path], including simple aggregations. Output: PySpark script and a results JSON.
**GPU-Accelerated Data Analysis with CuPy**: Role: You are a Python data analyst. Task: Leverage CuPy for a compute-heavy operation on [dataset_path]. Output: CuPy-based code and performance summary.
**Data Analysis with NumPy Beyond Basics**: Role: You are a Python data analyst. Task: Apply advanced NumPy operations to [dataset_path], including broadcasting and vectorization. Output: Code snippet and a JSON result summary.
**Naive Bayes/Classification Prompts**: Role: You are a Python data analyst. Task: Build a simple classifier on [dataset_path] using Naive Bayes and evaluate. Output: Model code, confusion matrix, and accuracy.
**Regression Analysis Prompts**: Role: You are a Python data analyst. Task: Perform a regression analysis on [dataset_path] with features [features] and target [target]. Output: Coefficients, R^2, diagnostics, and a plot.
**Multicollinearity Detection and Remedies**: Role: You are a Python data analyst. Task: Detect multicollinearity in [dataset_path] using VIF and propose remedies. Output: VIF values and remedial actions.
**Plotting Best Practices for Publication-Ready Figures**: Role: You are a Python data analyst. Task: Create publication-ready figures from [dataset_path], with clear labeling and accessibility considerations. Output: Plotting code and a visual quality checklist.
**Automating Export of Figures to PNG/PDF**: Role: You are a Python data analyst. Task: Automate export of all visuals from a notebook on [dataset_path] to PNG/PDF with consistent naming. Output: A script and a folder structure plan.
**Data Quality Issue Tracking and Resolution**: Role: You are a Python data analyst. Task: Track and resolve data quality issues found in [dataset_path], including root cause and fixes. Output: An issues log and a remediation plan.
**Ethical Considerations in Data Analysis Prompts**: Role: You are a Python data analyst. Task: Outline ethical considerations for analyses on [dataset_path], including bias, fairness, and privacy. Output: A short guideline and a compliance checklist.
Best Practices
Leverage the prompts as a modular library: reuse placeholders, standardize output formats, and maintain a reproducible environment. Validate results with small test datasets and document decisions for future audits.
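One way to enforce a standardized output format is to check each reply against the keys the prompt asked for. The sketch below is illustrative only: it assumes the reply arrives as a raw string, uses the four keys requested by the "Load and Inspect Dataset Summary" prompt on this page, and the helper name `check_summary_json` and the sample reply are hypothetical.

```python
import json

# Keys requested by the "Load and Inspect Dataset Summary" prompt on this page.
EXPECTED_KEYS = {"total_rows", "column_types", "missing_values_per_column", "sample_head"}


def check_summary_json(raw_response):
    """Return a list of problems found in a JSON reply (empty list if none)."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value is not a JSON object"]

    found = set(payload)
    problems = []
    missing = EXPECTED_KEYS - found
    extra = found - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    return problems


# Hypothetical reply, hand-written for illustration (not real model output).
reply = '{"total_rows": 3, "column_types": {"a": "int64"}, "missing_values_per_column": {"a": 0}}'
problems = check_summary_json(reply)
print(problems)
```

A check like this can run before any downstream code consumes the reply, so a missing or renamed key is caught early rather than surfacing as a later error.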
Common Mistakes to Avoid
- Assuming correlations imply causation.
- Hard-coding dataset paths in prompts without placeholders.
- Skipping output format specifications, leading to inconsistent results.
- Overlooking data provenance and versioning in analyses.
FAQ
What makes these prompts effective for Python Data Analysis?
They are role-driven, task-specific, and include placeholders, constraints, and explicit output formats for repeatable analysis.
How do I customize prompts for my dataset?
Replace placeholders like [dataset_path], [columns], [target_column], and [metrics] with your data and metrics. Keep the output format consistent for easy automation.
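Placeholder substitution can also be scripted so that no bracketed name is left unfilled by accident. The following is a minimal sketch, assuming placeholders always follow the `[name]` pattern used on this page; the function name `fill_prompt` and the example values are hypothetical.

```python
import re

# Matches bracketed identifiers such as [dataset_path] or [group_col].
PLACEHOLDER = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def fill_prompt(template, values):
    """Replace [name] placeholders with values; raise if any name has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"no value supplied for: {missing}")
    # A function replacement avoids backslash-escape processing of the values.
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template)


# Template text copied from the "Statistical Summary by Group" prompt.
template = (
    "Role: You are a Python data analyst. "
    "Task: Generate a statistical summary by group for [dataset_path] "
    "using [group_col] and [metrics]. "
    "Output: A grouped JSON summary and DataFrame snippet."
)

prompt = fill_prompt(
    template,
    {
        "dataset_path": "data/sales.csv",  # hypothetical path
        "group_col": "region",             # hypothetical column name
        "metrics": "revenue, units_sold",  # hypothetical metric names
    },
)
print(prompt)
```

Raising on an unfilled placeholder is a deliberate choice here: a prompt sent with a literal `[metrics]` in it tends to produce generic output that is hard to spot later.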
Can I use these prompts for large-scale data?
Yes, but you should add performance constraints and consider sampling, chunking, and distributed processing when needed.
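As a rough illustration of chunking, the sketch below aggregates a CSV in pieces with the `chunksize` parameter of `pandas.read_csv` instead of loading it all at once. The in-memory CSV, the column names, and the helper `chunked_group_sum` are made up for the example; a real workflow would pass a file path and a much larger chunk size.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a large file (made-up data).
csv_text = "region,revenue\n" + "\n".join(
    f"{'north' if i % 2 == 0 else 'south'},{i}" for i in range(10)
)


def chunked_group_sum(source, group_col, value_col, chunksize):
    """Sum value_col per group_col while reading the CSV in chunks."""
    # Each partial result is a small Series, so keeping them all is cheap.
    partials = [
        chunk.groupby(group_col)[value_col].sum()
        for chunk in pd.read_csv(source, chunksize=chunksize)
    ]
    # Combine per-chunk sums into one total per group.
    return pd.concat(partials).groupby(level=0).sum()


totals = chunked_group_sum(io.StringIO(csv_text), "region", "revenue", chunksize=4)
print(totals)
```

This pattern works for aggregations that can be combined from partial results, such as sums and counts; statistics like the median need a different approach.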
How should I verify the outputs?
Run the generated Python code on a sample dataset, check shapes, data types, and basic statistics, and compare results to a known baseline when available.
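The checks described above can be gathered into one small helper that runs after the generated code. This is a sketch under stated assumptions: the function name `sanity_check`, the expected column list, and the sample DataFrame are all hypothetical.

```python
import pandas as pd


def sanity_check(df, expected_columns, min_rows=1):
    """Collect basic shape, dtype, and null checks into a dict of findings."""
    return {
        "shape": df.shape,
        "has_min_rows": len(df) >= min_rows,
        "missing_columns": sorted(set(expected_columns) - set(df.columns)),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "null_counts": df.isna().sum().to_dict(),
    }


# Made-up sample data for illustration.
sample = pd.DataFrame(
    {"region": ["north", "south", None], "revenue": [10.0, 12.5, 9.0]}
)
report = sanity_check(sample, expected_columns=["region", "revenue", "units_sold"])
print(report)
```

Comparing a report like this against the same report from a known baseline makes differences in shape or dtypes easy to see.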
Are these prompts suitable for notebook automation?
Absolutely. They are designed to produce code and results that can be dropped into notebooks or pipelines with minimal tweaks.