100 Best Gemini Prompts for Data Analysis
A practical Gemini prompts library with 100 data analysis prompts to drive analysis tasks, generate outputs, and support data-driven decisions.
Best For
Data analysts, data scientists, analytics teams
Prompt Use Cases
- Exploratory analysis
- Data cleaning and imputation
- Model evaluation and reporting
- Visualization design and storytelling
- Executive data summaries
Introduction
Welcome to the Data Analysis Gemini prompts library. This practical page is for data analysts, data scientists, and analytics teams who want to run repeatable analysis tasks with Google Gemini and get output-ready results. Use these prompts to speed up dataset exploration, data cleaning, statistical computation, and insight delivery without having to invent context on the fly.
Whether you are building quick analyses, validating findings, or producing stakeholder-ready reports, this collection provides copyable, context-rich prompts that you can adapt to any dataset by filling in placeholders such as [dataset_name], [columns], [rows], and [domain].
Direct Answer
The best Gemini prompts for Data Analysis are a comprehensive, copyable set of 100 prompts that cover data exploration, cleaning, modeling, visualization, and reporting. Each prompt includes a role, task, context placeholders, a clear output format, and constraints so you can run analyses consistently across datasets and teams.
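Because every prompt shares this five-part shape, it can help to store prompts as structured records rather than free text, so a team can edit one part (say, the constraints) without disturbing the others. The Python sketch below shows one way to do that; the `AnalysisPrompt` class and its field names are illustrative assumptions, not part of any Gemini SDK.

```python
from dataclasses import dataclass


@dataclass
class AnalysisPrompt:
    """Hypothetical container for the five parts every prompt in this library uses."""

    title: str
    role: str
    task: str
    context: str
    output: str
    constraints: str

    def render(self) -> str:
        # Assemble the parts in the library's order, with the same labels.
        return (
            f"{self.title} - Role: {self.role} Task: {self.task} "
            f"Context: {self.context} Output: {self.output} "
            f"Constraints: {self.constraints}"
        )


eda = AnalysisPrompt(
    title="EDA and Summary Statistics",
    role="You are a data analytics assistant.",
    task="Generate summary statistics for [columns] in dataset [dataset_name].",
    context="Dataset has [rows] rows; domain is [domain].",
    output="Return a structured JSON report.",
    constraints="Specify which columns were analyzed.",
)
print(eda.render())
```

Keeping the parts separate also makes it easy to reuse one role or constraint string across many prompts.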
How to Use These Gemini Prompts
- Replace placeholders like [dataset_name], [columns], [rows], [domain], and similar tokens with your actual dataset details.
- Add constraints to tailor outputs (e.g., require JSON output, or ask for a parameter log so results are reproducible).
- Request structured outputs (e.g., a JSON report with sections: overview, methods, results, metrics, visuals, and actions).
- Fill in [metric], [audience], and [domain] so the output focuses on the right metrics, readers, and business context.
- Verify outputs by comparing them with your own calculations or known benchmarks, and iterate with revised prompts when they disagree.
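The first and last steps above are easy to automate. The sketch below, a minimal stdlib-only example, refuses to send a prompt that still contains an unfilled [placeholder], and checks that a JSON reply contains the sections you asked for. The helper names `fill_placeholders` and `missing_sections` are hypothetical, and the regex assumes placeholders are single bracketed words such as [dataset_name].

```python
import json
import re

# Bracketed tokens such as [dataset_name] or [missing_values_info].
PLACEHOLDER = re.compile(r"\[([A-Za-z_]+)\]")


def fill_placeholders(template: str, values: dict) -> str:
    """Substitute every [name] token, refusing to send a half-filled prompt."""
    missing = sorted({n for n in PLACEHOLDER.findall(template) if n not in values})
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def missing_sections(reply: str, required: list) -> list:
    """Parse a JSON reply and list the required top-level keys it lacks."""
    report = json.loads(reply)  # raises json.JSONDecodeError if the reply is not JSON
    return [key for key in required if key not in report]


prompt = fill_placeholders(
    "Summarize [columns] in dataset [dataset_name] for the [domain] team.",
    {"columns": "price, qty", "dataset_name": "orders_2024", "domain": "retail"},
)
reply = '{"overview": "...", "methods": "...", "results": {}}'
print(missing_sections(reply, ["overview", "methods", "results", "metrics"]))  # ['metrics']
```

A reply that fails to parse is itself a useful signal to re-prompt with a stricter output constraint.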
100 Best Gemini Prompts for Data Analysis
- EDA and Summary Statistics - Role: You are a data analytics assistant using Google Gemini to perform exploratory data analysis on dataset [dataset_name]. Task: Generate comprehensive summary statistics and initial data quality checks for [columns]. Context: Dataset has [rows] rows, domain is [domain], missing values [missing_values_info], data quality notes [data_quality]. Output: Return a structured report in [format] (JSON by default) with sections: overview, data_quality, summary_statistics, distribution_descriptions, and recommended_next_steps. Constraints: Provide reproducible parameters, specify which columns were analyzed, and include a short list of plot ideas.
- Univariate Distribution Analysis - Role: You are a data analytics assistant using Google Gemini to analyze univariate distributions. Task: For dataset [dataset_name], compute distribution metrics (mean, median, mode, variance, skewness, kurtosis) for numeric columns and report frequency counts for categorical columns. Context: Columns [columns], rows [rows], domain [domain]. Output: JSON with per-column distribution metrics and recommended transformations. Constraints: Highlight potential outliers and report data quality caveats.
- Bivariate Relationships and Correlation - Role: You are a data analytics assistant using Google Gemini to explore relationships between pairs of features. Task: Identify strong linear or non-linear relationships between numeric columns in dataset [dataset_name], report a correlation matrix, and suggest potential interaction terms. Context: Columns [columns], sample size [rows], analysis_goal [analysis_goal]. Output: A structured JSON with the correlation matrix, scatter-pattern notes, and suggested next steps. Constraints: List up to 3 of the strongest relationships, state explicitly if none are strong, and note any non-linear patterns.
- Data Cleaning and Missing Values Strategy - Role: You are a data analytics assistant. Task: Propose a missing data handling strategy for dataset [dataset_name], including imputation methods per column type, outlier handling plan, and data quality improvements. Context: Columns [columns], missing_values [missing_values_info], domain [domain]. Output: JSON report detailing chosen imputation methods, rationale, and a ready-to-run imputation blueprint. Constraints: Prioritize preserving data integrity and documenting assumptions.
- Outlier Detection and Treatment Plan - Role: You are a data analytics assistant. Task: Detect and document outliers in dataset [dataset_name] across numeric columns [columns], classify as actionable or non-actionable, and propose treatment options. Context: [rows] rows, data quality notes [data_quality]. Output: Structured JSON including outlier counts by column, justification, and recommended handling (cap, transformation, or retention). Constraints: Include a heatmap-ready outline for visualization.
- Feature Engineering for Analysis - Role: You are a data analytics assistant. Task: Propose and implement feature engineering steps for dataset [dataset_name] to improve analysis quality for [analysis_goal]. Context: Columns [columns], domain [domain], intended audience [audience], constraints [constraints]. Output: JSON with feature list, rationale, and example formulas. Constraints: Include at least 5 new features with brief explanations.
- Time Series Diagnostics and Decomposition - Role: You are a data analytics assistant. Task: Run time-series diagnostics on dataset [dataset_name] with timestamp column [time_column], identifying trend, seasonality, and anomalies. Context: Data length [rows], frequency [frequency], domain [domain]. Output: JSON report with decomposition summary, key patterns, and suggested forecasting approach. Constraints: Provide plot metadata and computation notes.
- Seasonality and Trend Exploration - Role: You are a data analytics assistant. Task: Analyze seasonality and trend behaviors in dataset [dataset_name] for metric [metric], across periods defined by [seasonality_definition]. Context: Time index [time_index], data quality [data_quality]. Output: JSON results including seasonal indices, trend strength, and recommendations for forecasting models. Constraints: Use robust seasonality tests and document assumptions.
- Anomaly Detection Setup - Role: You are a data analytics assistant. Task: Set up an anomaly detection framework for dataset [dataset_name], specifying methods (statistical, isolation forest, or other ML-based), thresholds, and monitoring cadence. Context: Columns [columns], period [period], domain [domain], data quality [data_quality]. Output: JSON blueprint with selected method, thresholds, and validation plan. Constraints: Provide fallback rules and alert criteria.
- Hypothesis Testing Plan - Role: You are a data analytics assistant. Task: Design a hypothesis testing plan for dataset [dataset_name] to test [hypothesis], including test type, assumptions, sample size considerations, and interpretation guidelines. Context: Columns [columns], data distribution notes [notes], domain [domain]. Output: JSON with test choice justification, power analysis, and decision criteria. Constraints: Include multiple testing adjustments if applicable.
- GroupBy and Aggregation Lab - Role: You are a data analytics assistant. Task: Use group-by operations on dataset [dataset_name] to compute aggregated metrics by [group_by_columns] and summarize results for decision-makers. Context: Rows [rows], domain [domain]. Output: JSON with group keys, aggregates, and a recommended next visualization. Constraints: Include at least 3 groups and 4 metrics.
- Categorical Encoding Decisions - Role: You are a data analytics assistant. Task: Propose and implement encoding strategies for categorical features in dataset [dataset_name], balancing model performance and interpretability. Context: Categorical columns [columns], target variable [target], domain [domain]. Output: JSON detailing encoding methods, rationale, and sample encoded outputs. Constraints: Choose between one-hot and target encoding based on each column's cardinality, and fit any target encoding within cross-validation folds to avoid leakage.
- Pivot and Cross-Tab Report - Role: You are a data analytics assistant. Task: Create pivot/cross-tab reports for dataset [dataset_name] across [pivot_dimensions] and [metrics], highlighting notable patterns and gaps. Context: Rows [rows], domain [domain]. Output: JSON with table-ready data structures and narrative notes. Constraints: Include at least 3 pivot views.
- Data Quality Scoring - Role: You are a data analytics assistant. Task: Develop a data quality scorecard for dataset [dataset_name], scoring completeness, consistency, accuracy, timeliness, and validity. Context: Columns [columns], rows [rows], domain [domain]. Output: JSON with scores by dimension, high-impact issues, and remediation plan. Constraints: Use a transparent scoring rubric.
- Effect Size and Power Analysis - Role: You are a data analytics assistant. Task: Estimate effect sizes for [analysis_goal] in dataset [dataset_name], and perform power analysis to determine required sample size for future studies. Context: Metric [metric], groups [groups], data quality [data_quality]. Output: JSON with effect size estimates, confidence intervals, and recommended sample size. Constraints: Include assumptions and limitations.
- Bootstrap and Confidence Intervals - Role: You are a data analytics assistant. Task: Apply bootstrap methods to estimate confidence intervals for key statistics in dataset [dataset_name], across [columns]. Context: Data quality [data_quality], sample size [rows]. Output: JSON with bootstrap estimates, percentile intervals, and interpretation notes. Constraints: Use 1000 bootstrap samples by default.
- Regression Diagnostics - Role: You are a data analytics assistant. Task: Perform regression diagnostics for a model fitted to dataset [dataset_name], checking residuals, heteroscedasticity, and multicollinearity. Context: Features [features], target [target], data quality [data_quality]. Output: JSON with diagnostic results and recommended model refinements. Constraints: Include plot metadata and actionable steps.
- Classification Pipeline Prompt - Role: You are a data analytics assistant. Task: Build a simple classification analysis pipeline for dataset [dataset_name], selecting features [features], evaluation metric [metric], and reporting interpretation guidance. Context: Data quality [data_quality], domain [domain]. Output: JSON with model choices, cross-validation plan, and evaluation summary. Constraints: Prioritize interpretability.
- Clustering and Segment Discovery - Role: You are a data analytics assistant. Task: Identify customer segments in dataset [dataset_name] using clustering on features [features], evaluate cluster validity, and provide actionable segment profiles. Context: Rows [rows], domain [domain], data quality [data_quality]. Output: JSON with cluster assignments, centroid descriptions, and recommended actions. Constraints: Include at least 3 clusters.
- Principal Component Analysis Prompt - Role: You are a data analytics assistant. Task: Apply PCA to dataset [dataset_name] on features [features], determine explained variance, and provide a plan for dimensionality reduction. Context: Data quality [data_quality], domain [domain]. Output: JSON with components, loadings summary, and suggested use cases. Constraints: Recommend number of components to retain.
- Visualization Design Brief - Role: You are a data analytics assistant. Task: Create a visualization design brief for dataset [dataset_name], mapping key metrics to visuals suitable for [audience], and noting accessibility requirements. Context: Data quality [data_quality], domain [domain]. Output: JSON with visualization recommendations, chart types, and legend notes. Constraints: Prioritize clarity and accessibility.
- Statistical Significance Reporting - Role: You are a data analytics assistant. Task: Report statistical significance for findings in dataset [dataset_name], including p-values, confidence intervals, and practical significance notes. Context: Columns [columns], analysis_goal [analysis_goal]. Output: JSON with results and interpretation guidance. Constraints: Distinguish statistical vs practical significance.
- Bayesian Data Analysis Overview - Role: You are a data analytics assistant. Task: Outline a Bayesian analysis plan for dataset [dataset_name], including priors, posterior inference approach, and decision rules. Context: Domain [domain], data quality [data_quality]. Output: JSON with model specification and justification. Constraints: Provide prior sensitivity notes.
- Forecasting Model Evaluation - Role: You are a data analytics assistant. Task: Evaluate forecasting models for dataset [dataset_name], comparing [models] on accuracy metrics and calibration. Context: Time column [time_column], horizon [horizon], domain [domain]. Output: JSON with model rankings and the recommended best model. Constraints: Include an out-of-sample evaluation plan.
- ANOVA and Post-hoc Testing - Role: You are a data analytics assistant. Task: Plan and execute ANOVA on dataset [dataset_name] for groups [groups], reporting effect sizes and post-hoc comparisons. Context: Data quality [data_quality], assumptions checked [assumptions]. Output: JSON with test results and interpretation. Constraints: Include multiple comparison adjustment.
- Correlation vs Causation Reasoning - Role: You are a data analytics assistant. Task: Provide a structured reasoning note distinguishing correlation from causation for findings in dataset [dataset_name], with recommended counterfactual checks. Context: Columns [columns], domain [domain]. Output: JSON with reasoning steps and suggested follow-up analyses. Constraints: Avoid overclaiming causality.
- Data Reshaping and Tidying - Role: You are a data analytics assistant. Task: Propose and implement data reshaping (wide↔long) for dataset [dataset_name] to support analyses on [analysis_goal]. Context: Columns [columns], rows [rows]. Output: JSON with reshaping plan and resulting schema. Constraints: Maintain data lineage.
- Data Provenance and Lineage - Role: You are a data analytics assistant. Task: Document data provenance and lineage for dataset [dataset_name], including source, transformations, and versioning. Context: Data quality [data_quality], stakeholders [stakeholders]. Output: JSON with lineage graph and audit trail. Constraints: Ensure reproducibility.
- Experimentation and A/B Testing Setup - Role: You are a data analytics assistant. Task: Design an A/B testing analysis plan for dataset [dataset_name], including sample size, randomization checks, and interpretation rules. Context: Domain [domain], metric [metric], data quality [data_quality]. Output: JSON with experimental design and analysis steps. Constraints: Predefine success criteria.
- Data Leakage Prevention Plan - Role: You are a data analytics assistant. Task: Identify potential data leakage in dataset [dataset_name] analyses and propose prevention steps across data preparation, feature engineering, and evaluation. Context: Columns [columns], domain [domain]. Output: JSON with leakage risks and mitigations. Constraints: Document all assumptions.
- Model vs Baseline Comparison - Role: You are a data analytics assistant. Task: Compare a chosen model for dataset [dataset_name] against a simple baseline (e.g., mean or median prediction for regression, majority class for classification) across [metrics], and report practical significance. Context: Domain [domain], data quality [data_quality]. Output: JSON with comparison results and recommendations. Constraints: Include a confusion matrix or error metrics where relevant.
- Feature Importance Extraction - Role: You are a data analytics assistant. Task: Extract and interpret feature importances for a model trained on dataset [dataset_name], identifying top drivers of the target variable [target]. Context: Features [features], data quality [data_quality]. Output: JSON with ranked features and rationale. Constraints: Include stability checks across folds.
- Residual Analysis and Diagnostics - Role: You are a data analytics assistant. Task: Analyze residuals of a model applied to dataset [dataset_name], checking for patterns, heteroscedasticity, and potential model misspecification. Context: Model [model], target [target], data quality [data_quality]. Output: JSON with diagnostic plot notes and remediation suggestions. Constraints: Provide actionable steps.
- Time Series Forecasting Prompt - Role: You are a data analytics assistant. Task: Develop a forecasting plan for dataset [dataset_name] on metric [metric], specifying horizon, method choices, and evaluation strategy. Context: Time column [time_column], data_quality [data_quality], domain [domain]. Output: JSON with forecast plan and evaluation criteria. Constraints: Include uncertainty quantification.
- Time-To-Event Analysis Prompt - Role: You are a data analytics assistant. Task: Conduct time-to-event analysis on dataset [dataset_name], estimating survival functions and hazard rates for groups [groups]. Context: Time column [time_column], event indicator [event_column], domain [domain]. Output: JSON with survival estimates and interpretation notes. Constraints: Include censoring considerations.
- Survival Analysis Basics - Role: You are a data analytics assistant. Task: Provide a basic survival analysis plan for dataset [dataset_name], including Kaplan-Meier estimates and log-rank test plan. Context: Time [time], event [event], group [group], domain [domain]. Output: JSON with plan and expected outputs. Constraints: Clear interpretation guidance.
- Text Data Analysis Prompt - Role: You are a data analytics assistant. Task: Analyze text data in dataset [dataset_name] for topic and sentiment signals, detailing preprocessing steps and metrics. Context: Text column [text_column], domain [domain], language [lang]. Output: JSON with preprocessing outline, feature extraction method, and evaluation plan. Constraints: Include reproducibility notes.
- Sentiment and Topic Modeling Overview - Role: You are a data analytics assistant. Task: Generate an analysis plan for sentiment and topic modeling on text data in dataset [dataset_name], including model choices and evaluation criteria. Context: Text [text_columns], domain [domain]. Output: JSON with model recommendations, preprocessing steps, and result interpretation notes. Constraints: Provide reproducible parameter settings.
- Geospatial Data Analysis Starter - Role: You are a data analytics assistant. Task: Start a geospatial analysis for dataset [dataset_name], mapping metrics to geography using coordinates [lat, lon] or region labels [region]. Context: Domain [domain], data quality [data_quality]. Output: JSON with spatial join definitions, heatmap specifications, and regional insights. Constraints: Include location-accuracy notes.
- Data Anonymization and Privacy Check - Role: You are a data analytics assistant. Task: Review dataset [dataset_name] for privacy risks and propose anonymization steps, including re-identification risk assessment. Context: Columns [columns], regulatory context [regulation]. Output: JSON with anonymization plan and risk mitigation. Constraints: Preserve analytic utility.
- Powerful Data Visualization Prompts - Role: You are a data analytics assistant. Task: Propose a set of 5 visualization prompts for dataset [dataset_name] that clearly communicate the main findings to executives and analysts. Context: Audience [audience], domain [domain], data quality [data_quality]. Output: JSON with visualization concepts and rationale. Constraints: Prioritize clarity and actionability.
- Storytelling with Data Summary - Role: You are a data analytics assistant. Task: Create a concise data story for dataset [dataset_name] focusing on [topic], summarizing insights for stakeholders. Context: Audience [audience], domain [domain]. Output: JSON with narrative outline, key figures, and recommended actions. Constraints: Include an executive takeaway.
- Executive Summary for Stakeholders - Role: You are a data analytics assistant. Task: Produce an executive summary for dataset [dataset_name] addressing [analysis_goal], with succinct bullets and a one-page data brief. Context: Domain [domain], audience [audience]. Output: JSON with summary bullets and next-step recommendations. Constraints: Keep it to one page.
- Data-Driven KPI Dashboards - Role: You are a data analytics assistant. Task: Outline KPIs and dashboard structure for dataset [dataset_name] to monitor [business_goals], including data sources and refresh cadence. Context: Domain [domain], audience [audience]. Output: JSON with dashboard schema and data lineage notes. Constraints: Recommend at least 6 KPIs.
- Root Cause Analysis Prompt - Role: You are a data analytics assistant. Task: Perform root cause analysis for a recurring issue in dataset [dataset_name], identifying drivers and correlations. Context: Issue [issue], data_quality [data_quality], domains [domains]. Output: JSON with causal factors, evidence summary, and recommended actions. Constraints: Avoid confirmation bias.
- Data-Driven Pricing Analysis - Role: You are a data analytics assistant. Task: Conduct a pricing analysis on dataset [dataset_name], evaluating elasticity, break-even points, and recommended price bands. Context: Product [product], market [market], data_quality [data_quality]. Output: JSON with pricing scenarios and decision criteria. Constraints: Include sensitivity analyses.
- Churn Prediction Readiness Check - Role: You are a data analytics assistant. Task: Assess readiness to build a churn prediction model on dataset [dataset_name], including data quality, feature availability, and baseline performance expectations. Context: Domain [domain], target [target], data_quality [data_quality]. Output: JSON with readiness score and gaps. Constraints: Prioritize actionable next steps.
- Customer Segmentation Prompt - Role: You are a data analytics assistant. Task: Create a customer segmentation plan using dataset [dataset_name], describing segments by key attributes and potential marketing actions. Context: Columns [columns], domain [domain], data_quality [data_quality]. Output: JSON with segment definitions, profiles, and recommended actions. Constraints: Ensure each segment has at least 100 observations.
- Sales Funnel Analysis - Role: You are a data analytics assistant. Task: Analyze the sales funnel in dataset [dataset_name], computing conversion rates at each stage and identifying bottlenecks. Context: Stages [stages], domain [domain], data quality [data_quality]. Output: JSON with funnel metrics and recommended optimizations. Constraints: Include confidence intervals.
- Market Basket Analysis Starter - Role: You are a data analytics assistant. Task: Perform market basket analysis on dataset [dataset_name], computing association rules for items in [items], and propose merchandising implications. Context: Transactions [transactions], domain [domain], data_quality [data_quality]. Output: JSON with rules, support/confidence metrics, and actionable insights. Constraints: Keep rule count to at most 20.
- Quality Control Chart Analysis - Role: You are a data analytics assistant. Task: Apply control chart analysis to dataset [dataset_name] and identify points where metric [metric] falls outside the control limits. Context: Time [time], domain [domain], data quality [data_quality]. Output: JSON with control chart signals, interpretation notes, and recommended actions. Constraints: Include upper and lower control limits.
- Cross-Validation Strategy Setup - Role: You are a data analytics assistant. Task: Propose a robust cross-validation strategy for a model trained on dataset [dataset_name], detailing folds, shuffling, and evaluation metrics. Context: Features [features], target [target], data_quality [data_quality]. Output: JSON with CV plan and rationale. Constraints: Adapt to time-series if needed.
- Hyperparameter Tuning for Analysis - Role: You are a data analytics assistant. Task: Outline a hyperparameter tuning plan for a model or analysis pipeline on dataset [dataset_name], including search space, evaluation metric, and stopping criteria. Context: Domain [domain], data quality [data_quality]. Output: JSON with tuning plan and expected resources. Constraints: Prefer lightweight searches.
- Model Calibration and Reliability Check - Role: You are a data analytics assistant. Task: Assess calibration and reliability of a model on dataset [dataset_name], with calibration curves and reliability metrics. Context: Model [model], target [target], data quality [data_quality]. Output: JSON with calibration results and adjustment recommendations. Constraints: Include plot metadata.
- NLP Data Cleaning and Tokenization - Role: You are a data analytics assistant. Task: Clean and tokenize textual data in dataset [dataset_name] for NLP analysis, including normalization steps and tokenization strategy. Context: Text column [text_column], domain [domain], language [lang]. Output: JSON with preprocessing pipeline and example tokens. Constraints: Ensure reproducibility.
- Experiment Reproducibility Checklist - Role: You are a data analytics assistant. Task: Build an experiment reproducibility checklist for analyses on dataset [dataset_name], including versioning, random seeds, and data lineage. Context: Domain [domain], data_quality [data_quality]. Output: JSON with checklist items and owner assignments. Constraints: Make it actionable.
- Data Pipeline Red Flags - Role: You are a data analytics assistant. Task: Identify red flags in a data pipeline for dataset [dataset_name], including ingestion, transformation, and load steps. Context: Pipeline stages [stages], domain [domain], data_quality [data_quality]. Output: JSON with risk scores and remediation plans. Constraints: Prioritize critical failures.
- Dimensionality Reduction Selection - Role: You are a data analytics assistant. Task: Recommend a dimensionality reduction method for dataset [dataset_name], given [data_characteristics], and justify the choice. Context: Columns [columns], domain [domain], data quality [data_quality]. Output: JSON with method rationale and implementation notes. Constraints: Compare at least 2 methods.
- Data Drift Detection and Monitoring - Role: You are a data analytics assistant. Task: Design a data drift detection plan for dataset [dataset_name], including features to monitor, thresholds, and alerting. Context: Data source [source], domain [domain], data_quality [data_quality]. Output: JSON with drift metrics and monitoring cadence. Constraints: Include rollback plan.
- Ethics and Bias Review for Analysis - Role: You are a data analytics assistant. Task: Conduct an ethics and bias review of analyses on dataset [dataset_name], identifying potential biases and mitigation strategies. Context: Domain [domain], data_quality [data_quality]. Output: JSON with bias categories, impact assessment, and recommended mitigations. Constraints: Provide transparent justifications.
- Data Visualization Accessibility Check - Role: You are a data analytics assistant. Task: Evaluate all proposed visuals for dataset [dataset_name] against accessibility guidelines (color contrast, screen-reader friendliness, alt text). Context: Visuals [visuals], domain [domain]. Output: JSON with accessibility ratings and remediation steps. Constraints: Include fallback options.
- Multicollinearity Assessment - Role: You are a data analytics assistant. Task: Assess multicollinearity among numeric features in dataset [dataset_name], reporting VIFs and suggesting feature reduction. Context: Features [features], target [target], domain [domain]. Output: JSON with VIF table and recommended feature set. Constraints: Provide interpretation guidance.
- Outlier Robustness Check - Role: You are a data analytics assistant. Task: Evaluate model performance under outlier-robust vs standard configurations on dataset [dataset_name], reporting changes in accuracy or error metrics. Context: Columns [columns], domain [domain], data_quality [data_quality]. Output: JSON with robustness comparison and recommendations. Constraints: Include visual summaries.
- Quantile Analysis Prompt - Role: You are a data analytics assistant. Task: Perform quantile analysis on dataset [dataset_name] for numeric columns [columns], reporting percentiles and tail behavior. Context: Time [time], domain [domain], data_quality [data_quality]. Output: JSON with quantile summaries and interpretation notes. Constraints: Highlight extreme values.
- Join and Merge Strategy Evaluation - Role: You are a data analytics assistant. Task: Evaluate join/merge strategies for combining datasets in [project], focusing on key collisions, data loss, and performance for dataset [dataset_name]. Context: Keys [keys], domain [domain]. Output: JSON with recommended join strategy and data integrity notes. Constraints: Include test scenarios.
- Data Transformation and Normalization - Role: You are a data analytics assistant. Task: Propose data transformation and normalization steps for dataset [dataset_name] to ensure comparability across features [features]. Context: Domain [domain], data_quality [data_quality]. Output: JSON with transformation rules, scaling methods, and rationale. Constraints: Include edge-case handling.
- Imputation Strategy Comparison - Role: You are a data analytics assistant. Task: Compare imputation strategies for missing values in dataset [dataset_name], evaluating impact on downstream analyses for columns [columns]. Context: Rows [rows], data_quality [data_quality]. Output: JSON with method comparisons and recommended approach. Constraints: Include sensitivity notes.
- Forecast Error Analysis - Role: You are a data analytics assistant. Task: Analyze forecast errors for a model on dataset [dataset_name], identifying biases, variance, and error sources across horizons [horizon]. Context: Time column [time_column], metric [metric], domain [domain]. Output: JSON with error diagnostics and improvement plan. Constraints: Include error decomposition.
- Residual vs Fitted Plots Prompt - Role: You are a data analytics assistant. Task: Produce residual vs fitted diagnostics for a model fitted to dataset [dataset_name], highlighting potential misspecification. Context: Model [model], target [target]. Output: JSON with plot metadata and interpretation notes. Constraints: Provide actionable remediation suggestions.
- Model Interpretability Prompt - Role: You are a data analytics assistant. Task: Generate interpretability notes for a model trained on dataset [dataset_name], including SHAP-like explanations or feature impact summaries. Context: Features [features], target [target], domain [domain]. Output: JSON with interpretation results and user-friendly explanations. Constraints: Prioritize clarity for non-technical stakeholders.
- Observational Study Design - Role: You are a data analytics assistant. Task: Design an observational study analysis plan for dataset [dataset_name], including confounding controls and causal inference notes. Context: Domain [domain], data quality [data_quality]. Output: JSON with plan and limitations. Constraints: State assumptions.
- Power Analysis for Sample Size - Role: You are a data analytics assistant. Task: Perform power analysis for a future study based on dataset [dataset_name], determining required sample size for detecting effect size [effect_size] with alpha [alpha] and power [power]. Context: Metric [metric], domain [domain]. Output: JSON with sample size recommendations and rationale. Constraints: Include scenario bounds.
- Data Export and Documentation Plan - Role: You are a data analytics assistant. Task: Create a data export and documentation plan for findings from dataset [dataset_name], including formats, metadata, and versioning. Context: Audience [audience], domain [domain]. Output: JSON with export specs and documentation templates. Constraints: Ensure traceability.
- Automation for Repetitive Analyses - Role: You are a data analytics assistant. Task: Propose automation steps to repeat analyses on dataset [dataset_name] across updates, including scheduling, parameterization, and logging. Context: Frequency [frequency], domain [domain], data_quality [data_quality]. Output: JSON with automation blueprint and sample run logs. Constraints: Prioritize reliability.
- Data Governance Compliance Check - Role: You are a data analytics assistant. Task: Conduct a governance and compliance check for analyses on dataset [dataset_name], ensuring policy adherence and auditability. Context: Regulations [regulations], domain [domain], data_quality [data_quality]. Output: JSON with compliance findings and remediation plan. Constraints: Highlight risk areas.
- Time Window Aggregations - Role: You are a data analytics assistant. Task: Propose time window aggregation strategies for dataset [dataset_name] to capture rolling metrics for [metric]. Context: Time column [time_column], domain [domain]. Output: JSON with window definitions and expected insights. Constraints: Include performance notes.
- Cohort Analysis Prompt - Role: You are a data analytics assistant. Task: Design a cohort analysis plan for dataset [dataset_name], grouping by [cohort_definition], and measuring outcomes over time. Context: Rows [rows], domain [domain], data_quality [data_quality]. Output: JSON with cohort definitions, metrics, and interpretation notes. Constraints: Ensure each cohort has enough observations for reliable estimates, and flag any that do not.
- ROI and Impact Analysis - Role: You are a data analytics assistant. Task: Estimate ROI and impact of a program using dataset [dataset_name], comparing costs and outcomes across cohorts. Context: Domain [domain], data_quality [data_quality]. Output: JSON with ROI calculation and sensitivity checks. Constraints: Include caveats.
- Scenario Analysis and What-If Prompts - Role: You are a data analytics assistant. Task: Run scenario and what-if analyses on dataset [dataset_name], evaluating outcomes under different assumptions for [factors]. Context: Domain [domain], data_quality [data_quality]. Output: JSON with scenario results and implications. Constraints: Present at least 3 scenarios.
- Error Budgeting for Analytics - Role: You are a data analytics assistant. Task: Propose an error budgeting framework for analytics on dataset [dataset_name], balancing speed vs accuracy and outlining SLOs. Context: Domain [domain], data_quality [data_quality]. Output: JSON with budget allocations and monitoring plan. Constraints: Align with stakeholder expectations.
- Data Dictionary Generation - Role: You are a data analytics assistant. Task: Generate a data dictionary for dataset [dataset_name], describing each column, data type, permissible values, and meaning. Context: Domain [domain], audience [audience]. Output: JSON with dictionary entries and sample queries.
- Data Visualization Storyboard - Role: You are a data analytics assistant. Task: Create a storyboard for a data visualization narrative using dataset [dataset_name], outlining visuals, captions, and flow. Context: Audience [audience], domain [domain]. Output: JSON with storyboard frames and rationale. Constraints: Ensure accessibility.
- Pivot Table Automation Prompt - Role: You are a data analytics assistant. Task: Generate a set of pivot-table configurations for dataset [dataset_name], covering common business questions and ensuring reproducibility. Context: Columns [columns], domain [domain]. Output: JSON with pivot definitions and example outputs. Constraints: Include at least 4 configurations.
- Missing Data Imputation Quality Check - Role: You are a data analytics assistant. Task: Evaluate the quality of missing data imputation for dataset [dataset_name], comparing methods and impact on downstream analyses. Context: Columns [columns], missing_values [missing_values_info], domain [domain]. Output: JSON with method comparisons and quality scores. Constraints: Provide visualization-friendly summaries.
- Batch Effect Assessment (Biased Data) - Role: You are a data analytics assistant. Task: Assess batch effects in dataset [dataset_name], reporting potential biases by batch and proposing mitigation steps. Context: Batch identifiers [batch_id], domain [domain], data_quality [data_quality]. Output: JSON with bias assessment and remediation plan. Constraints: Include visualization notes.
- Seasonal Adjustment and Deseasonalization - Role: You are a data analytics assistant. Task: Apply seasonal adjustment/deseasonalization to time-series data in dataset [dataset_name] for metric [metric], and compare forecasts with/without adjustment. Context: Time column [time_column], domain [domain], data_quality [data_quality]. Output: JSON with adjusted series and forecasting implications. Constraints: Document methods.
- Experiment Result Reproducibility - Role: You are a data analytics assistant. Task: Ensure reproducibility of experiment results on dataset [dataset_name], including seed management, data splits, and versioned artifacts. Context: Domain [domain], data_quality [data_quality]. Output: JSON with reproducibility checklist and sample artifacts. Constraints: Provide audit trails.
- Data-Driven Decision Memo - Role: You are a data analytics assistant. Task: Draft a data-driven decision memo from findings on dataset [dataset_name], focusing on recommended actions for stakeholders. Context: Audience [audience], domain [domain]. Output: JSON with memo sections and key metrics. Constraints: Keep it concise.
- Data Quality Benchmarking - Role: You are a data analytics assistant. Task: Benchmark data quality of dataset [dataset_name] against industry norms or internal standards, highlighting gaps and improvement priorities. Context: Domain [domain], data_quality_metrics [metrics]. Output: JSON with benchmark results and recommended actions. Constraints: Include actionable steps.
- Anonymization Impact Assessment - Role: You are a data analytics assistant. Task: Evaluate the impact of anonymization on analytics for dataset [dataset_name], including utility loss and privacy safeguards. Context: Columns [columns], policy [policy]. Output: JSON with impact assessment and mitigation notes. Constraints: Quantify utility impact.
- Multimodal Data Analysis Prompt - Role: You are a data analytics assistant. Task: Propose an analysis plan for a multimodal dataset [dataset_name] combining [modalities], detailing integration steps and insights. Context: Domain [domain], data_quality [data_quality]. Output: JSON with modality integration plan and expected outputs. Constraints: Include cross-modal validation.
- Experiment Suppression and Control Group Handling - Role: You are a data analytics assistant. Task: Plan handling of control vs suppressed groups in the observational dataset [dataset_name], ensuring valid comparisons. Context: Groups [groups], domain [domain], data_quality [data_quality]. Output: JSON with suppression rules and analysis plan. Constraints: Document assumptions.
- Causal Inference Prompt - Role: You are a data analytics assistant. Task: Design a causal inference analysis for dataset [dataset_name], selecting methods (e.g., propensity score methods, instrumental variables) and validation checks. Context: Treatment [treatment], outcome [outcome], domain [domain]. Output: JSON with model specification and diagnostics. Constraints: Provide robustness checks.
- Robustness Check Across Subgroups - Role: You are a data analytics assistant. Task: Test robustness of findings across subgroups in dataset [dataset_name], reporting consistency or explainable differences. Context: Subgroups [subgroups], domain [domain]. Output: JSON with subgroup analyses and interpretation notes. Constraints: Include effect size consistency metrics.
- Data-Driven Risk Assessment - Role: You are a data analytics assistant. Task: Perform a risk assessment on dataset [dataset_name], identifying data and model risks with mitigation strategies. Context: Domain [domain], data_quality [data_quality]. Output: JSON with risk matrix and action plan. Constraints: Prioritize high-risk items.
- Analytics Report Formatting Standards - Role: You are a data analytics assistant. Task: Define a standardized analytics report template for dataset [dataset_name], including sections, visuals, and language style. Context: Audience [audience], domain [domain]. Output: JSON with template structure and example content. Constraints: Ensure consistency.
- API Data Integration Check - Role: You are a data analytics assistant. Task: Validate an API data integration for dataset [dataset_name], checking data freshness, schema alignment, and error handling. Context: API endpoints [endpoints], domain [domain]. Output: JSON with integration health and recommended improvements.
- Sparsity and Rare Event Handling - Role: You are a data analytics assistant. Task: Address sparsity in dataset [dataset_name], including rare event handling strategies for [columns]. Context: Domain [domain], data_quality [data_quality]. Output: JSON with techniques and expected impact on analyses.
- Benchmarking Against Industry Standards - Role: You are a data analytics assistant. Task: Benchmark the analytics for dataset [dataset_name] against industry standards for [industry/sector], reporting gaps and improvement ideas. Context: Domain [domain], data_quality [data_quality]. Output: JSON with benchmark results and recommended actions.
- Final Data Analysis Summary and Recommendations - Role: You are a data analytics assistant. Task: Compile a final data analysis summary for dataset [dataset_name], synthesizing findings on [topic], and delivering actionable recommendations for decision-makers. Context: Audience [audience], domain [domain]. Output: JSON with executive summary, key metrics, and next-step plan. Constraints: Be concise and decision-focused.
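Before acting on the sample size returned by the Power Analysis for Sample Size prompt, it can help to reproduce a rough figure yourself. The sketch below is one minimal way to do that, assuming a two-sided, two-sample comparison of means with equal group sizes and an effect size expressed as Cohen's d. It uses the normal approximation, so exact t-test calculators will typically return a slightly larger n. The function name `sample_size_per_group` is illustrative, not part of any Gemini or library API.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate n per group for a two-sided, two-sample test of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)                 # round up so power is not undershot


# Scenario bounds: how the requirement shifts with the assumed effect size.
for d in (0.2, 0.5, 0.8):
    print(f"d={d}: n per group = {sample_size_per_group(d)}")
```

Looping over several assumed effect sizes, as in the last lines, is a simple way to produce the scenario bounds that prompt asks for.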
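The Forecast Error Analysis prompt asks for an error decomposition. One common decomposition, used here as an assumption rather than the only option, splits mean squared error into squared bias plus the population variance of the errors (MSE = bias² + Var(e)). This standard-library sketch computes both parts for a single horizon so you can check Gemini's numbers; the function name, the output keys, and the sign convention (error = forecast minus actual) are hypothetical choices.

```python
from statistics import fmean, pvariance


def error_diagnostics(actuals: list[float], forecasts: list[float]) -> dict[str, float]:
    """Summarize forecast errors for one horizon (error = forecast - actual)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("actuals and forecasts must be non-empty and of equal length")
    errors = [f - a for a, f in zip(actuals, forecasts)]
    bias = fmean(errors)                   # mean error: > 0 means over-forecasting
    variance = pvariance(errors, mu=bias)  # spread of errors around the bias
    mse = fmean(e * e for e in errors)     # equals bias**2 + variance
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,
        "mae": fmean(abs(e) for e in errors),
        "rmse": mse ** 0.5,
        # Fraction of MSE attributable to systematic bias rather than noise.
        "bias_share_of_mse": bias * bias / mse if mse else 0.0,
    }


report = error_diagnostics([10, 12, 14], [11, 13, 13])
print(report)
```

Running it once per horizon in [horizon] gives a per-horizon table; a large `bias_share_of_mse` points to systematic over- or under-forecasting rather than noise.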
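For the Time Window Aggregations prompt, a small reference implementation of a trailing rolling mean makes it easier to judge the window definitions Gemini proposes. This sketch assumes values are already sorted by [time_column] and evenly spaced (one value per period), and that a window produces no value until it is full; both are illustrative choices, and `rolling_mean` is a hypothetical helper name. Keeping a running sum means each step costs O(1) regardless of window length, which is the kind of point the prompt's performance notes may cover.

```python
from __future__ import annotations

from collections import deque


def rolling_mean(values: list[float], window: int) -> list[float | None]:
    """Trailing rolling mean; returns None until `window` values have been seen.

    Assumes `values` are sorted by time and evenly spaced (one value per period).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf: deque[float] = deque()
    running_sum = 0.0
    out: list[float | None] = []
    for v in values:
        buf.append(v)
        running_sum += v
        if len(buf) > window:
            running_sum -= buf.popleft()  # drop the value that left the window
        out.append(running_sum / window if len(buf) == window else None)
    return out


print(rolling_mean([3, 5, 7, 9, 11], window=3))  # [None, None, 5.0, 7.0, 9.0]
```

For irregular timestamps, a time-based window (for example, the last 7 days rather than the last 7 rows) is usually the safer choice.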
Markdown Template
# 100 Best Gemini Prompts for Data Analysis
**EDA and Summary Statistics**: Role: You are a data analytics assistant using Google Gemini to perform Exploratory Data Analysis on dataset [dataset_name]. Task: Generate comprehensive summary statistics and initial data quality checks for [columns]. Context: Dataset has [rows] rows, domain is [domain], missing values [missing_values_info], data quality notes [data_quality]. Output: Return a structured JSON report with sections: overview, data_quality, summary_statistics, distribution_descriptions, and recommended next steps. Constraints: Provide reproducible parameters, specify which columns were analyzed, include a short visual-ideas list for potential plots, and use [format] format.
**Univariate Distribution Analysis**: Role: You are a data analytics assistant using Google Gemini to analyze univariate distributions. Task: For dataset [dataset_name], compute distribution metrics (mean, median, mode, variance, skewness, kurtosis) for numeric columns and report frequency counts for categorical columns. Context: Columns [columns], rows [rows], domain [domain]. Output: JSON with per-column distribution metrics and recommended transformations. Constraints: Highlight potential outliers and report data quality caveats.
**Bivariate Relationships and Correlation**: Role: You are a data analytics assistant using Google Gemini to explore relationships between pairs of features. Task: Identify strong or non-linear relationships between numeric columns in dataset [dataset_name], report correlation matrices, and suggest potential interaction terms. Context: Columns [columns], sample size [rows], analysis_goal [analysis_goal]. Output: A structured JSON with correlation matrix, scatter-pattern notes, and suggested next steps. Constraints: Include at least 3 strong relationships and note any non-linear patterns.
**Data Cleaning and Missing Values Strategy**: Role: You are a data analytics assistant. Task: Propose a missing data handling strategy for dataset [dataset_name], including imputation methods per column type, outlier handling plan, and data quality improvements. Context: Columns [columns], missing_values [missing_values_info], domain [domain], rows [rows]. Output: JSON report detailing chosen imputation methods, rationale, and a ready-to-run imputation blueprint. Constraints: Prioritize preserving data integrity and documenting assumptions.
**Outlier Detection and Treatment Plan**: Role: You are a data analytics assistant. Task: Detect and document outliers in dataset [dataset_name] across numeric columns [columns], classify as actionable or non-actionable, and propose treatment options. Context: [rows] rows, data quality notes [data_quality]. Output: Structured JSON including outlier counts by column, justification, and recommended handling (cap, transformation, or retention). Constraints: Include a heatmap-ready outline for visualization.
**Feature Engineering for Analysis**: Role: You are a data analytics assistant. Task: Propose and implement feature engineering steps for dataset [dataset_name] to improve analysis quality for [analysis_goal]. Context: Columns [columns], domain [domain], intended audience [audience], constraints [constraints]. Output: JSON with feature list, rationale, and example formulas. Constraints: Include at least 5 new features with brief explanations.
**Time Series Diagnostics and Decomposition**: Role: You are a data analytics assistant. Task: Run time-series diagnostics on dataset [dataset_name] with timestamp column [time_column], identifying trend, seasonality, and anomalies. Context: Data length [rows], frequency [frequency], domain [domain]. Output: JSON report with decomposition summary, key patterns, and suggested forecasting approach. Constraints: Provide plot metadata and computation notes.
**Seasonality and Trend Exploration**: Role: You are a data analytics assistant. Task: Analyze seasonality and trend behaviors in dataset [dataset_name] for metric [metric], across periods defined by [seasonality_definition]. Context: Time index [time_index], data quality [data_quality]. Output: JSON results including seasonal indices, trend strength, and recommendations for forecasting models. Constraints: Use robust seasonality tests and document assumptions.
**Anomaly Detection Setup**: Role: You are a data analytics assistant. Task: Set up an anomaly detection framework for dataset [dataset_name], specifying methods (statistical, isolation forest, or other ML-based), thresholds, and monitoring cadence. Context: Columns [columns], period [period], domain [domain], data quality [data_quality]. Output: JSON blueprint with selected method, thresholds, and validation plan. Constraints: Provide fallback rules and alert criteria.
**Hypothesis Testing Plan**: Role: You are a data analytics assistant. Task: Design a hypothesis testing plan for dataset [dataset_name] to test [hypothesis], including test type, assumptions, sample size considerations, and interpretation guidelines. Context: Columns [columns], data distribution notes [notes], domain [domain]. Output: JSON with test choice justification, power analysis, and decision criteria. Constraints: Include multiple testing adjustments if applicable.
**GroupBy and Aggregation Lab**: Role: You are a data analytics assistant. Task: Use group-by operations on dataset [dataset_name] to compute aggregated metrics by [group_by_columns] and summarize results for decision-makers. Context: Rows [rows], domain [domain]. Output: JSON with group keys, aggregates, and a recommended next visualization. Constraints: Include at least 3 groups and 4 metrics.
**Categorical Encoding Decisions**: Role: You are a data analytics assistant. Task: Propose and implement encoding strategies for categorical features in dataset [dataset_name], balancing model performance and interpretability. Context: Categorical columns [columns], target variable [target], domain [domain]. Output: JSON detailing encoding methods, rationale, and sample encoded outputs. Constraints: Choose between one-hot and target encoding based on cardinality (one-hot for low-cardinality columns, target encoding for high-cardinality ones), and guard target encoding against leakage.
**Pivot and Cross-Tab Report**: Role: You are a data analytics assistant. Task: Create pivot/cross-tab reports for dataset [dataset_name] across [pivot_dimensions] and [metrics], highlighting notable patterns and gaps. Context: Rows [rows], domain [domain]. Output: JSON with table-ready data structures and narrative notes. Constraints: Include at least 3 pivot views.
**Data Quality Scoring**: Role: You are a data analytics assistant. Task: Develop a data quality scorecard for dataset [dataset_name], scoring completeness, consistency, accuracy, timeliness, and validity. Context: Columns [columns], rows [rows], domain [domain]. Output: JSON with scores by dimension, high-impact issues, and remediation plan. Constraints: Use a transparent scoring rubric.
**Effect Size and Power Analysis**: Role: You are a data analytics assistant. Task: Estimate effect sizes for [analysis_goal] in dataset [dataset_name], and perform power analysis to determine required sample size for future studies. Context: Metric [metric], groups [groups], data quality [data_quality]. Output: JSON with effect size estimates, confidence intervals, and recommended sample size. Constraints: Include assumptions and limitations.
**Bootstrap and Confidence Intervals**: Role: You are a data analytics assistant. Task: Apply bootstrap methods to estimate confidence intervals for key statistics in dataset [dataset_name], across [columns]. Context: Data quality [data_quality], sample size [rows]. Output: JSON with bootstrap estimates, percentile intervals, and interpretation notes. Constraints: Use 1000 bootstrap samples by default.
**Regression Diagnostics**: Role: You are a data analytics assistant. Task: Perform regression diagnostics for a model on dataset [dataset_name], checking residuals, heteroscedasticity, and multicollinearity. Context: Features [features], target [target], data quality [data_quality]. Output: JSON with diagnostic results and recommended model refinements. Constraints: Include plot metadata and actionable steps.
**Classification Pipeline Prompt**: Role: You are a data analytics assistant. Task: Build a simple classification analysis pipeline for dataset [dataset_name], selecting features [features], evaluation metric [metric], and reporting interpretation guidance. Context: Data_quality [data_quality], domain [domain]. Output: JSON with model choices, cross-validation plan, and evaluation summary. Constraints: Prioritize interpretability.
**Clustering and Segment Discovery**: Role: You are a data analytics assistant. Task: Identify customer segments in dataset [dataset_name] using clustering on features [features], evaluate cluster validity, and provide actionable segment profiles. Context: Rows [rows], domain [domain], data quality [data_quality]. Output: JSON with cluster assignments, centroid descriptions, and recommended actions. Constraints: Include at least 3 clusters.
**Principal Component Analysis Prompt**: Role: You are a data analytics assistant. Task: Apply PCA to dataset [dataset_name] on features [features], determine explained variance, and provide a plan for dimensionality reduction. Context: Data quality [data_quality], domain [domain]. Output: JSON with components, loadings summary, and suggested use cases. Constraints: Recommend number of components to retain.
**Visualization Design Brief**: Role: You are a data analytics assistant. Task: Create a visualization design brief for dataset [dataset_name], mapping key metrics to visuals suitable for [audience], and noting accessibility requirements. Context: Data quality [data_quality], domain [domain]. Output: JSON with visualization recommendations, chart types, and legend notes. Constraints: Prioritize clarity and accessibility.
**Statistical Significance Reporting**: Role: You are a data analytics assistant. Task: Report statistical significance for findings in dataset [dataset_name], including p-values, confidence intervals, and practical significance notes. Context: Columns [columns], analysis_goal [analysis_goal]. Output: JSON with results and interpretation guidance. Constraints: Distinguish statistical vs practical significance.
**Bayesian Data Analysis Overview**: Role: You are a data analytics assistant. Task: Outline a Bayesian analysis plan for dataset [dataset_name], including priors, posterior inference approach, and decision rules. Context: Domain [domain], data quality [data_quality]. Output: JSON with model specification and justification. Constraints: Provide prior sensitivity notes.
**Forecasting Model Evaluation**: Role: You are a data analytics assistant. Task: Evaluate forecasting models for dataset [dataset_name], comparing [models] on accuracy metrics and calibration. Context: Time column [time_column], horizon [horizon], domain [domain]. Output: JSON with model rankings and recommended best model. Constraints: Include out-of-sample evaluation plan.
**ANOVA and Post-hoc Testing**: Role: You are a data analytics assistant. Task: Plan and execute ANOVA on dataset [dataset_name] for groups [groups], reporting effect sizes and post-hoc comparisons. Context: Data quality [data_quality], assumptions checked [assumptions]. Output: JSON with test results and interpretation. Constraints: Include multiple comparison adjustment.
**Correlation vs Causation Reasoning**: Role: You are a data analytics assistant. Task: Provide a structured reasoning note distinguishing correlation from causation for findings in dataset [dataset_name], with recommended counterfactual checks. Context: Columns [columns], domain [domain]. Output: JSON with reasoning steps and suggested follow-up analyses. Constraints: Avoid overclaiming causality.
**Data Reshaping and Tidying**: Role: You are a data analytics assistant. Task: Propose and implement data reshaping (wide↔long) for dataset [dataset_name] to support analyses on [analysis_goal]. Context: Columns [columns], rows [rows]. Output: JSON with reshaping plan and resulting schema. Constraints: Maintain data lineage.
**Data Provenance and Lineage**: Role: You are a data analytics assistant. Task: Document data provenance and lineage for dataset [dataset_name], including source, transformations, and versioning. Context: Data quality [data_quality], stakeholders [stakeholders]. Output: JSON with lineage graph and audit trail. Constraints: Ensure reproducibility.
**Experimentation and A/B Testing Setup**: Role: You are a data analytics assistant. Task: Design an A/B testing analysis plan for dataset [dataset_name], including sample size, randomization checks, and interpretation rules. Context: Domain [domain], metric [metric], data quality [data_quality]. Output: JSON with experimental design and analysis steps. Constraints: Predefine success criteria.
**Data Leakage Prevention Plan**: Role: You are a data analytics assistant. Task: Identify potential data leakage in dataset [dataset_name] analyses and propose prevention steps across data preparation, feature engineering, and evaluation. Context: Columns [columns], domain [domain]. Output: JSON with leakage risks and mitigations. Constraints: Document all assumptions.
**Model vs Baseline Comparison**: Role: You are a data analytics assistant. Task: Compare a chosen model for dataset [dataset_name] against a simple baseline (e.g., mean/median prediction for regression, majority class for classification) across [metrics], and report practical significance. Context: Domain [domain], data_quality [data_quality]. Output: JSON with comparison results and recommendations. Constraints: Include confusion matrix or error metrics where relevant.
**Feature Importance Extraction**: Role: You are a data analytics assistant. Task: Extract and interpret feature importances for a model on dataset [dataset_name], identifying top drivers of the target variable [target]. Context: Features [features], data_quality [data_quality]. Output: JSON with ranked features and rationale. Constraints: Include stability checks across folds.
**Residual Analysis and Diagnostics**: Role: You are a data analytics assistant. Task: Analyze residuals of a model applied to dataset [dataset_name], checking for patterns, heteroscedasticity, and potential model misspecification. Context: Model [model], target [target], data_quality [data_quality]. Output: JSON with diagnostic plot notes and remediation suggestions. Constraints: Provide actionable steps.
**Time-To-Event Analysis Prompt**: Role: You are a data analytics assistant. Task: Conduct time-to-event analysis on dataset [dataset_name], estimating survival functions and hazard rates for groups [groups]. Context: Time column [time_column], event indicator [event_column], domain [domain]. Output: JSON with survival estimates and interpretation notes. Constraints: Include censoring considerations.
**Survival Analysis Basics**: Role: You are a data analytics assistant. Task: Provide a basic survival analysis plan for dataset [dataset_name], including Kaplan-Meier estimates and log-rank test plan. Context: Time [time], event [event], group [group], domain [domain]. Output: JSON with plan and expected outputs. Constraints: Provide clear interpretation guidance.
**Text Data Analysis Prompt**: Role: You are a data analytics assistant. Task: Analyze text data in dataset [dataset_name] for topic and sentiment signals, detailing preprocessing steps and metrics. Context: Text column [text_column], domain [domain], language [lang]. Output: JSON with preprocessing outline, feature extraction method, and evaluation plan. Constraints: Include reproducibility notes.
**Sentiment and Topic Modeling Overview**: Role: You are a data analytics assistant. Task: Generate an analysis plan for sentiment and topic modeling on text data in dataset [dataset_name], including model choices and evaluation criteria. Context: Text [text_columns], domain [domain]. Output: JSON with model recommendations, preprocessing steps, and result interpretation notes. Constraints: Provide reproducible parameter settings.
**Geospatial Data Analysis Starter**: Role: You are a data analytics assistant. Task: Start a geospatial analysis for dataset [dataset_name], mapping metrics to geography using coordinates [lat, lon] or region labels [region]. Context: Domain [domain], data_quality [data_quality]. Output: JSON with spatial join definitions, heatmap specifications, and regional insights. Constraints: Include accuracy notes.
**Data Anonymization and Privacy Check**: Role: You are a data analytics assistant. Task: Review dataset [dataset_name] for privacy risks and propose anonymization steps, including re-identification risk assessment. Context: Columns [columns], regulatory context [regulation]. Output: JSON with anonymization plan and risk mitigation. Constraints: Preserve analytic utility.
**Powerful Data Visualization Prompts**: Role: You are a data analytics assistant. Task: Propose a set of 5 visualization prompts for dataset [dataset_name] that clearly communicate the main findings to executives and analysts. Context: Audience [audience], domain [domain], data_quality [data_quality]. Output: JSON with visualization concepts and rationale. Constraints: Prioritize clarity and actionability.
**Storytelling with Data Summary**: Role: You are a data analytics assistant. Task: Create a concise data story for dataset [dataset_name] focusing on [topic], summarizing insights for stakeholders. Context: Audience [audience], domain [domain]. Output: JSON with narrative outline, key figures, and recommended actions. Constraints: Include an executive takeaway.
**Executive Summary for Stakeholders**: Role: You are a data analytics assistant. Task: Produce an executive summary for dataset [dataset_name] addressing [analysis_goal], with succinct bullets and a one-page data brief. Context: Domain [domain], audience [audience]. Output: JSON with summary bullets and next-step recommendations. Constraints: Keep it to one page.
**Data-Driven KPI Dashboards**: Role: You are a data analytics assistant. Task: Outline KPIs and dashboard structure for dataset [dataset_name] to monitor [business goals], including data sources and refresh cadence. Context: Domain [domain], audience [audience]. Output: JSON with dashboard schema and data lineage notes. Constraints: Recommend at least 6 KPIs.
**Root Cause Analysis Prompt**: Role: You are a data analytics assistant. Task: Perform root cause analysis for a recurring issue in dataset [dataset_name], identifying drivers and correlations. Context: Issue [issue], data_quality [data_quality], domains [domains]. Output: JSON with causal factors, evidence summary, and recommended actions. Constraints: Avoid confirmation bias.
**Data-Driven Pricing Analysis**: Role: You are a data analytics assistant. Task: Conduct a pricing analysis on dataset [dataset_name], evaluating elasticity, break-even points, and recommended price bands. Context: Product [product], market [market], data_quality [data_quality]. Output: JSON with pricing scenarios and decision criteria. Constraints: Include sensitivity analyses.
**Churn Prediction Readiness Check**: Role: You are a data analytics assistant. Task: Assess readiness to build a churn prediction model on dataset [dataset_name], including data quality, feature availability, and baseline performance expectations. Context: Domain [domain], target [target], data_quality [data_quality]. Output: JSON with readiness score and gaps. Constraints: Prioritize actionable next steps.
**Customer Segmentation Prompt**: Role: You are a data analytics assistant. Task: Create a customer segmentation plan using dataset [dataset_name], describing segments by key attributes and potential marketing actions. Context: Columns [columns], domain [domain], data_quality [data_quality]. Output: JSON with segment definitions, profiles, and recommended actions. Constraints: Ensure each segment has at least 100 observations.
**Sales Funnel Analysis**: Role: You are a data analytics assistant. Task: Analyze the sales funnel in dataset [dataset_name], computing conversion rates at each stage and identifying bottlenecks. Context: Stages [stages], domain [domain], data_quality [data_quality]. Output: JSON with funnel metrics and recommended optimizations. Constraints: Include confidence intervals.
**Market Basket Analysis Starter**: Role: You are a data analytics assistant. Task: Perform market basket analysis on dataset [dataset_name], computing association rules for items in [items], and propose merchandising implications. Context: Transactions [transactions], domain [domain], data_quality [data_quality]. Output: JSON with rules, support/confidence metrics, and actionable insights. Constraints: Keep rule count to at most 20.
**Quality Control Chart Analysis**: Role: You are a data analytics assistant. Task: Apply control chart analysis to dataset [dataset_name] and identify when the process exceeds control limits for metric [metric]. Context: Time [time], domain [domain], data_quality [data_quality]. Output: JSON with control chart signals, interpretation notes, and recommended actions. Constraints: Include upper/lower control limits.
**Time Series Forecasting Prompt**: Role: You are a data analytics assistant. Task: Develop a forecasting plan for dataset [dataset_name] on metric [metric], specifying horizon, method choices, and evaluation strategy. Context: Time column [time_column], data_quality [data_quality], domain [domain]. Output: JSON with forecast plan and evaluation criteria. Constraints: Include uncertainty quantification.
**Cross-Validation Strategy Setup**: Role: You are a data analytics assistant. Task: Propose a robust cross-validation strategy for a model trained on dataset [dataset_name], detailing folds, shuffling, and evaluation metrics. Context: Features [features], target [target], data_quality [data_quality]. Output: JSON with CV plan and rationale. Constraints: For time-series data, use time-ordered splits instead of shuffled folds.
**Hyperparameter Tuning for Analysis**: Role: You are a data analytics assistant. Task: Outline a hyperparameter tuning plan for a model or analysis pipeline on dataset [dataset_name], including search space, evaluation metric, and stopping criteria. Context: Domain [domain], data_quality [data_quality]. Output: JSON with tuning plan and expected resources. Constraints: Prefer lightweight searches.
**Model Calibration and Reliability Check**: Role: You are a data analytics assistant. Task: Assess calibration and reliability of a model on dataset [dataset_name], with calibration curves and reliability metrics. Context: Model [model], target [target], data_quality [data_quality]. Output: JSON with calibration results and adjustment recommendations. Constraints: Include plot metadata.
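A reliability table (the data behind a calibration curve), the Brier score, and expected calibration error (ECE) are easy to compute independently of Gemini. This minimal sketch assumes a binary target and predicted probabilities in [0, 1], and uses equal-width bins; the bin count, function names, and data are assumptions.

```python
def reliability_table(probs, labels, n_bins=5):
    """Equal-width probability bins comparing mean predicted probability with observed rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))  # p == 1.0 goes in the top bin
    table = []
    for i, members in enumerate(bins):
        if members:  # empty bins are skipped
            table.append({
                "bin": (i / n_bins, (i + 1) / n_bins),
                "count": len(members),
                "mean_predicted": sum(p for p, _ in members) / len(members),
                "observed_rate": sum(y for _, y in members) / len(members),
            })
    return table


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(table):
    """Count-weighted average gap between predicted and observed rates across bins."""
    total = sum(row["count"] for row in table)
    return sum(row["count"] / total * abs(row["mean_predicted"] - row["observed_rate"])
               for row in table)


probs = [0.1, 0.2, 0.8, 0.9]  # hypothetical predictions
labels = [0, 0, 1, 1]
table = reliability_table(probs, labels)
print(brier_score(probs, labels), expected_calibration_error(table))
```

The rows of `table` can serve directly as the plot metadata the prompt asks for (x = mean predicted, y = observed rate, point size = count).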
**NLP Data Cleaning and Tokenization**: Role: You are a data analytics assistant. Task: Clean and tokenize textual data in dataset [dataset_name] for NLP analysis, including normalization steps and tokenization strategy. Context: Text column [text_column], domain [domain], language [lang]. Output: JSON with preprocessing pipeline and example tokens. Constraints: Ensure reproducibility.
**Experiment Reproducibility Checklist**: Role: You are a data analytics assistant. Task: Build an experiment reproducibility checklist for analyses on dataset [dataset_name], including versioning, random seeds, and data lineage. Context: Domain [domain], data_quality [data_quality]. Output: JSON with checklist items and owner assignments. Constraints: Make it actionable.
**Data Pipeline Red Flags**: Role: You are a data analytics assistant. Task: Identify red flags in a data pipeline for dataset [dataset_name], including ingestion, transformation, and load steps. Context: Pipeline stages [stages], domain [domain], data_quality [data_quality]. Output: JSON with risk scores and remediation plans. Constraints: Prioritize critical failures.
**Dimensionality Reduction Selection**: Role: You are a data analytics assistant. Task: Recommend a dimensionality reduction method for dataset [dataset_name], given [data_characteristics], and justify the choice. Context: Columns [columns], domain [domain], data_quality [data_quality]. Output: JSON with method rationale and implementation notes. Constraints: Compare at least 2 methods.
**Data Drift Detection and Monitoring**: Role: You are a data analytics assistant. Task: Design a data drift detection plan for dataset [dataset_name], including features to monitor, thresholds, and alerting. Context: Data source [source], domain [domain], data_quality [data_quality]. Output: JSON with drift metrics and monitoring cadence. Constraints: Include a rollback plan.
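The population stability index (PSI) is one widely used drift metric for a single numeric feature: bin a baseline sample and a current sample with the same cut points, then sum (actual share − expected share) × ln(actual / expected). This is a minimal sketch with hypothetical cut points and data; the small floor on empty bins is an assumption that avoids division by zero. A commonly quoted rule of thumb reads PSI below 0.1 as stable and above 0.25 as a major shift, but those cutoffs are conventions, not standards, so set your own thresholds.

```python
import math
from bisect import bisect_right


def bin_shares(values, cuts, eps=1e-6):
    """Share of values per bin; cuts define len(cuts) + 1 bins with open-ended outer bins."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]  # floor keeps log() defined for empty bins


def population_stability_index(baseline, current, cuts):
    expected = bin_shares(baseline, cuts)
    actual = bin_shares(current, cuts)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


baseline = list(range(1, 11)) * 10      # hypothetical reference window
current = [x + 3 for x in baseline]     # hypothetical shifted window
cuts = [3, 5, 7, 9]
print(population_stability_index(baseline, current, cuts))
```

Computing PSI per monitored feature on a fixed schedule gives the monitoring cadence a concrete number to alert on.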
**Ethics and Bias Review for Analysis**: Role: You are a data analytics assistant. Task: Conduct an ethics and bias review of analyses on dataset [dataset_name], identifying potential biases and mitigation strategies. Context: Domain [domain], data_quality [data_quality]. Output: JSON with bias categories, impact assessment, and recommended mitigations. Constraints: Provide transparent justifications.
**Data Visualization Accessibility Check**: Role: You are a data analytics assistant. Task: Evaluate all proposed visuals for dataset [dataset_name] against accessibility guidelines (color contrast, screen-reader friendliness, alt text). Context: Visuals [visuals], domain [domain]. Output: JSON with accessibility ratings and remediation steps. Constraints: Include fallback options.
**Multicollinearity Assessment**: Role: You are a data analytics assistant. Task: Assess multicollinearity among numeric features in dataset [dataset_name], reporting variance inflation factors (VIFs) and suggesting feature reduction. Context: Features [features], target [target], domain [domain]. Output: JSON with VIF table and recommended feature set. Constraints: Provide interpretation guidance.
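A VIF for feature j equals 1 / (1 − R_j²), where R_j² comes from regressing feature j on the other features with an intercept; equivalently, it is the j-th diagonal entry of the inverse correlation matrix. The dependency-free sketch below uses that second form, which is handy for checking Gemini's VIF table on a small sample. Function names and data are hypothetical, and a constant column would make the standardization divide by zero.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix from standardized columns."""
    standardized = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        standardized.append([(x - m) / s for x in col])
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(u, v)) / n for v in standardized] for u in standardized]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    size = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def vif(features):
    """features: dict of name -> list of numeric values (equal lengths)."""
    names = list(features)
    inv = invert(correlation_matrix([features[name] for name in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}


print(vif({"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 6]}))  # hypothetical features
```

Rules of thumb often flag VIFs above 5 or 10, but the right cutoff depends on how the model will be used, which is worth stating in the interpretation guidance.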
**Outlier Robustness Check**: Role: You are a data analytics assistant. Task: Evaluate model performance under outlier-robust vs standard configurations on dataset [dataset_name], reporting changes in accuracy or error metrics. Context: Columns [columns], domain [domain], data_quality [data_quality]. Output: JSON with robustness comparison and recommendations. Constraints: Include visual summaries.
**Quantile Analysis Prompt**: Role: You are a data analytics assistant. Task: Perform quantile analysis on dataset [dataset_name] for numeric columns [columns], reporting percentiles and tail behavior. Context: Time [time], domain [domain], data_quality [data_quality]. Output: JSON with quantile summaries and interpretation notes. Constraints: Highlight extreme values.
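Python's standard `statistics.quantiles` function can produce the percentile table for a quick cross-check. In this minimal sketch, "extreme values" are assumed to mean points outside Tukey's fences (1.5 × IQR beyond the quartiles); that definition, the function name, and the data are illustrative choices, not the only option.

```python
from statistics import quantiles


def quantile_summary(values, fence_k=1.5):
    """Selected percentiles plus Tukey-fence extremes for one numeric column."""
    pct = quantiles(values, n=100, method="inclusive")  # pct[i - 1] is the i-th percentile
    p25, p50, p75 = pct[24], pct[49], pct[74]
    iqr = p75 - p25
    low, high = p25 - fence_k * iqr, p75 + fence_k * iqr
    return {
        "p01": pct[0], "p05": pct[4], "p25": p25, "p50": p50,
        "p75": p75, "p95": pct[94], "p99": pct[98],
        "iqr": iqr,
        "extremes": sorted(v for v in values if v < low or v > high),
    }


values = list(range(1, 101)) + [1000]  # hypothetical column with one extreme value
print(quantile_summary(values))
```

Different quantile methods (inclusive vs exclusive interpolation) can shift tail percentiles on small samples, so it helps to ask Gemini to state which method it assumed.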
**Join and Merge Strategy Evaluation**: Role: You are a data analytics assistant. Task: Evaluate join/merge strategies for combining datasets in [project], focusing on key collisions, data loss, and performance for dataset [dataset_name]. Context: Keys [keys], domain [domain]. Output: JSON with recommended join strategy and data integrity notes. Constraints: Include test scenarios.
**Data Transformation and Normalization**: Role: You are a data analytics assistant. Task: Propose data transformation and normalization steps for dataset [dataset_name] to ensure comparability across features [features]. Context: Domain [domain], data_quality [data_quality]. Output: JSON with transformation rules, scaling methods, and rationale. Constraints: Include edge-case handling.
**Imputation Strategy Comparison**: Role: You are a data analytics assistant. Task: Compare imputation strategies for missing values in dataset [dataset_name], evaluating impact on downstream analyses for columns [columns]. Context: Rows [rows], data_quality [data_quality]. Output: JSON with method comparisons and recommended approach. Constraints: Include sensitivity notes.
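One simple way to compare imputation strategies is mask-and-score: hide a random subset of known values, fill them with each method, and measure the error against the hidden truth. The sketch below compares only mean and median fills for one column and assumes values are missing at random; the function name, seed, and data are hypothetical, and model-based imputers would slot into the same loop.

```python
import random
from statistics import mean, median


def compare_constant_imputers(values, mask_fraction=0.2, seed=0):
    """Mask-and-score check: hide some observed values, fill them, and report MAE per method.

    None marks a missing entry. Assumes values are missing at random.
    """
    observed = [v for v in values if v is not None]
    rng = random.Random(seed)  # fixed seed keeps the comparison reproducible
    k = max(1, int(len(observed) * mask_fraction))
    held_out = set(rng.sample(range(len(observed)), k))
    train = [v for i, v in enumerate(observed) if i not in held_out]
    truth = [observed[i] for i in sorted(held_out)]
    fills = {"mean": mean(train), "median": median(train)}
    return {name: mean(abs(fill - t) for t in truth) for name, fill in fills.items()}


# Hypothetical right-skewed column with gaps.
column = [12, 15, None, 14, 13, 90, None, 16, 11, 14, 13, 120, 15, None, 12, 14]
print(compare_constant_imputers(column, mask_fraction=0.25, seed=42))
```

Repeating the check over several seeds gives a rough sense of how sensitive the ranking is, which feeds the prompt's sensitivity notes.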
**Forecast Error Analysis**: Role: You are a data analytics assistant. Task: Analyze forecast errors for a model on dataset [dataset_name], identifying biases, variance, and error sources across horizons [horizon]. Context: Time column [time_column], metric [metric], domain [domain]. Output: JSON with error diagnostics and improvement plan. Constraints: Include error decomposition.
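A basic error decomposition uses the identity MSE = bias² + variance of the errors (population variance), which separates systematic over- or under-forecasting from noise. This minimal sketch groups errors by horizon and reports each share; the function names, the sign convention (forecast minus actual), and the rows are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def error_report(actuals, forecasts):
    """Bias, MAE, RMSE, and the bias/variance split of MSE for one horizon."""
    errors = [f - a for a, f in zip(actuals, forecasts)]  # positive = over-forecast
    mse = mean(e * e for e in errors)
    bias = mean(errors)
    # MSE splits exactly into squared bias plus the population variance of the errors.
    return {
        "bias": bias,
        "mae": mean(abs(e) for e in errors),
        "rmse": math.sqrt(mse),
        "bias_share": bias ** 2 / mse if mse else 0.0,
        "variance_share": pvariance(errors) / mse if mse else 0.0,
    }


def report_by_horizon(rows):
    """rows: (horizon, actual, forecast) triples."""
    grouped = defaultdict(lambda: ([], []))
    for horizon, actual, forecast in rows:
        grouped[horizon][0].append(actual)
        grouped[horizon][1].append(forecast)
    return {h: error_report(a, f) for h, (a, f) in sorted(grouped.items())}


rows = [(1, 100, 102), (1, 100, 104), (1, 100, 102), (1, 100, 104),  # hypothetical data
        (2, 100, 95), (2, 100, 108), (2, 100, 99), (2, 100, 101)]
for horizon, report in report_by_horizon(rows).items():
    print(horizon, report)
```

A high bias share points to a correctable offset (for example, a missing trend term), while a high variance share suggests the model needs better features or wider prediction intervals.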
**Residual vs Fitted Plots Prompt**: Role: You are a data analytics assistant. Task: Produce residual vs fitted diagnostics for a model on dataset [dataset_name], highlighting potential misspecification. Context: Model [model], target [target]. Output: JSON with plot metadata and interpretation notes. Constraints: Provide actionable remediation suggestions.
**Model Interpretability Prompt**: Role: You are a data analytics assistant. Task: Generate interpretability notes for a model trained on dataset [dataset_name], including SHAP-like explanations or feature impact summaries. Context: Features [features], target [target], domain [domain]. Output: JSON with interpretation results and user-friendly explanations. Constraints: Prioritize clarity for non-technical stakeholders.
**Experiment Design for Observational Data**: Role: You are a data analytics assistant. Task: Design an observational study analysis plan for dataset [dataset_name], including confounding controls and causal inference notes. Context: Domain [domain], data_quality [data_quality]. Output: JSON with plan and limitations. Constraints: State assumptions.
**Power Analysis for Sample Size**: Role: You are a data analytics assistant. Task: Perform power analysis for a future study based on dataset [dataset_name], determining required sample size for detecting effect size [effect_size] with alpha [alpha] and power [power]. Context: Metric [metric], domain [domain]. Output: JSON with sample size recommendations and rationale. Constraints: Include scenario bounds.
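For a two-sided, two-sample comparison of means, a standard normal-approximation formula is n per group = 2(z₁₋α/₂ + z_power)² / d², where d is the standardized effect size (Cohen's d). This sketch uses the standard library's `NormalDist`; the function name is hypothetical, and the normal approximation slightly understates the exact t-test answer (for d = 0.5, α = 0.05, power = 0.8 it gives 63 per group, versus about 64 from a t-based calculation).

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for a two-sided, two-sample test of means.

    effect_size is a standardized mean difference (Cohen's d).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Scenario bounds: small, medium, and large effects.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Looping over a range of plausible effect sizes, as above, is one straightforward way to produce the scenario bounds the prompt requests.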
**Data Export and Documentation Plan**: Role: You are a data analytics assistant. Task: Create a data export and documentation plan for findings from dataset [dataset_name], including formats, metadata, and versioning. Context: Audience [audience], domain [domain]. Output: JSON with export specs and documentation templates. Constraints: Ensure traceability.
**Automation for Repetitive Analyses**: Role: You are a data analytics assistant. Task: Propose automation steps to repeat analyses on dataset [dataset_name] across updates, including scheduling, parameterization, and logging. Context: Frequency [frequency], domain [domain], data_quality [data_quality]. Output: JSON with automation blueprint and sample run logs. Constraints: Prioritize reliability.
**Data Governance Compliance Check**: Role: You are a data analytics assistant. Task: Conduct a governance and compliance check for analyses on dataset [dataset_name], ensuring policy adherence and auditability. Context: Regulations [regulations], domain [domain], data_quality [data_quality]. Output: JSON with compliance findings and remediation plan. Constraints: Highlight risk areas.
**Time Window Aggregations**: Role: You are a data analytics assistant. Task: Propose time window aggregation strategies for dataset [dataset_name] to capture rolling metrics for [metric]. Context: Time column [time_column], domain [domain]. Output: JSON with window definitions and expected insights. Constraints: Include performance notes.
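A trailing time window (for example, the last 7 days up to and including each event) differs from a fixed count of rows when events are irregular. This minimal sketch keeps a running sum over a deque of dated events; it assumes events are sorted by date, and the function name and data are hypothetical. In pandas, `DataFrame.rolling` with an offset string such as `"7D"` on a datetime index covers the same idea.

```python
from collections import deque
from datetime import date, timedelta


def trailing_window_sum(events, days):
    """Rolling sum over the trailing window (t - days, t] at each event.

    events: (date, value) pairs sorted by date.
    """
    span = timedelta(days=days)
    window = deque()
    total = 0.0
    out = []
    for day, value in events:
        window.append((day, value))
        total += value
        while window[0][0] <= day - span:  # drop events that fell out of the window
            total -= window.popleft()[1]
        out.append((day, total))
    return out


events = [(date(2024, 1, 1), 10), (date(2024, 1, 2), 5), (date(2024, 1, 8), 7)]  # hypothetical
print(trailing_window_sum(events, days=7))
```

Each event is added and removed at most once, so the pass is linear in the number of events, which is the main performance note for long histories.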
**Cohort Analysis Prompt**: Role: You are a data analytics assistant. Task: Design a cohort analysis plan for dataset [dataset_name], grouping by [cohort_definition], and measuring outcomes over time. Context: Rows [rows], domain [domain], data_quality [data_quality]. Output: JSON with cohort definitions, metrics, and interpretation notes. Constraints: Ensure enough observations per cohort.
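A retention matrix is the usual core of a cohort analysis: for each cohort, the share of its users active at each period offset since joining. This minimal sketch assumes integer period indices (for example, months since an epoch) and uses a hypothetical `min_cohort_size` cutoff to reflect the "enough observations per cohort" constraint; the function name and data are illustrative.

```python
from collections import defaultdict


def cohort_retention(signups, activity, min_cohort_size=1):
    """Share of each cohort active at each period offset since joining.

    signups: user -> cohort period (integer). activity: iterable of (user, period).
    """
    cohort_users = defaultdict(set)
    for user, cohort in signups.items():
        cohort_users[cohort].add(user)

    active = defaultdict(set)  # (cohort, offset) -> users active at that offset
    for user, period in activity:
        if user not in signups:
            continue  # activity without a known signup is ignored in this sketch
        offset = period - signups[user]
        if offset >= 0:
            active[(signups[user], offset)].add(user)

    table = {}
    for cohort, users in sorted(cohort_users.items()):
        if len(users) < min_cohort_size:
            continue  # too few observations to report
        max_offset = max((o for (c, o) in active if c == cohort), default=-1)
        table[cohort] = [len(active.get((cohort, o), ())) / len(users)
                         for o in range(max_offset + 1)]
    return table


signups = {"a": 0, "b": 0, "c": 1}  # hypothetical users and periods
activity = [("a", 0), ("b", 0), ("a", 1), ("c", 1), ("c", 2), ("a", 2)]
print(cohort_retention(signups, activity))
```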
**ROI and Impact Analysis**: Role: You are a data analytics assistant. Task: Estimate ROI and impact of a program using dataset [dataset_name], comparing costs and outcomes across cohorts. Context: Domain [domain], data_quality [data_quality]. Output: JSON with ROI calculation and sensitivity checks. Constraints: Include caveats.
**Scenario Analysis and What-If Prompts**: Role: You are a data analytics assistant. Task: Run scenario and what-if analyses on dataset [dataset_name], evaluating outcomes under different assumptions for [factors]. Context: Domain [domain], data_quality [data_quality]. Output: JSON with scenario results and implications. Constraints: Present at least 3 scenarios.
**Error Budgeting for Analytics**: Role: You are a data analytics assistant. Task: Propose an error budgeting framework for analytics on dataset [dataset_name], balancing speed vs accuracy and outlining service-level objectives (SLOs). Context: Domain [domain], data_quality [data_quality]. Output: JSON with budget allocations and monitoring plan. Constraints: Align with stakeholder expectations.
**Data Dictionary Generation**: Role: You are a data analytics assistant. Task: Generate a data dictionary for dataset [dataset_name], describing each column, data type, permissible values, and meaning. Context: Domain [domain], audience [audience]. Output: JSON with dictionary entries and sample queries.
**Data Visualization Storyboard**: Role: You are a data analytics assistant. Task: Create a storyboard for a data visualization narrative using dataset [dataset_name], outlining visuals, captions, and flow. Context: Audience [audience], domain [domain]. Output: JSON with storyboard frames and rationale. Constraints: Ensure accessibility.
**Pivot Table Automation Prompt**: Role: You are a data analytics assistant. Task: Generate a set of pivot-table configurations for dataset [dataset_name], covering common business questions and ensuring reproducibility. Context: Columns [columns], domain [domain]. Output: JSON with pivot definitions and example outputs. Constraints: Include at least 4 configurations.
**Missing Data Imputation Quality Check**: Role: You are a data analytics assistant. Task: Evaluate the quality of missing data imputation for dataset [dataset_name], comparing methods and impact on downstream analyses. Context: Columns [columns], missing_values [missing_values_info], domain [domain]. Output: JSON with method comparisons and quality scores. Constraints: Provide visualization-friendly summaries.
**Batch Effect Assessment (Biased Data)**: Role: You are a data analytics assistant. Task: Assess batch effects in dataset [dataset_name], reporting potential biases by batch and proposing mitigation steps. Context: Batch identifiers [batch_id], domain [domain], data_quality [data_quality]. Output: JSON with bias assessment and remediation plan. Constraints: Include visualization notes.
**Seasonal Adjustment and Deseasonalization**: Role: You are a data analytics assistant. Task: Apply seasonal adjustment/deseasonalization to time-series data in dataset [dataset_name] for metric [metric], and compare forecasts with/without adjustment. Context: Time column [time_column], domain [domain], data_quality [data_quality]. Output: JSON with adjusted series and forecasting implications. Constraints: Document methods.
**Experiment Result Reproducibility**: Role: You are a data analytics assistant. Task: Ensure reproducibility of experiment results on dataset [dataset_name], including seed management, data splits, and versioned artifacts. Context: Domain [domain], data_quality [data_quality]. Output: JSON with reproducibility checklist and sample artifacts. Constraints: Provide audit trails.
**Data-Driven Decision Memo**: Role: You are a data analytics assistant. Task: Draft a data-driven decision memo from findings on dataset [dataset_name], focusing on recommended actions for stakeholders. Context: Audience [audience], domain [domain]. Output: JSON with memo sections and key metrics.
**Data Quality Benchmarking**: Role: You are a data analytics assistant. Task: Benchmark data quality of dataset [dataset_name] against industry norms or internal standards, highlighting gaps and improvement priorities. Context: Domain [domain], data_quality_metrics [metrics]. Output: JSON with benchmark results and recommended actions.
**Anonymization Impact Assessment**: Role: You are a data analytics assistant. Task: Evaluate the impact of anonymization on analytics for dataset [dataset_name], including utility loss and privacy safeguards. Context: Columns [columns], policy [policy]. Output: JSON with impact assessment and mitigation notes.
**Multimodal Data Analysis Prompt**: Role: You are a data analytics assistant. Task: Propose an analysis plan for a multimodal dataset [dataset_name] combining [modalities], detailing integration steps and insights. Context: Domain [domain], data_quality [data_quality]. Output: JSON with modality integration plan and expected outputs.
**Experiment Suppression and Control Group Handling**: Role: You are a data analytics assistant. Task: Plan the handling of control vs suppressed groups in observational dataset [dataset_name], ensuring valid comparisons. Context: Groups [groups], domain [domain], data_quality [data_quality]. Output: JSON with suppression rules and analysis plan.
**Causal Inference Prompt**: Role: You are a data analytics assistant. Task: Design a causal inference analysis for dataset [dataset_name], selecting methods (propensity score, instrumental variables) and validation checks. Context: Treatment [treatment], outcome [outcome], domain [domain]. Output: JSON with model specification and diagnostics.
**Robustness Check Across Subgroups**: Role: You are a data analytics assistant. Task: Test robustness of findings across subgroups in dataset [dataset_name], reporting consistency or explainable differences. Context: Subgroups [subgroups], domain [domain]. Output: JSON with subgroup analyses and interpretation notes.
**Data-Driven Risk Assessment**: Role: You are a data analytics assistant. Task: Perform a risk assessment on dataset [dataset_name], identifying data and model risks with mitigation strategies. Context: Domain [domain], data_quality [data_quality]. Output: JSON with risk matrix and action plan.
**Analytics Report Formatting Standards**: Role: You are a data analytics assistant. Task: Define a standardized analytics report template for dataset [dataset_name], including sections, visuals, and language style. Context: Audience [audience], domain [domain]. Output: JSON with template structure and example content. Constraints: Ensure consistency.
**API Data Integration Check**: Role: You are a data analytics assistant. Task: Validate an API data integration for dataset [dataset_name], checking data freshness, schema alignment, and error handling. Context: API endpoints [endpoints], domain [domain]. Output: JSON with integration health and recommended improvements.
**Sparsity and Rare Event Handling**: Role: You are a data analytics assistant. Task: Address sparsity in dataset [dataset_name], including rare event handling strategies for [columns]. Context: Domain [domain], data_quality [data_quality]. Output: JSON with techniques and expected impact on analyses.
**Benchmarking Against Industry Standards**: Role: You are a data analytics assistant. Task: Benchmark analytics on dataset [dataset_name] against industry standards for [industry/sector], reporting gaps and improvement ideas. Context: Domain [domain], data_quality [data_quality]. Output: JSON with benchmark results and recommended actions.
**Final Data Analysis Summary and Recommendations**: Role: You are a data analytics assistant. Task: Compile a final data analysis summary for dataset [dataset_name], synthesizing findings on [topic], and delivering actionable recommendations for decision-makers. Context: Audience [audience], domain [domain]. Output: JSON with executive summary, key metrics, and next-step plan. Constraints: Be concise and decision-focused.
Best Practices
- Keep prompts concrete and task-focused to maximize reproducibility.
- Use placeholders to adapt to any dataset.
- Maintain consistent output formats to ease downstream automation.
- Test prompts on a sample dataset before scaling.
- Document assumptions and data quality notes with every prompt.
Common Mistakes to Avoid
- Overgeneralizing results from small samples.
- Omitting data provenance and versioning in analyses.
- Using vague prompts that force readers to supply missing context.
- Ignoring measurement units, data types, and domain constraints.
- Assuming causality from correlations without proper inference.
Related Resources
Use these related resources to connect this Gemini prompt library with practical AI workflows, implementation examples, blog analysis, and business use cases.
- AI Prompts Library
- Gemini Prompts
- AI workflow simulator
- Cross-model AI workflows with Gemini
- Excel analysis Gemini prompts
- Business intelligence ChatGPT prompts
- Why analytics products need metric definitions
- QuickBooks reports and management dashboards
- Xero reports and business performance insights
FAQ
How do I adapt these prompts to a new dataset?
Replace placeholders like [dataset_name], [columns], [rows], [domain], and [target] with your dataset-specific values, and adjust analysis goals accordingly.
What output formats are recommended?
JSON is preferred for machine readability and reproducibility, with optional Markdown or CSV exports for humans.
Can I reuse prompts across different projects?
Yes. Many prompts are generic analytic templates; customize only the placeholders and the audience for reusability.
How should I validate Gemini outputs?
Cross-check Gemini results with your own analyses, reproducibility logs, and external benchmarks where possible.
Are these prompts suitable for large datasets?
Yes, but tailor methods to scale (e.g., sampling, incremental processing, and streaming-friendly approaches) and document assumptions.