#analytics
16 snippets tagged with #analytics
Window Functions with RANK and LAG
Use window functions to rank rows, calculate running totals, and compare each row with the previous one without self-joins.
Best for: Business reporting
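A minimal sketch of `RANK()` and `LAG()`. It uses Python's standard-library `sqlite3` module as a stand-in database, which assumes SQLite 3.25 or newer for window-function support. The `sales` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a reporting database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, month INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("ana", 1, 500), ("ana", 2, 700), ("ben", 1, 700), ("ben", 2, 400)],
)

rows = conn.execute(
    """
    SELECT rep, month, revenue,
           -- rank reps within each month, highest revenue first (ties share a rank)
           RANK() OVER (PARTITION BY month ORDER BY revenue DESC) AS month_rank,
           -- the same rep's revenue in the previous month (NULL for the first month)
           LAG(revenue) OVER (PARTITION BY rep ORDER BY month) AS prev_revenue
    FROM sales
    ORDER BY rep, month
    """
).fetchall()

for row in rows:
    print(row)
```

No self-join is needed: `LAG` reads the previous row inside the same partition.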
NTILE and Percentile Window Functions
Distribute rows into buckets and compute percentiles with window functions.
Best for: Customer segmentation
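A minimal sketch of bucketing with `NTILE` and a percentile-style rank with `PERCENT_RANK`. It runs on the standard-library `sqlite3` module and assumes SQLite 3.25 or newer. The `customers` table and the spend values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, spend REAL)")
# Eight hypothetical customers spending 10, 20, ..., 80.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, i * 10.0) for i in range(1, 9)],
)

rows = conn.execute(
    """
    SELECT id, spend,
           -- split customers into 4 equal-sized spend buckets (1 = lowest spend)
           NTILE(4) OVER (ORDER BY spend) AS quartile,
           -- relative position in [0, 1]: (rank - 1) / (row_count - 1)
           PERCENT_RANK() OVER (ORDER BY spend) AS pct_rank
    FROM customers
    ORDER BY id
    """
).fetchall()
```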
Window Frame ROWS vs RANGE Clauses
Use frame specifications to control exactly which rows a window function considers.
Best for: Moving averages
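A minimal sketch of how `ROWS` and `RANGE` frames differ when the `ORDER BY` key has ties. It uses the standard-library `sqlite3` module and assumes SQLite 3.25 or newer. The `orders` table is hypothetical. `ROWS` counts physical rows. `RANGE` treats rows with equal sort keys as peers and includes all of them. In standard SQL, `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` is also the default frame whenever `ORDER BY` is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day INTEGER, amount INTEGER)")
# Two orders share day 2, so they are peers under ORDER BY day.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 20), (2, 30), (3, 40)],
)

rows = conn.execute(
    """
    SELECT day, amount,
           -- ROWS: each physical row gets its own running sum
           -- (amount is added as a tie-breaker so the row order is deterministic)
           SUM(amount) OVER (ORDER BY day, amount
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
           -- RANGE: both day-2 rows see the sum through the end of day 2
           SUM(amount) OVER (ORDER BY day
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
    FROM orders
    ORDER BY day, amount
    """
).fetchall()
```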
SQL Window Functions for Analytics
Advanced SQL window functions for running totals, rankings, moving averages, and gap analysis.
Best for: Building analytics dashboards with running totals
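A minimal sketch of gap analysis. It uses `LAG` to find breaks in a user's daily activity. The stand-in database is the standard-library `sqlite3` module, assuming SQLite 3.25 or newer. The `logins` table and its integer `day` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, day INTEGER)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 5), (1, 6), (1, 9)],
)

gaps = conn.execute(
    """
    SELECT user_id, prev_day, day, day - prev_day - 1 AS missing_days
    FROM (
        SELECT user_id, day,
               -- previous active day for the same user
               LAG(day) OVER (PARTITION BY user_id ORDER BY day) AS prev_day
        FROM logins
    ) AS t
    -- keep only jumps of more than one day; the first row has prev_day NULL and drops out
    WHERE day - prev_day > 1
    ORDER BY day
    """
).fetchall()
```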
SQL Window Functions for Analytics
Use window functions for running totals, rankings, moving averages, and gap detection in analytics.
Best for: Building cumulative revenue dashboards
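A minimal sketch of a cumulative revenue column next to a 3-day trailing moving average. It uses the standard-library `sqlite3` module as a stand-in and assumes SQLite 3.25 or newer. The `daily_revenue` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (day INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?)",
    [(1, 100.0), (2, 200.0), (3, 300.0), (4, 400.0), (5, 500.0)],
)

rows = conn.execute(
    """
    SELECT day, revenue,
           -- cumulative revenue from the first day through the current day
           SUM(revenue) OVER (ORDER BY day
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative,
           -- 3-day trailing average (averages fewer rows at the start of the series)
           AVG(revenue) OVER (ORDER BY day
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma_3
    FROM daily_revenue
    ORDER BY day
    """
).fetchall()
```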
SQL Running Totals and Cumulative Metrics
Calculate running totals, cumulative counts, and percent-of-total using window functions and partitions.
Best for: Building cumulative revenue dashboards
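A minimal sketch of per-partition running totals, cumulative counts, and percent-of-total. It uses the standard-library `sqlite3` module and assumes SQLite 3.25 or newer. The `orders` table and its regions are hypothetical. A window with `PARTITION BY` but no `ORDER BY` covers the whole partition, which makes it suitable as a denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, day INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("east", 1, 100.0), ("east", 2, 300.0), ("west", 1, 50.0), ("west", 2, 150.0)],
)

rows = conn.execute(
    """
    SELECT region, day, revenue,
           -- running total that restarts for each region
           SUM(revenue) OVER (PARTITION BY region ORDER BY day) AS running_total,
           -- cumulative number of orders within the region
           COUNT(*) OVER (PARTITION BY region ORDER BY day) AS running_count,
           -- share of the region's total revenue, in percent
           100.0 * revenue / SUM(revenue) OVER (PARTITION BY region) AS pct_of_region
    FROM orders
    ORDER BY region, day
    """
).fetchall()
```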
DuckDB — Query Parquet Files with Python
Use DuckDB to query Parquet and CSV files directly from Python without loading them into memory first.
Best for: Ad-hoc analytics on Parquet files without Spark
PySpark Window Functions
Use PySpark window functions for running totals, rank, lag/lead, and percentile computations.
Best for: Sales analytics
Statistical Analysis with SciPy
Run hypothesis tests, correlations, and descriptive statistics on dataset columns with SciPy.
Best for: A/B testing
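A minimal sketch, assuming `numpy` and `scipy` are installed. It runs descriptive statistics, a Welch's t-test for an A/B comparison, and a Pearson correlation. All data here is simulated and hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated A/B data: a per-user metric for the control and variant groups.
control = rng.normal(loc=10.0, scale=2.0, size=200)
variant = rng.normal(loc=10.8, scale=2.0, size=200)

# Descriptive statistics: nobs, min/max, mean, variance, skewness, kurtosis.
summary = stats.describe(control)

# Welch's t-test compares the means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(variant, control, equal_var=False)

# Pearson correlation between two paired numeric columns.
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
r, r_p = stats.pearsonr(x, y)

print(f"control mean={summary.mean:.2f}, t={t_stat:.2f}, p={p_value:.4f}, r={r:.2f}")
```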
OLS Regression with statsmodels
Fit and interpret an Ordinary Least Squares regression model with diagnostics using statsmodels.
Best for: Econometric analysis
DuckDB In-Memory Analytics
Run fast analytical SQL on pandas DataFrames or Parquet files without a server using DuckDB.
Best for: Serverless analytics
Pandas Cross-Tabulation (crosstab)
Compute frequency and proportion cross-tabulations between two categorical columns.
Best for: Categorical analysis
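A minimal sketch of `pd.crosstab` on hypothetical plan and churn columns. It shows raw counts, row-normalised proportions, and margin totals.

```python
import pandas as pd

# Hypothetical data: subscription plan and whether the customer churned.
df = pd.DataFrame({
    "plan": ["free", "free", "free", "pro", "pro"],
    "churned": ["yes", "yes", "no", "no", "yes"],
})

# Raw frequencies: rows are plans, columns are churn outcomes.
counts = pd.crosstab(df["plan"], df["churned"])
# Proportions within each plan (each row sums to 1).
row_share = pd.crosstab(df["plan"], df["churned"], normalize="index")
# Frequencies plus an "All" row and column holding the totals.
with_totals = pd.crosstab(df["plan"], df["churned"], margins=True)
```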
Pareto / Cumulative Share Analysis
Calculate cumulative share (Pareto 80/20) of values for product or customer ranking analysis.
Best for: Product analytics
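A minimal pandas sketch of a Pareto analysis on hypothetical per-product revenue. It sorts in descending order, takes the cumulative share of the total, and flags the products needed to reach 80%.

```python
import pandas as pd

# Hypothetical revenue per product.
revenue = pd.Series({"A": 500, "B": 300, "C": 100, "D": 60, "E": 40})

pareto = revenue.sort_values(ascending=False).to_frame("revenue")
pareto["cum_share"] = pareto["revenue"].cumsum() / pareto["revenue"].sum()
# A product is in the top 80% group if the share *before* it is still below 0.8,
# i.e. every row up to and including the one that reaches 80%.
pareto["top_80"] = pareto["cum_share"].shift(fill_value=0) < 0.8
print(pareto)
```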
Pandas Rank with Tie-Breaking Methods
Apply different ranking strategies (min, dense, average) and handle ties in pandas.
Best for: Leaderboards
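A minimal sketch comparing `Series.rank` tie-handling methods on a hypothetical leaderboard where two players tie. `ascending=False` gives the highest score rank 1.

```python
import pandas as pd

scores = pd.Series([100, 90, 90, 80], index=["ana", "ben", "cai", "dee"])

ranks = pd.DataFrame({
    "score": scores,
    "min": scores.rank(method="min", ascending=False),          # ties share the lowest rank: 1, 2, 2, 4
    "dense": scores.rank(method="dense", ascending=False),      # no gap after ties: 1, 2, 2, 3
    "average": scores.rank(method="average", ascending=False),  # ties share the mean rank: 1, 2.5, 2.5, 4
    "first": scores.rank(method="first", ascending=False),      # ties broken by order of appearance: 1, 2, 3, 4
})
print(ranks)
```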
Pandas nlargest / nsmallest
Efficiently retrieve the N largest or smallest rows without sorting the full DataFrame.
Best for: Top-N queries
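A minimal sketch of `nlargest` and `nsmallest` on a hypothetical product table. It includes a tie at the cut-off to show how the `keep` parameter behaves.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["A", "B", "C", "D", "E"],
    "revenue": [120, 450, 300, 300, 80],
})

top2 = df.nlargest(2, "revenue")                       # two highest rows; ties keep the first occurrence
bottom2 = df.nsmallest(2, "revenue")                   # two lowest rows
top_with_ties = df.nlargest(2, "revenue", keep="all")  # also keeps every row tied at the cut-off
print(top_with_ties)
```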
Value Counts with Normalisation
Compute frequency distributions and percentage breakdowns of categorical columns.
Best for: Data profiling
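A minimal sketch of `value_counts` on a hypothetical status column. It shows counts, normalised shares, missing values kept as their own category, and a rounded percentage breakdown.

```python
import pandas as pd

status = pd.Series(["paid", "paid", "trial", "churned", "paid", None])

counts = status.value_counts()                    # frequency per category, missing values excluded
shares = status.value_counts(normalize=True)      # proportions of the non-missing values
with_missing = status.value_counts(dropna=False)  # missing values counted as their own category
pct = (shares * 100).round(1)                     # percentage breakdown
print(pct)
```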