#pandas
64 snippets tagged with #pandas
Pandas DataFrame Transformations
Common pandas DataFrame transformations including column operations, type casting, and string methods.
Best for: Cleaning raw data files for analysis
Pandas DataFrame Filtering Techniques
Filter DataFrames using boolean masks, query syntax, isin, between, and string matching methods.
Best for: Extracting subsets of data for reporting
Pandas GroupBy Aggregation Examples
GroupBy operations with multiple aggregations, named aggregations, and transform for DataFrame analysis.
Best for: Sales reporting by region and time period
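A minimal sketch of named aggregation and transform, assuming a hypothetical sales table (the column names are illustrative and not taken from the snippet itself):

```python
import pandas as pd

# Hypothetical sales data; names and values are illustrative only.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "amount": [100.0, 150.0, 80.0, 120.0],
})

# Named aggregation: output_column=(input_column, aggregation).
summary = sales.groupby("region").agg(
    total_amount=("amount", "sum"),
    mean_amount=("amount", "mean"),
    order_count=("amount", "count"),
)

# transform broadcasts the group-level statistic back to every row.
sales["region_total"] = sales.groupby("region")["amount"].transform("sum")
```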
Pandas Merge and Join Examples
Combine DataFrames using merge, join, and concat with different join types and key handling.
Best for: Combining data from multiple sources
Parquet File Read and Write in Python
Read and write Parquet files with pandas and PyArrow including partitioning and schema control.
Best for: Efficient columnar storage for analytics data
Pandas Pivot and Unpivot Reshaping
Reshape DataFrames between wide and long formats using pivot, melt, and stack operations.
Best for: Reshaping data for reporting dashboards
Pandas Time Series Analysis
Time series operations with resampling, rolling windows, date offsets, and period conversions.
Best for: Sales trend analysis with moving averages
Pandas Null Handling Strategies
Comprehensive strategies for detecting, filling, and handling missing values in pandas DataFrames.
Best for: Cleaning datasets with missing values
Read Large CSV in Chunks with Pandas
Process CSV files larger than RAM by reading them in chunks, a memory-efficient ETL pattern for data pipelines.
Best for: Processing multi-GB CSV files without running out of memory
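The chunked pattern might look roughly like this. An in-memory buffer stands in for a large file on disk, and the column names are assumptions made for illustration:

```python
import io

import pandas as pd

# Stand-in for a large file; a real pipeline would pass a file path instead.
csv_data = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")

total = 0
row_count = 0

# With chunksize set, read_csv yields DataFrames of at most that many rows,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += int(chunk["value"].sum())
    row_count += len(chunk)
```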
Dask Parallel DataFrame Processing
Process datasets larger than RAM using Dask's parallel, lazy DataFrame API.
Best for: out-of-core processing
Pandas Vectorised Operations vs Apply
Compare apply vs vectorised pandas operations for performance-critical column transformations.
Best for: feature engineering
Pandas Rolling & Expanding Windows
Compute moving averages, rolling sums, and cumulative stats on time-series data with pandas.
Best for: sales forecasting
Pandas MultiIndex Stack & Unstack
Work with hierarchical MultiIndex DataFrames: pivoting with stack/unstack and cross-sectional slicing.
Best for: panel data
Pandas Memory Reduction via Dtypes
Reduce DataFrame memory, often by 60-80% on suitable data, by downcasting numeric types and using categorical columns.
Best for: large dataset loading
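As a rough sketch of the idea, using a made-up DataFrame (actual savings depend heavily on value ranges and column cardinality):

```python
import numpy as np
import pandas as pd

# Hypothetical data: small integers, simple floats, and a low-cardinality string column.
df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64"),
    "price": np.arange(1000) * 0.5,
    "city": ["Paris", "Berlin"] * 500,
})
before = df.memory_usage(deep=True).sum()

small = df.assign(
    # Pick the smallest integer / float dtype that can hold the values.
    count=pd.to_numeric(df["count"], downcast="integer"),
    price=pd.to_numeric(df["price"], downcast="float"),
    # Store repeated strings once and reference them by integer code.
    city=df["city"].astype("category"),
)
after = small.memory_usage(deep=True).sum()
```

Comparing `before` and `after` gives the saving for a particular dataset.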
SQLite + Pandas Local Data Pipeline
Run a lightweight local ETL with SQLite and pandas: load CSV, transform, persist to SQLite.
Best for: local analytics
Pandas Categorical Encoding for ML
One-hot encode, label encode, and ordinal encode categorical columns using pandas and scikit-learn.
Best for: ML preprocessing
Pandas String Operations
Clean, extract, and transform string columns using pandas .str accessor methods.
Best for: data cleaning
Flatten Nested JSON with pandas
Use pd.json_normalize to flatten deeply nested API responses into a flat DataFrame.
Best for: API response flattening
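A small illustration with an invented API payload (the field names are assumptions):

```python
import pandas as pd

# Hypothetical nested API response.
records = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "Linus", "address": {"city": "Helsinki"}}},
]

# Nested keys are joined with the separator to form flat column names.
flat = pd.json_normalize(records, sep="_")
```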
Pandas Method Chaining with .pipe()
Use the .pipe() method to create clean, readable pandas transformation chains.
Best for: clean ETL code
Pandas Explode List Column
Explode a column containing lists into separate rows, useful for normalising one-to-many relations.
Best for: array column expansion
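For instance, with a hypothetical orders table whose `items` column holds lists:

```python
import pandas as pd

# Hypothetical one-to-many data: each order carries a list of items.
orders = pd.DataFrame({
    "order_id": [1, 2],
    "items": [["apple", "pear"], ["kiwi"]],
})

# Each list element becomes its own row; other columns are repeated.
rows = orders.explode("items", ignore_index=True)
```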
Stream Large SQL Query in Chunks
Read millions of rows from SQL in memory-safe chunks using pandas read_sql with chunksize.
Best for: large table extraction
Pandas Time-Series Resampling
Resample time-series data from daily to weekly/monthly frequencies with aggregation functions.
Best for: time-series analytics
Concat & Deduplicate DataFrames
Concatenate multiple DataFrames and remove duplicates by composite key for clean data consolidation.
Best for: data consolidation
Pandas .query() for Readable Filters
Use DataFrame.query() with expressions for cleaner, SQL-like row filtering syntax.
Best for: data filtering
Pandas .eval() for Fast Column Computation
Use DataFrame.eval() for expressive column calculations, optionally in place, accelerated by numexpr when it is installed.
Best for: large DataFrame operations
Pandas merge_asof for Time-Based Joins
Perform an as-of join to match events to the most recent reference record within a time window.
Best for: tick data joins
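A sketch of an as-of join on made-up trade and quote data. The 10-second tolerance is an arbitrary choice for illustration:

```python
import pandas as pd

# Hypothetical tick data; both frames must be sorted by the join key.
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 09:30:01",
        "2024-01-01 09:30:05",
        "2024-01-01 09:31:00",
    ]),
    "qty": [100, 200, 50],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 09:30:00", "2024-01-01 09:30:04"]),
    "bid": [10.0, 10.5],
})

# For each trade, take the latest quote at or before it, within 10 seconds.
matched = pd.merge_asof(
    trades,
    quotes,
    on="time",
    direction="backward",
    tolerance=pd.Timedelta("10s"),
)
```

The last trade has no quote within the window, so its `bid` is left missing.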
Pandas Category Dtype Optimization
Convert low-cardinality string columns to categorical dtype to reduce memory use and speed up groupby.
Best for: memory optimization
Pandas Wide to Long (melt)
Transform a wide-format DataFrame into long format using pd.melt for analytics and visualisation.
Best for: pivot table conversion
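A minimal example on an invented quarterly table:

```python
import pandas as pd

# Hypothetical wide table: one column per quarter.
wide = pd.DataFrame({"product": ["A", "B"], "q1": [10, 30], "q2": [20, 40]})

# id_vars stay as identifiers; the remaining columns become (quarter, sales) rows.
long = wide.melt(id_vars="product", var_name="quarter", value_name="sales")
```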
Pandas Custom Aggregation Functions
Pass custom lambda and named functions to .agg() for complex groupby aggregations.
Best for: HR analytics
Pandas Styled DataFrame Report
Apply conditional formatting to a pandas DataFrame for styled HTML reports with highlighting.
Best for: executive reporting
Pandas .assign() for Immutable Chaining
Use DataFrame.assign() to add computed columns without mutating the original DataFrame.
Best for: immutable transforms
Pandas IntervalIndex for Binning
Use IntervalIndex and pd.cut to bin continuous variables into labelled categories.
Best for: grading systems
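A sketch with assumed grade boundaries (0-60, 60-80, 80-100). Note that `pd.cut` ignores `labels` when `bins` is an IntervalIndex, so the labelled variant passes plain bin edges:

```python
import pandas as pd

scores = pd.Series([55, 70, 85, 100])

# Binning against an explicit IntervalIndex returns Interval values.
bins = pd.IntervalIndex.from_breaks([0, 60, 80, 100], closed="right")
binned = pd.cut(scores, bins=bins)

# Binning against plain edges allows custom labels (right-closed by default).
grades = pd.cut(scores, bins=[0, 60, 80, 100], labels=["C", "B", "A"])
```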
Grouped Time-Series with ffill
Forward-fill missing time-series values within groups to handle irregular measurement intervals.
Best for: IoT sensor data
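One way this could look, with hypothetical sensor readings:

```python
import numpy as np
import pandas as pd

# Hypothetical readings; sensor "b" starts with a gap.
readings = pd.DataFrame({
    "sensor": ["a", "a", "a", "b", "b"],
    "temp": [20.0, np.nan, np.nan, np.nan, 18.0],
})

# Filling within each group prevents values leaking from one sensor to the next.
readings["temp_filled"] = readings.groupby("sensor")["temp"].ffill()
```

The first reading of sensor "b" stays missing because there is no earlier value in its own group.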
Pandas Pivot Table Summary
Create multi-level summary pivot tables from transactional data using pd.pivot_table.
Best for: sales reporting
Pandas Named Aggregations
Use named aggregations in groupby().agg() to produce readable, self-documenting summary tables.
Best for: HR reporting
Read Multi-Sheet Excel Files
Load, merge, and process data from multiple Excel sheets using the pandas ExcelFile context manager.
Best for: Excel ETL
Stratified Sampling with pandas
Draw a stratified random sample from a DataFrame, preserving class proportions for ML splits.
Best for: ML dataset splitting
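A minimal sketch using `GroupBy.sample`, with an invented imbalanced label column. Other approaches, such as scikit-learn's stratified splitters, would also work:

```python
import pandas as pd

# Hypothetical dataset with a 20/80 class split.
df = pd.DataFrame({"label": ["pos"] * 20 + ["neg"] * 80, "x": range(100)})

# Sampling the same fraction from every class preserves class proportions.
sample = df.groupby("label").sample(frac=0.1, random_state=0)
```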
Custom Pandas Accessor Extension
Create a reusable accessor with @pd.api.extensions.register_dataframe_accessor to add domain-specific DataFrame methods.
Best for: domain-specific pandas
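A sketch of a custom accessor. The `geo` namespace, the `lat`/`lon` columns, and the `center` property are hypothetical names chosen for illustration:

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    """Hypothetical accessor exposing geographic helpers as df.geo.*"""

    def __init__(self, pandas_obj):
        # Validate once, when the accessor is first used on a DataFrame.
        if not {"lat", "lon"}.issubset(pandas_obj.columns):
            raise AttributeError("DataFrame must have 'lat' and 'lon' columns")
        self._obj = pandas_obj

    @property
    def center(self):
        # Mean latitude and longitude of all rows.
        return (float(self._obj["lat"].mean()), float(self._obj["lon"].mean()))


points = pd.DataFrame({"lat": [10.0, 20.0], "lon": [30.0, 50.0]})
center = points.geo.center
```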
Timezone-Aware Timestamps in pandas
Convert naive timestamps to timezone-aware ones, handle DST transitions, and convert to UTC.
Best for: global event logs
Pandas Cross-Tabulation (crosstab)
Compute frequency and proportion cross-tabulations between two categorical columns.
Best for: categorical analysis
Pandas read_csv with Explicit Dtypes
Specify column dtypes on CSV read to avoid costly inference and prevent silent type coercion.
Best for: fast CSV loading
Pareto / Cumulative Share Analysis
Calculate cumulative share (Pareto 80/20) of values for product or customer ranking analysis.
Best for: product analytics
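A small worked example with invented revenue figures. The 0.8 cut-off reflects the usual 80/20 convention and is otherwise arbitrary:

```python
import pandas as pd

# Hypothetical revenue per product.
revenue = pd.Series({"A": 50, "B": 30, "C": 15, "D": 5}, name="revenue")

# Rank by value, then compute the running share of the total.
ranked = revenue.sort_values(ascending=False).to_frame()
ranked["cum_share"] = ranked["revenue"].cumsum() / ranked["revenue"].sum()

# Products that together account for up to 80% of revenue.
top = ranked[ranked["cum_share"] <= 0.8]
```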
Pandas Apply with Chunked Progress
Apply a function to a large DataFrame in chunked batches to avoid memory spikes and track progress.
Best for: memory-safe transforms
Pandas Merge with Validation
Use the validate parameter of merge() to catch unexpected duplicate-key joins such as accidental many-to-many; missing keys need a separate check such as indicator=True.
Best for: data integrity
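A sketch with made-up order and customer tables, where the customer table accidentally contains a duplicate key:

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 2], "total": [10, 20, 30]})
# Hypothetical dimension table with a duplicated key (customer 2).
customers = pd.DataFrame({
    "customer_id": [1, 2, 2],
    "name": ["Ann", "Bob", "Bobby"],
})

# validate raises MergeError when the right side is not unique on the key.
try:
    orders.merge(customers, on="customer_id", validate="many_to_one")
    error = None
except pd.errors.MergeError as exc:
    error = exc

# After deduplicating, the merge passes; indicator=True flags unmatched keys.
clean = customers.drop_duplicates("customer_id")
checked = orders.merge(
    clean, on="customer_id", how="left", validate="many_to_one", indicator=True
)
```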
Pandas Conditional Join with merge + query
Perform range/conditional joins by merging on a common key and filtering with query expressions.
Best for: session attribution
Pandas GroupBy Transform Patterns
Use groupby().transform() to compute group-level statistics and broadcast them back to row level.
Best for: feature engineering
Pandas Business Day Offsets
Compute business-day-adjusted dates using pandas offsets for financial and SLA calculations.
Best for: financial calendars
Detect & Remove Pandas Duplicates
Find, count, and remove duplicate rows with flexible keep strategy and composite key support.
Best for: data cleaning
Efficient One-Hot Pivot with Sparse
Create a sparse user-item matrix from transaction logs for recommendation or ML use cases.
Best for: recommendation systems
Pandas PeriodIndex for Fiscal Calendars
Use PeriodIndex for fiscal period arithmetic, aggregations, and comparisons beyond datetime.
Best for: fiscal reporting
Compare Two DataFrames for Changes
Detect row-level additions, deletions, and modifications between two DataFrame snapshots.
Best for: change data capture
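One possible approach is an outer merge with `indicator=True`. The snapshots, key column, and value column below are invented:

```python
import pandas as pd

# Hypothetical snapshots keyed by "id".
old = pd.DataFrame({"id": [1, 2, 3], "price": [10, 20, 30]})
new = pd.DataFrame({"id": [2, 3, 4], "price": [20, 35, 40]})

# The _merge column records which snapshot each key came from.
diff = old.merge(
    new, on="id", how="outer", suffixes=("_old", "_new"), indicator=True
)

added = diff.loc[diff["_merge"] == "right_only", "id"].tolist()
deleted = diff.loc[diff["_merge"] == "left_only", "id"].tolist()

# Keys present in both snapshots whose value changed.
both = diff[diff["_merge"] == "both"]
modified = both.loc[both["price_old"] != both["price_new"], "id"].tolist()
```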
Cumulative Max & Streak Detection
Detect streaks, new highs, and consecutive-day patterns in time-series using cummax and groupby.
Best for: sports analytics
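A sketch on a made-up price series. Here a "streak" is assumed to mean consecutive rises, which is only one of several reasonable definitions:

```python
import pandas as pd

prices = pd.Series([10, 12, 11, 13, 14, 14, 9])

# A new high is a value strictly above every earlier value.
prev_max = prices.cummax().shift()
is_new_high = prev_max.isna() | (prices > prev_max)

# Count consecutive rises: every non-rise starts a new group,
# and a cumulative sum inside each group gives the running streak length.
up = prices.diff() > 0
streak = up.astype(int).groupby((~up).cumsum()).cumsum()
```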
Read NDJSON / JSON Lines Files
Efficiently read newline-delimited JSON (NDJSON) log files into a pandas DataFrame.
Best for: log file ingestion
Detect Overlapping Date Intervals
Identify overlapping time periods in a DataFrame (e.g., booking conflicts or subscription overlaps).
Best for: scheduling conflicts
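One way to flag overlaps, assuming a hypothetical bookings table with `room`, `start`, and `end` columns:

```python
import pandas as pd

# Hypothetical bookings, sorted by room and start time.
bookings = pd.DataFrame({
    "room": ["r1", "r1", "r1", "r2"],
    "start": pd.to_datetime(
        ["2024-01-01", "2024-01-03", "2024-01-10", "2024-01-02"]
    ),
    "end": pd.to_datetime(
        ["2024-01-05", "2024-01-04", "2024-01-12", "2024-01-06"]
    ),
}).sort_values(["room", "start"])

# Latest end time among all earlier bookings in the same room.
prev_max_end = bookings.groupby("room")["end"].transform(
    lambda s: s.cummax().shift()
)

# A booking overlaps an earlier one if it starts before that latest end.
bookings["overlaps_previous"] = bookings["start"] < prev_max_end
```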
Pandas SwapLevel MultiIndex
Swap and sort MultiIndex levels in a hierarchical DataFrame for flexible aggregation.
Best for: hierarchical reporting
Pandas Rank with Tie-Breaking Methods
Apply different ranking strategies (min, dense, average) and handle ties in pandas.
Best for: leaderboards
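The three methods differ only in how ties are ranked, as a small made-up leaderboard shows:

```python
import pandas as pd

scores = pd.Series([90, 85, 85, 70])

ranks = pd.DataFrame({
    # Ties share the lowest rank; the following rank is skipped.
    "min": scores.rank(method="min", ascending=False),
    # Ties share a rank; no gaps afterwards.
    "dense": scores.rank(method="dense", ascending=False),
    # Ties get the mean of the ranks they span.
    "average": scores.rank(method="average", ascending=False),
})
```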
Pandas nlargest / nsmallest
Efficiently retrieve the N largest or smallest rows without sorting the full DataFrame.
Best for: top-N queries
Pandas Datetime Component Extraction
Extract year, month, day, hour, day-of-week and other components from a datetime column.
Best for: time-based features
DataFrame to Dict Records
Convert DataFrames to lists of dicts for API responses, JSON export, or further processing.
Best for: API serialization
Pandas Cartesian Feature Interaction
Generate pairwise feature interactions for ML by creating cross-product columns.
Best for: ML feature engineering
Pandas Rolling Correlation
Compute rolling Pearson correlation between two columns to detect shifting relationships over time.
Best for: regime detection
Expand JSON Column into DataFrame Columns
Parse a JSON-string column and expand its keys into separate columns in one step.
Best for: JSON column expansion
Pandas Forward Fill & Backward Fill
Propagate non-null values forward and backward to fill gaps in time-series or sparse data.
Best for: gap filling
Value Counts with Normalisation
Compute frequency distributions and percentage breakdowns of categorical columns.
Best for: data profiling