#pandas

64 snippets tagged with #pandas

python | beginner

Pandas DataFrame Transformations

Common pandas DataFrame transformations including column operations, type casting, and string methods.

Best for: Cleaning raw data files for analysis

#pandas #dataframe

python | beginner

Pandas DataFrame Filtering Techniques

Filter DataFrames using boolean masks, query syntax, isin, between, and string matching methods.

Best for: Extracting subsets of data for reporting
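
The five filtering styles named in this card can be compared side by side. The sketch below uses a small invented `orders` table (the column names are assumptions for illustration, not taken from the snippet itself):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
orders = pd.DataFrame({
    "region": ["north", "south", "east", "north", "west"],
    "amount": [120, 45, 300, 80, 210],
    "customer": ["Acme Ltd", "Bolt Inc", "Acme Corp", "Delta", "Echo Inc"],
})

# Boolean mask: combine conditions with & or |, each wrapped in parentheses.
big_north = orders[(orders["region"] == "north") & (orders["amount"] > 100)]

# query(): the same filter written as a string expression.
big_north_q = orders.query("region == 'north' and amount > 100")

# isin(): membership test against a list of allowed values.
coastal = orders[orders["region"].isin(["east", "west"])]

# between(): range check, inclusive on both ends by default.
mid_sized = orders[orders["amount"].between(80, 210)]

# str.contains(): substring match on a text column.
acme = orders[orders["customer"].str.contains("Acme")]
```

The mask and `query()` forms select the same rows here; which one reads better is largely a matter of taste.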

#pandas #filtering

python | intermediate

Pandas GroupBy Aggregation Examples

GroupBy operations with multiple aggregations, named aggregations, and transform for DataFrame analysis.

Best for: Sales reporting by region and time period
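
As a rough illustration of the difference between aggregating and transforming, the sketch below uses an invented `sales` table. Named aggregation collapses each group to one row, while `transform` returns a value for every original row:

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "revenue": [100.0, 200.0, 50.0, 150.0, 100.0],
})

# Named aggregation: output_column=(input_column, function).
summary = sales.groupby("region").agg(
    total_revenue=("revenue", "sum"),
    avg_revenue=("revenue", "mean"),
    n_orders=("revenue", "count"),
)

# transform: the group total is broadcast back to every row of the group,
# so it can be used directly in row-level arithmetic.
region_total = sales.groupby("region")["revenue"].transform("sum")
sales["region_share"] = sales["revenue"] / region_total
```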

#pandas #groupby

python | intermediate

Pandas Merge and Join Examples

Combine DataFrames using merge, join, and concat with different join types and key handling.

Best for: Combining data from multiple sources

#pandas #merge

python | beginner

Parquet File Read and Write in Python

Read and write Parquet files with pandas and PyArrow including partitioning and schema control.

Best for: Efficient columnar storage for analytics data

#parquet #pyarrow

python | intermediate

Pandas Pivot and Unpivot Reshaping

Reshape DataFrames between wide and long formats using pivot, melt, and stack operations.

Best for: Reshaping data for reporting dashboards

#pandas #pivot

python | intermediate

Pandas Time Series Analysis

Time series operations with resampling, rolling windows, date offsets, and period conversions.

Best for: Sales trend analysis with moving averages

#pandas #time-series

python | beginner

Pandas Null Handling Strategies

Comprehensive strategies for detecting, filling, and handling missing values in pandas DataFrames.

Best for: Cleaning datasets with missing values

#pandas #null

python | intermediate

Read Large CSV in Chunks with Pandas

Process CSV files larger than RAM by reading them in chunks, a memory-efficient ETL pattern for data pipelines.

Best for: Processing multi-GB CSV files without running out of memory
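
A minimal sketch of the chunked pattern follows. It reads from an in-memory buffer so that it runs anywhere; with a real file you would pass a path instead. The column names and the tiny chunk size are assumptions chosen for illustration:

```python
import io

import pandas as pd

# In-memory stand-in for a large CSV file (hypothetical columns).
csv_text = "id,amount\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
source = io.StringIO(csv_text)

total = 0
rows_seen = 0

# With chunksize set, read_csv yields DataFrames of at most that many rows,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(source, chunksize=4):
    total += int(chunk["amount"].sum())
    rows_seen += len(chunk)
```

The pattern works for any aggregate that can be updated incrementally (sums, counts, minima); statistics that need the whole column at once, such as an exact median, need a different approach.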

#pandas #csv

python | intermediate

Dask Parallel DataFrame Processing

Process datasets larger than RAM using Dask's parallel, lazy DataFrame API.

Best for: out-of-core processing

#dask #parallel

python | intermediate

Pandas Vectorised Operations vs Apply

Compare apply vs vectorised pandas operations for performance-critical column transformations.

Best for: feature engineering

#pandas #vectorization

python | intermediate

Pandas Rolling & Expanding Windows

Compute moving averages, rolling sums, and cumulative stats on time-series data with pandas.

Best for: sales forecasting

#pandas #time-series

python | intermediate

Pandas MultiIndex Stack & Unstack

Work with hierarchical MultiIndex DataFrames: pivoting with stack/unstack and cross-sectional slicing.

Best for: panel data

#pandas #multiindex

python | intermediate

Pandas Memory Reduction via Dtypes

Reduce DataFrame memory (by 60-80% in favourable cases) by downcasting numeric types and using categorical columns.

Best for: large dataset loading
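
The two techniques can be sketched as below on an invented frame. The actual saving depends heavily on the data (value ranges, number of distinct strings), so this example measures the difference rather than assuming a figure:

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64"),
    "price": np.linspace(0, 10, 1000),
    "city": ["Paris", "Lima", "Oslo", "Cairo"] * 250,
})
before = df.memory_usage(deep=True).sum()

slim = df.assign(
    # Smallest integer dtype that can hold the observed values.
    count=pd.to_numeric(df["count"], downcast="integer"),
    # Smaller float dtype where pandas considers the downcast safe.
    price=pd.to_numeric(df["price"], downcast="float"),
    # Few distinct strings: store integer codes plus a lookup table.
    city=df["city"].astype("category"),
)
after = slim.memory_usage(deep=True).sum()
saving = 1 - after / before  # fraction of memory saved on this sample
```

Downcasting floats trades precision for space, so it is worth checking that the reduced precision is acceptable before applying it to monetary or scientific values.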

#pandas #memory

python | beginner

SQLite + Pandas Local Data Pipeline

Run a lightweight local ETL with SQLite and pandas: load CSV, transform, persist to SQLite.

Best for: local analytics

#sqlite #pandas

python | beginner

Pandas Categorical Encoding for ML

One-hot encode, label encode, and ordinal encode categorical columns using pandas and scikit-learn.

Best for: ML preprocessing

#pandas #encoding

python | beginner

Pandas String Operations

Clean, extract, and transform string columns using pandas .str accessor methods.

Best for: data cleaning

#pandas #strings

python | beginner

Flatten Nested JSON with pandas

Use pd.json_normalize to flatten deeply nested API responses into a flat DataFrame.

Best for: API response flattening
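
To make the flattening concrete, here is a small sketch with an invented payload (the field names are hypothetical). It shows both plain flattening of nested dicts and the `record_path`/`meta` form for nested lists:

```python
import pandas as pd

# Hypothetical API response, invented for illustration.
payload = [
    {"id": 1, "user": {"name": "Ana", "address": {"city": "Lisbon"}},
     "orders": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
    {"id": 2, "user": {"name": "Ben", "address": {"city": "Oslo"}},
     "orders": [{"sku": "C3", "qty": 5}]},
]

# Nested dicts become columns whose names join the key path with `sep`.
# Lists (the "orders" field) are left as-is in a single column.
flat = pd.json_normalize(payload, sep="_")

# record_path turns each element of the nested list into its own row;
# meta copies parent-level fields onto every such row.
order_rows = pd.json_normalize(
    payload,
    record_path="orders",
    meta=["id", ["user", "name"]],
    sep="_",
)
```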

#pandas #json

python | intermediate

Pandas Method Chaining with .pipe()

Use the .pipe() method to create clean, readable pandas transformation chains.

Best for: clean ETL code
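
A short sketch of a `.pipe()` chain is shown below. The two step functions and the `raw` frame are hypothetical, written only to show how each step takes a DataFrame and returns a new one:

```python
import pandas as pd


def drop_missing_amounts(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows where the amount is missing."""
    return df.dropna(subset=["amount"])


def add_tax(df: pd.DataFrame, rate: float) -> pd.DataFrame:
    """Add a column with the amount including tax at the given rate."""
    return df.assign(amount_with_tax=df["amount"] * (1 + rate))


# Hypothetical input data, invented for illustration.
raw = pd.DataFrame({"amount": [100.0, None, 50.0]})

# Each step receives the output of the previous one; extra arguments
# are forwarded to the function by pipe().
clean = raw.pipe(drop_missing_amounts).pipe(add_tax, rate=0.25)
```

Because every step is an ordinary function, each one can be unit-tested on its own, which is a large part of the appeal of this style.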

#pandas #pipe

python | beginner

Pandas Explode List Column

Explode a column containing lists into separate rows, useful for normalising one-to-many relations.

Best for: array column expansion
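
A minimal sketch on an invented `posts` table follows. One detail worth knowing is that an empty list still produces a row, with a missing value in the exploded column:

```python
import pandas as pd

# Hypothetical data, invented for illustration.
posts = pd.DataFrame({
    "post_id": [1, 2, 3],
    "tags": [["pandas", "etl"], ["sql"], []],
})

# One output row per list element; other columns are repeated.
# explode keeps the original index labels, so reset them for a clean index.
exploded = posts.explode("tags").reset_index(drop=True)

# Post 3 had an empty list and now carries a missing tag; drop it if unwanted.
with_tags = exploded.dropna(subset=["tags"])
```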

#pandas #explode

python | intermediate

Stream Large SQL Query in Chunks

Read millions of rows from SQL in memory-safe chunks using pandas read_sql with chunksize.

Best for: large table extraction
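
The chunked read can be sketched against an in-memory SQLite database so the example runs without a server. The `events` table and its columns are assumptions for illustration, and a production setup would more likely use an SQLAlchemy engine:

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical `events` table.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(10), "amount": range(0, 100, 10)}).to_sql(
    "events", conn, index=False
)

chunk_sizes = []
running_total = 0

# With chunksize set, read_sql_query returns an iterator of DataFrames.
query = "SELECT id, amount FROM events ORDER BY id"
for chunk in pd.read_sql_query(query, conn, chunksize=4):
    chunk_sizes.append(len(chunk))
    running_total += int(chunk["amount"].sum())

conn.close()
```

Whether the rows are truly streamed from the server, rather than fetched in full and then sliced, depends on the database driver, so it is worth checking memory use against your own backend.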

#pandas #sql

python | beginner

Pandas Time-Series Resampling

Resample time-series data from daily to weekly/monthly frequencies with aggregation functions.

Best for: time-series analytics

#pandas #time-series

python | beginner

Concat & Deduplicate DataFrames

Concatenate multiple DataFrames and remove duplicates by composite key for clean data consolidation.

Best for: data consolidation

#pandas #deduplication

python | beginner

Pandas .query() for Readable Filters

Use DataFrame.query() with expressions for cleaner, SQL-like row filtering syntax.

Best for: data filtering

#pandas #query

python | intermediate

Pandas .eval() for Fast Column Computation

Use DataFrame.eval() for expressive, fast column calculations, accelerated by numexpr when it is installed.

Best for: large DataFrame operations

#pandas #eval

python | intermediate

Pandas merge_asof for Time-Based Joins

Perform an as-of join to match events to the most recent reference record within a time window.

Best for: tick data joins
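
An as-of join can be sketched with two invented tables, `trades` and `quotes`. Each trade is matched to the latest quote at or before its timestamp, and the `tolerance` argument supplies the time window mentioned above:

```python
import pandas as pd

# Hypothetical tick data, invented for illustration. Both frames must be
# sorted by the join key.
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 09:00:01", "2024-01-01 09:00:05", "2024-01-01 09:00:30",
    ]),
    "qty": [10, 20, 30],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 09:00:00", "2024-01-01 09:00:04"]),
    "price": [100.0, 101.0],
})

# direction="backward" picks the most recent quote not after the trade;
# quotes older than the tolerance are treated as no match (NaN).
matched = pd.merge_asof(
    trades, quotes, on="time",
    direction="backward", tolerance=pd.Timedelta("10s"),
)
```

The third trade arrives 26 seconds after the last quote, outside the 10-second window, so its price is left missing rather than filled with a stale value.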

#pandas #merge-asof

python | beginner

Pandas Category Dtype Optimization

Convert string columns to categorical dtype to dramatically reduce memory and speed up groupby.

Best for: memory optimization

#pandas #category

python | beginner

Pandas Wide to Long (melt)

Transform a wide-format DataFrame into long format using pd.melt for analytics and visualisation.

Best for: pivot table conversion
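
The reshaping can be sketched on a small invented table with one column per month. The `pivot` call at the end is an addition for illustration, showing that the long form can be turned back into the wide one:

```python
import pandas as pd

# Hypothetical wide-format data: one row per store, one column per month.
wide = pd.DataFrame({
    "store": ["s1", "s2"],
    "jan": [10, 20],
    "feb": [15, 25],
})

# id_vars stay as identifier columns; every other column becomes
# a (month, sales) pair on its own row.
long = wide.melt(id_vars="store", var_name="month", value_name="sales")

# Reverse direction: rebuild the wide layout from the long one.
back = long.pivot(index="store", columns="month", values="sales")
```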

#pandas #melt

python | intermediate

Pandas Custom Aggregation Functions

Pass custom lambda and named functions to .agg() for complex groupby aggregations.

Best for: HR analytics

#pandas #groupby

python | beginner

Pandas Styled DataFrame Report

Apply conditional formatting to a pandas DataFrame for styled HTML reports with highlighting.

Best for: executive reporting

#pandas #styling

python | beginner

Pandas .assign() for Immutable Chaining

Use DataFrame.assign() to add computed columns without mutating the original DataFrame.

Best for: immutable transforms

#pandas #assign

python | intermediate

Pandas IntervalIndex for Binning

Use IntervalIndex and pd.cut to bin continuous variables into labelled categories.

Best for: grading systems
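
A grading example might look like the sketch below. The score thresholds and letter labels are invented for illustration:

```python
import pandas as pd

scores = pd.Series([35, 58, 72, 90, 100])

# Labelled bins. Intervals are right-closed by default, i.e. (50, 70];
# include_lowest=True also admits a score of exactly 0 into the first bin.
grades = pd.cut(
    scores,
    bins=[0, 50, 70, 85, 100],
    labels=["F", "C", "B", "A"],
    include_lowest=True,
)

# The same edges as an explicit IntervalIndex; cut then returns the
# interval each score falls into instead of a label.
edges = pd.IntervalIndex.from_breaks([0, 50, 70, 85, 100], closed="right")
intervals = pd.cut(scores, bins=edges)
```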

#pandas #binning

python | intermediate

Grouped Time-Series with ffill

Forward-fill missing time-series values within groups to handle irregular measurement intervals.

Best for: IoT sensor data
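
The point of filling within groups is that one sensor's last reading should not leak into another sensor's gap. A minimal sketch on invented readings:

```python
import pandas as pd

# Hypothetical sensor readings, already ordered by time within each sensor.
readings = pd.DataFrame({
    "sensor": ["a", "a", "a", "b", "b"],
    "value": [1.0, None, None, None, 5.0],
})

# Forward-fill separately per sensor. The first reading of sensor "b" stays
# missing because there is no earlier value in its own group.
readings["value_filled"] = readings.groupby("sensor")["value"].ffill()
```

A plain `readings["value"].ffill()` would instead fill the first row of sensor "b" with the value 1.0 carried over from sensor "a".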

#pandas #ffill

python | beginner

Pandas Pivot Table Summary

Create multi-level summary pivot tables from transactional data using pd.pivot_table.

Best for: sales reporting

#pandas #pivot-table

python | beginner

Pandas Named Aggregations

Use named aggregations in groupby().agg() to produce readable, self-documenting summary tables.

Best for: HR reporting

#pandas #groupby

python | beginner

Read Multi-Sheet Excel Files

Load, merge, and process data from multiple Excel sheets using the pandas ExcelFile context manager.

Best for: Excel ETL

#pandas #excel

python | intermediate

Stratified Sampling with pandas

Draw a stratified random sample from a DataFrame, preserving class proportions for ML splits.

Best for: ML dataset splitting

#pandas #sampling

python | advanced

Custom Pandas Accessor Extension

Create a reusable accessor with @pd.api.extensions.register_dataframe_accessor for domain-specific DataFrame methods.

Best for: domain-specific pandas

#pandas #extension

python | intermediate

Timezone-Aware Timestamps in pandas

Convert naive timestamps to timezone-aware ones, handle DST transitions, and convert to UTC.

Best for: global event logs
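
A small sketch of the localise-then-convert sequence follows, using two invented wall-clock times on either side of the 2024 spring-forward change in New York. The zone is chosen only as an example:

```python
import pandas as pd

# Naive wall-clock timestamps (no timezone attached), invented for illustration.
naive = pd.Series(pd.to_datetime(["2024-03-10 01:30", "2024-03-10 03:30"]))

# tz_localize attaches a zone without shifting the wall-clock time.
local = naive.dt.tz_localize("America/New_York")

# tz_convert moves the same instants into another zone.
utc = local.dt.tz_convert("UTC")

# The wall clocks differ by two hours, but clocks jumped from 02:00 to 03:00,
# so only one real hour elapsed.
elapsed = local.iloc[1] - local.iloc[0]
```

Wall-clock times that fall inside the skipped hour, or inside the repeated hour in autumn, raise an error by default; `tz_localize` takes `nonexistent` and `ambiguous` arguments to control that behaviour.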

#pandas #datetime

python | beginner

Pandas Cross-Tabulation (crosstab)

Compute frequency and proportion cross-tabulations between two categorical columns.

Best for: categorical analysis

#pandas #crosstab

python | beginner

Pandas read_csv with Explicit Dtypes

Specify column dtypes on CSV read to avoid costly inference and prevent silent type coercion.

Best for: fast CSV loading

#pandas #csv

python | intermediate

Pareto / Cumulative Share Analysis

Calculate cumulative share (Pareto 80/20) of values for product or customer ranking analysis.

Best for: product analytics
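
One way to compute the cumulative share is sketched below on invented revenue figures. The 80% cut-off and the rule "keep products until the threshold is reached" are assumptions for illustration; other definitions of the Pareto set are possible:

```python
import pandas as pd

# Hypothetical revenue per product, invented for illustration.
revenue = pd.DataFrame({
    "product": ["A", "B", "C", "D", "E"],
    "revenue": [500, 250, 150, 60, 40],
})

# Rank products from largest to smallest contributor.
ranked = revenue.sort_values("revenue", ascending=False).reset_index(drop=True)

# Share of the total, then the running sum of those shares.
ranked["share"] = ranked["revenue"] / ranked["revenue"].sum()
ranked["cum_share"] = ranked["share"].cumsum()

# Keep every product whose preceding cumulative share is still below 80%,
# i.e. the smallest leading set that reaches the threshold.
top = ranked[ranked["cum_share"].shift(fill_value=0) < 0.8]
```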

#pandas #pareto

python | intermediate

Pandas Apply with Chunked Progress

Apply a function to a large DataFrame in chunked batches to avoid memory spikes and track progress.

Best for: memory-safe transforms

#pandas #apply

python | intermediate

Pandas Merge with Validation

Use the validate parameter of merge() to catch unexpected many-to-many or missing-key issues in joins.

Best for: data integrity
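
The sketch below shows both checks on invented tables. `validate` asserts the expected key relationship, and `indicator=True` (an addition here, used for the missing-key side) records where each row came from:

```python
import pandas as pd

# Hypothetical tables, invented for illustration.
orders = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [10, 20, 30]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ana", "Ben"]})

# many_to_one: keys may repeat on the left but must be unique on the right.
# The _merge column added by indicator=True flags unmatched keys.
joined = orders.merge(
    customers, on="customer_id", how="left",
    validate="many_to_one", indicator=True,
)
unmatched = joined[joined["_merge"] != "both"]

# A duplicated customer row breaks the many_to_one assumption, so the merge
# raises instead of silently multiplying rows.
dup_customers = pd.concat([customers, customers.iloc[[1]]], ignore_index=True)
try:
    orders.merge(dup_customers, on="customer_id", validate="many_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True
```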

#pandas #merge

python | intermediate

Pandas Conditional Join with merge + query

Perform range/conditional joins by merging on a common key and filtering with query expressions.

Best for: session attribution

#pandas #conditional-join

python | intermediate

Pandas GroupBy Transform Patterns

Use groupby().transform() to compute group-level statistics and broadcast them back to row level.

Best for: feature engineering

#pandas #groupby

python | intermediate

Pandas Business Day Offsets

Compute business-day-adjusted dates using pandas offsets for financial and SLA calculations.

Best for: financial calendars

#pandas #datetime

python | beginner

Detect & Remove Pandas Duplicates

Find, count, and remove duplicate rows with flexible keep strategy and composite key support.

Best for: data cleaning

#pandas #duplicates

python | advanced

Efficient One-Hot Pivot with Sparse

Create a sparse user-item matrix from transaction logs for recommendation or ML use cases.

Best for: recommendation systems

#pandas #sparse

python | intermediate

Pandas PeriodIndex for Fiscal Calendars

Use PeriodIndex for fiscal period arithmetic, aggregations, and comparisons beyond datetime.

Best for: fiscal reporting

#pandas #period

python | intermediate

Compare Two DataFrames for Changes

Detect row-level additions, deletions, and modifications between two DataFrame snapshots.

Best for: change data capture

#pandas #diff

python | intermediate

Cumulative Max & Streak Detection

Detect streaks, new highs, and consecutive-day patterns in time-series using cummax and groupby.

Best for: sports analytics

#pandas #cummax

python | beginner

Read NDJSON / JSON Lines Files

Efficiently read newline-delimited JSON (NDJSON) log files into a pandas DataFrame.

Best for: log file ingestion

#pandas #ndjson

python | advanced

Detect Overlapping Date Intervals

Identify overlapping time periods in a DataFrame (e.g., booking conflicts or subscription overlaps).

Best for: scheduling conflicts

#pandas #intervals

python | intermediate

Pandas SwapLevel MultiIndex

Swap and sort MultiIndex levels in a hierarchical DataFrame for flexible aggregation.

Best for: hierarchical reporting

#pandas #multiindex

python | beginner

Pandas Rank with Tie-Breaking Methods

Apply different ranking strategies (min, dense, average) and handle ties in pandas.

Best for: leaderboards
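
The three strategies differ only in how tied scores are numbered. A sketch on an invented leaderboard with one tie for first place:

```python
import pandas as pd

# Hypothetical leaderboard, invented for illustration.
board = pd.DataFrame({
    "player": ["ana", "ben", "cy", "dee"],
    "score": [90, 85, 90, 70],
})

# min: tied players share the lowest rank, and the next rank is skipped (1, 1, 3).
board["rank_min"] = board["score"].rank(method="min", ascending=False)

# dense: tied players share a rank, and no rank is skipped (1, 1, 2).
board["rank_dense"] = board["score"].rank(method="dense", ascending=False)

# average: tied players receive the mean of the ranks they occupy (1.5, 1.5, 3).
board["rank_avg"] = board["score"].rank(method="average", ascending=False)
```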

#pandas #ranking

python | beginner

Pandas nlargest / nsmallest

Efficiently retrieve the N largest or smallest rows without sorting the full DataFrame.

Best for: top-N queries
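
A minimal sketch on invented scores follows. With the default `keep="first"`, tied rows are returned in their original order:

```python
import pandas as pd

# Hypothetical scores, invented for illustration.
players = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "score": [50, 90, 70, 90, 10],
})

# Top three rows by score, highest first.
top3 = players.nlargest(3, "score")

# Bottom two rows by score, lowest first.
bottom2 = players.nsmallest(2, "score")
```

The result matches `players.sort_values("score", ascending=False).head(3)` for this data, but avoids ordering rows that will be discarded anyway.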

#pandas #top-n

python | beginner

Pandas Datetime Component Extraction

Extract year, month, day, hour, day-of-week and other components from a datetime column.

Best for: time-based features

#pandas #datetime

python | beginner

DataFrame to Dict Records

Convert DataFrames to lists of dicts for API responses, JSON export, or further processing.

Best for: API serialization

#pandas #records

python | intermediate

Pandas Cartesian Feature Interaction

Generate pairwise feature interactions for ML by creating cross-product columns.

Best for: ML feature engineering

#pandas #feature-engineering

python | intermediate

Pandas Rolling Correlation

Compute rolling Pearson correlation between two columns to detect shifting relationships over time.

Best for: regime detection

#pandas #rolling

python | beginner

Expand JSON Column into DataFrame Columns

Parse a JSON-string column and expand its keys into separate columns in one step.

Best for: JSON column expansion

#pandas #json

python | beginner

Pandas Forward Fill & Backward Fill

Propagate non-null values forward and backward to fill gaps in time-series or sparse data.

Best for: gap filling

#pandas #ffill

python | beginner

Value Counts with Normalisation

Compute frequency distributions and percentage breakdowns of categorical columns.

Best for: data profiling

#pandas #value-counts