Data Engineering
Data pipelines, transformations, ETL, and big data processing snippets.
17 snippets
Showing 17 of 17 snippets
Pandas DataFrame Transformations
Common pandas DataFrame transformations including column operations, type casting, and string methods.
Pandas DataFrame Filtering Techniques
Filter DataFrames using boolean masks, query syntax, isin, between, and string matching methods.
Pandas GroupBy Aggregation Examples
GroupBy operations with multiple aggregations, named aggregations, and transform for DataFrame analysis.
Python ETL Pipeline Example
Complete extract-transform-load pipeline with error handling, logging, and incremental processing.
Apache Airflow DAG Example
Airflow DAG with task dependencies, retries, SLA, and PythonOperator for daily data pipeline.
Spark SQL Query Example
PySpark DataFrame operations with SQL queries, window functions, and aggregations for big data.
Python Batch Processing Script
Process large files in configurable batches with progress tracking, error handling, and resume support.
Nested JSON Flattening in Python
Flatten deeply nested JSON structures into flat dictionaries suitable for DataFrames or CSV export.
Python CSV Processing Examples
Read, write, and transform CSV files using the csv module and pandas with encoding and dialect handling.
Data Validation with Pydantic
Validate and parse data records using Pydantic models with custom validators and error reporting.
Retry Logic for Data Pipelines
Configurable retry decorator with exponential backoff and jitter for resilient data pipeline tasks.
Database Sync Script in Python
Sync data between two databases with upsert logic, batch processing, and change detection.
SQL Incremental Load Pattern
Incremental data load using watermark tracking to process only new and updated records efficiently.
SQL Data Deduplication Techniques
Remove duplicate records using ROW_NUMBER, DISTINCT ON, and self-join deduplication strategies.
Databricks Notebook Data Pipeline
Databricks notebook with Delta Lake reads, transformations, merge operations, and table optimization.
Python Streaming Data Processing
Process streaming data with generators, windowed aggregation, and memory-efficient line-by-line reading.
SQL Window Functions for Analytics
Advanced SQL window functions for running totals, rankings, moving averages, and gap analysis.