📊

Data Engineering

Data pipelines, transformations, ETL, and big data processing snippets.

17 snippets

python · beginner

Pandas DataFrame Transformations

Common pandas DataFrame transformations including column operations, type casting, and string methods.

#pandas #dataframe

python · beginner

Pandas DataFrame Filtering Techniques

Filter DataFrames using boolean masks, query syntax, isin, between, and string matching methods.

#pandas #filtering

python · intermediate

Pandas GroupBy Aggregation Examples

GroupBy operations with multiple aggregations, named aggregations, and transform for DataFrame analysis.

#pandas #groupby

python · advanced

Python ETL Pipeline Example

Complete extract-transform-load pipeline with error handling, logging, and incremental processing.

#etl #pipeline

python · advanced

Apache Airflow DAG Example

Airflow DAG with task dependencies, retries, an SLA, and PythonOperator tasks for a daily data pipeline.

#airflow #dag

python · advanced

Spark SQL Query Example

PySpark DataFrame operations with SQL queries, window functions, and aggregations for large datasets.

#spark #pyspark

python · intermediate

Python Batch Processing Script

Process large files in configurable batches with progress tracking, error handling, and resume support.

#batch-processing #python
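
A minimal sketch of the batch-with-resume idea described in this card, using only the standard library. The names `read_batches`, `process_file`, and `handle_batch`, as well as the JSON checkpoint format, are assumptions made for illustration and are not taken from the snippet itself.

```python
import json
from itertools import islice
from pathlib import Path


def read_batches(items, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_file(path, checkpoint_path, handle_batch, batch_size=1000):
    """Process a text file in batches, resuming from a checkpoint if one exists.

    The checkpoint is a small JSON file holding the number of lines already
    processed. handle_batch is a hypothetical callback supplied by the caller.
    """
    checkpoint = Path(checkpoint_path)
    done = json.loads(checkpoint.read_text())["lines_done"] if checkpoint.exists() else 0

    with open(path, encoding="utf-8") as f:
        remaining = islice(f, done, None)  # skip lines handled in an earlier run
        for batch in read_batches(remaining, batch_size):
            handle_batch(batch)  # an exception here leaves the checkpoint untouched
            done += len(batch)
            checkpoint.write_text(json.dumps({"lines_done": done}))
            print(f"processed {done} lines")  # simple progress tracking
    return done
```

Because the checkpoint is written only after a batch succeeds, rerunning `process_file` after a failure restarts at the first unprocessed batch. The handler should therefore be safe to repeat for the batch that failed.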

python · intermediate

Nested JSON Flattening in Python

Flatten deeply nested JSON structures into flat dictionaries suitable for DataFrames or CSV export.

#json #flattening
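
One way the flattening described here might look, as a short recursive sketch. The function name `flatten`, the dot separator, and the choice to index list elements by position are illustrative assumptions.

```python
def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts and lists into a single-level dict."""
    if isinstance(obj, dict):
        pairs = obj.items()
    elif isinstance(obj, list):
        pairs = enumerate(obj)
    else:
        return {parent_key: obj}

    items = {}
    for key, value in pairs:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value  # scalars and empty containers are kept as-is
    return items


record = {"id": 1, "user": {"name": "Ada", "tags": ["a", "b"]}, "meta": {}}
flat = flatten(record)
# flat == {"id": 1, "user.name": "Ada", "user.tags.0": "a", "user.tags.1": "b", "meta": {}}
```

Each flattened dict can then serve as one row, for example as input to `csv.DictWriter` or a DataFrame constructor.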

python · beginner

Python CSV Processing Examples

Read, write, and transform CSV files using the csv module and pandas, with encoding and dialect handling.

#csv #python
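
A small sketch of the csv-module half of this card (pandas is left out to keep it dependency-free). `SemicolonDialect` and `transform_csv` are hypothetical names, and the in-memory `io.StringIO` streams stand in for real files.

```python
import csv
import io


class SemicolonDialect(csv.excel):
    """Hypothetical dialect for semicolon-separated exports."""
    delimiter = ";"


def transform_csv(src, dst, dialect="excel"):
    """Copy rows from src to dst, normalizing header names and trimming values."""
    reader = csv.DictReader(src, dialect=dialect)
    original = reader.fieldnames or []
    cleaned = [name.strip().lower() for name in original]
    writer = csv.DictWriter(dst, fieldnames=cleaned, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Short rows yield None for missing fields, so fall back to "".
        writer.writerow({new: (row[old] or "").strip() for old, new in zip(original, cleaned)})


src = io.StringIO(" Name ;City\n Ada ; London \n")
dst = io.StringIO()
transform_csv(src, dst, dialect=SemicolonDialect)
print(dst.getvalue())
```

With files on disk, the csv documentation recommends opening them with `newline=""`. Passing an explicit `encoding` (for example `"utf-8"`) to `open` avoids depending on the platform default.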

python · intermediate

Data Validation with Pydantic

Validate and parse data records using Pydantic models with custom validators and error reporting.

#validation #pydantic

python · intermediate

Retry Logic for Data Pipelines

Configurable retry decorator with exponential backoff and jitter for resilient data pipeline tasks.

#retry #resilience
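
A retry decorator along these lines could be sketched as follows. The parameter names, the defaults, and the "full jitter" strategy (sleeping a random time between zero and the capped exponential delay) are assumptions; the snippet itself may choose differently. The `sleep` parameter exists only so the delay can be replaced in tests.

```python
import functools
import random
import time


def retry(max_attempts=3, base_delay=0.5, max_delay=30.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function on the given exceptions with backoff and jitter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Exponential backoff capped at max_delay, with full jitter.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(random.uniform(0, delay))
        return wrapper
    return decorator
```

A pipeline task would opt in with something like `@retry(max_attempts=5, exceptions=(ConnectionError,))`, which keeps non-transient errors such as `ValueError` from being retried.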

python · advanced

Database Sync Script in Python

Sync data between two databases with upsert logic, batch processing, and change detection.

#database #sync
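
As a rough illustration of upsert-based syncing with change detection, the sketch below copies a hypothetical `customers` table between two SQLite connections. The table layout, the name `sync_customers`, and the use of SQLite (whose `ON CONFLICT ... DO UPDATE` clause needs version 3.24 or later) are all assumptions; other databases spell the upsert differently.

```python
import sqlite3

# Assumes customers.id is the primary key in the target table.
UPSERT = """
INSERT INTO customers (id, name, email) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET name = excluded.name, email = excluded.email
"""


def sync_customers(source, target, batch_size=500):
    """Upsert new or changed source rows into target; return rows written."""
    cursor = source.execute("SELECT id, name, email FROM customers ORDER BY id")
    written = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return written
        ids = [row[0] for row in batch]
        placeholders = ",".join("?" * len(ids))
        current = {
            row[0]: row
            for row in target.execute(
                f"SELECT id, name, email FROM customers WHERE id IN ({placeholders})", ids
            )
        }
        # Change detection: keep only rows that are missing or differ in target.
        changed = [row for row in batch if current.get(row[0]) != row]
        target.executemany(UPSERT, changed)
        target.commit()
        written += len(changed)
```

Comparing whole rows keeps unchanged records out of the write path, so a second run against an already synced target writes nothing. Deletions in the source are not propagated by this sketch.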

sql · intermediate

SQL Incremental Load Pattern

Incremental data load using watermark tracking to process only new and updated records efficiently.

#sql #incremental-load
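
The watermark pattern can be sketched with Python's `sqlite3` module driving the SQL. The table names (`src_orders`, `dst_orders`, `etl_watermark`), the `updated_at` column, and the function name are hypothetical. Timestamps are assumed to be ISO-8601 strings, which compare correctly as text.

```python
import sqlite3


def incremental_load(conn):
    """Copy rows changed since the stored watermark; return how many were loaded."""
    (last,) = conn.execute(
        "SELECT last_updated_at FROM etl_watermark WHERE table_name = 'orders'"
    ).fetchone()
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM src_orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last,),
    ).fetchall()
    if not rows:
        return 0
    with conn:  # one transaction: loaded data and watermark move together
        conn.executemany(
            "INSERT OR REPLACE INTO dst_orders (id, amount, updated_at) VALUES (?, ?, ?)",
            rows,
        )
        conn.execute(
            "UPDATE etl_watermark SET last_updated_at = ? WHERE table_name = 'orders'",
            (rows[-1][2],),  # newest timestamp seen becomes the new watermark
        )
    return len(rows)
```

Advancing the watermark in the same transaction as the load means a failed run leaves both untouched, so the next run picks up the same rows again. The strict `>` comparison can miss a row that is written later with a timestamp equal to the stored watermark.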

sql · intermediate

SQL Data Deduplication Techniques

Remove duplicate records using ROW_NUMBER, PostgreSQL's DISTINCT ON, and self-join strategies.

#sql #deduplication
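
For the ROW_NUMBER strategy, a sketch using an in-memory SQLite database follows (window functions need SQLite 3.25 or later). The `events` table, its columns, and the rule of keeping the newest row per `(user_id, event_type)` are assumptions chosen for the example.

```python
import sqlite3

# Number the rows within each duplicate group, newest first, then delete
# everything except row number 1.
DEDUP_SQL = """
DELETE FROM events
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, event_type
                   ORDER BY created_at DESC, id DESC
               ) AS rn
        FROM events
    )
    WHERE rn > 1
)
"""


def deduplicate(conn):
    """Delete all but the newest row per (user_id, event_type); return rows removed."""
    with conn:
        return conn.execute(DEDUP_SQL).rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id TEXT, event_type TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "u1", "click", "2024-01-01"),
        (2, "u1", "click", "2024-01-02"),
        (3, "u1", "view", "2024-01-01"),
        (4, "u1", "click", "2024-01-03"),
    ],
)
removed = deduplicate(conn)
```

The extra `id DESC` in the ordering breaks ties between rows with the same timestamp, so the surviving row is deterministic.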

python · advanced

Databricks Notebook Data Pipeline

Databricks notebook with Delta Lake reads, transformations, merge operations, and table optimization.

#databricks #delta-lake

python · intermediate

Python Streaming Data Processing

Process streaming data with generators, windowed aggregation, and memory-efficient line-by-line reading.

#streaming #python
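
The generator approach mentioned here might be sketched like this: one generator reads values lazily, another keeps a fixed-size window. The names `read_values` and `moving_average`, and the choice of a sliding mean as the windowed aggregate, are illustrative assumptions.

```python
import io
from collections import deque


def read_values(stream):
    """Yield one float per non-empty line without loading the whole stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield float(line)


def moving_average(values, window=3):
    """Yield the mean of the last `window` values once the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    for value in values:
        if len(buf) == window:
            total -= buf[0]  # the oldest value is about to be evicted
        buf.append(value)
        total += value
        if len(buf) == window:
            yield total / window


stream = io.StringIO("1\n2\n3\n\n4\n5\n")  # stands in for an open file or socket
averages = list(moving_average(read_values(stream), window=3))
# averages == [2.0, 3.0, 4.0]
```

Memory use is bounded by the window size regardless of input length, since neither generator materializes the stream.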

sql · advanced

SQL Window Functions for Analytics

Advanced SQL window functions for running totals, rankings, moving averages, and gap analysis.

#sql #window-functions
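
The four techniques named in this card can be combined in one query. The sketch below runs it through Python's `sqlite3` module (SQLite 3.25 or later); the `daily_sales` table and its contents are made up for the example.

```python
import sqlite3

QUERY = """
SELECT day,
       amount,
       SUM(amount) OVER (ORDER BY day)                  AS running_total,
       RANK() OVER (ORDER BY amount DESC)               AS amount_rank,
       AVG(amount) OVER (
           ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       )                                                AS moving_avg_3,
       CAST(julianday(day) - julianday(LAG(day) OVER (ORDER BY day)) AS INTEGER)
                                                        AS days_since_previous
FROM daily_sales
ORDER BY day
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 30), ("2024-01-04", 20)],
)
report = conn.execute(QUERY).fetchall()
for row in report:
    print(row)
```

In the result, a `days_since_previous` value greater than 1 marks a gap in the series (here, the missing 2024-01-03), and the first row has no previous day, so its value is NULL.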