Data Engineering

Data pipelines, transformations, ETL, and big data processing snippets.

17 snippets

python · beginner

Pandas DataFrame Transformations

Common pandas DataFrame transformations including column operations, type casting, and string methods.

#pandas #dataframe

python · beginner

Pandas DataFrame Filtering Techniques

Filter DataFrames using boolean masks, query syntax, isin, between, and string matching methods.

#pandas #filtering

python · intermediate

Pandas GroupBy Aggregation Examples

GroupBy operations with multiple aggregations, named aggregations, and transform for DataFrame analysis.

#pandas #groupby

python · advanced

Python ETL Pipeline Example

Complete extract-transform-load pipeline with error handling, logging, and incremental processing.

#etl #pipeline

python · advanced

Apache Airflow DAG Example

Airflow DAG with task dependencies, retries, an SLA, and PythonOperator tasks for a daily data pipeline.

#airflow #dag

python · advanced

Spark SQL Query Example

PySpark DataFrame operations with SQL queries, window functions, and aggregations for big data.

#spark #pyspark

python · intermediate

Python Batch Processing Script

Process large files in configurable batches with progress tracking, error handling, and resume support.
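
As a rough illustration of the batching and resume idea, here is a minimal sketch using only the standard library. The names `read_batches`, `process_file`, `handle_batch`, and the JSON checkpoint file are hypothetical choices for this example, not taken from the snippet itself. It assumes a line-oriented text file and records how many lines have been handled so a later run can skip them.

```python
import json
import logging
from itertools import islice
from pathlib import Path

log = logging.getLogger(__name__)


def read_batches(items, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_file(path, checkpoint_path, handle_batch, batch_size=1000):
    """Process a text file in batches, resuming from a saved line count.

    handle_batch is a caller-supplied callable (hypothetical name).
    A failing batch is logged and skipped, not retried.
    """
    checkpoint = Path(checkpoint_path)
    done = 0
    if checkpoint.exists():
        done = json.loads(checkpoint.read_text())["lines_done"]
    failed = 0

    with open(path, encoding="utf-8") as source:
        remaining = islice(source, done, None)  # skip lines already handled
        for batch in read_batches(remaining, batch_size):
            try:
                handle_batch(batch)
            except Exception:
                failed += 1
                log.exception("batch starting at line %d failed", done)
            done += len(batch)
            # Persist progress after every batch so a crash loses little work.
            checkpoint.write_text(json.dumps({"lines_done": done}))
            log.info("processed %d lines", done)
    return done, failed
```

Running `process_file` a second time with the same checkpoint file picks up after the last recorded line. Whether a failed batch should be skipped or should stop the run is a design choice; this sketch skips it.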

#batch-processing #python

python · intermediate

Nested JSON Flattening in Python

Flatten deeply nested JSON structures into flat dictionaries suitable for DataFrames or CSV export.
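
One possible way to do this is a small recursive function. The sketch below is an assumption about the approach, not the snippet's actual code: the name `flatten`, the dot separator, and the use of list indexes as key parts are all illustrative choices.

```python
def flatten(value, parent_key="", sep="."):
    """Flatten nested dicts and lists into a single-level dict.

    Keys are joined with `sep`; list items use their index as the key part.
    Empty dicts and lists produce no entries in this sketch.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {parent_key: value}  # leaf value

    flat = {}
    for key, child in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(child, full_key, sep))
    return flat


record = {"id": 1, "user": {"name": "Ada", "tags": ["a", "b"]}}
print(flatten(record))
# {'id': 1, 'user.name': 'Ada', 'user.tags.0': 'a', 'user.tags.1': 'b'}
```

Each flattened dict can then be written as one CSV row or collected into a list for a DataFrame.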

#json #flattening

python · beginner

Python CSV Processing Examples

Read, write, and transform CSV files using the csv module and pandas with encoding and dialect handling.
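
For the `csv` module half of this, a minimal sketch might look like the following. The function name `convert_csv` and its defaults are hypothetical. It assumes a semicolon-delimited input that may start with a UTF-8 byte order mark, and writes a comma-delimited UTF-8 file with trimmed, lower-cased headers.

```python
import csv


def convert_csv(src_path, dst_path, delimiter=";", src_encoding="utf-8-sig"):
    """Rewrite a delimited file as standard comma-separated UTF-8.

    "utf-8-sig" strips a leading byte order mark if one is present.
    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding=src_encoding) as src:
        reader = csv.DictReader(src, delimiter=delimiter)
        old_names = reader.fieldnames or []
        new_names = [name.strip().lower() for name in old_names]
        rows = [
            # Missing cells come back as None, so fall back to "".
            {new: (row[old] or "").strip() for new, old in zip(new_names, old_names)}
            for row in reader
        ]

    # newline="" lets the csv module control line endings itself.
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=new_names)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

This version loads all rows into memory, which is fine for small files; for large inputs the rows could be written as they are read.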

#csv #python

python · intermediate

Data Validation with Pydantic

Validate and parse data records using Pydantic models with custom validators and error reporting.

#validation #pydantic

python · intermediate

Retry Logic for Data Pipelines

Configurable retry decorator with exponential backoff and jitter for resilient data pipeline tasks.
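
A decorator along these lines could be sketched as follows. The parameter names (`max_attempts`, `base_delay`, `max_delay`, `exceptions`, `sleep`) are assumptions made for this example. It uses "full jitter", meaning each wait is a random value between zero and the current exponential delay.

```python
import functools
import random
import time


def retry(max_attempts=3, base_delay=0.5, max_delay=30.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function with exponential backoff and full jitter.

    `sleep` is injectable so tests can avoid real waiting.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Delay doubles each attempt, capped at max_delay.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(random.uniform(0, delay))
        return wrapper
    return decorator


@retry(max_attempts=5, exceptions=(ConnectionError, TimeoutError))
def fetch_page(url):
    """Placeholder task; a real pipeline step would do I/O here."""
    raise ConnectionError(f"could not reach {url}")
```

Restricting `exceptions` to transient error types keeps the decorator from retrying genuine bugs such as a `KeyError`.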

#retry #resilience

python · advanced

Database Sync Script in Python

Sync data between two databases with upsert logic, batch processing, and change detection.

#database #sync

sql · intermediate

SQL Incremental Load Pattern

Incremental data load using watermark tracking to process only new and updated records efficiently.
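
To make the watermark idea concrete, here is a small sketch that runs the SQL through Python's built-in `sqlite3` module. The table names (`source_orders`, `target_orders`, `etl_watermark`) and the function `incremental_load` are hypothetical, and timestamps are assumed to be ISO 8601 strings so that they sort correctly as text.

```python
import sqlite3


def incremental_load(conn, name="orders"):
    """Copy rows changed since the stored watermark, then advance it.

    Assumes one etl_watermark row already exists for `name`.
    Returns the number of rows loaded.
    """
    (watermark,) = conn.execute(
        "SELECT last_loaded_at FROM etl_watermark WHERE table_name = ?",
        (name,),
    ).fetchone()

    rows = conn.execute(
        "SELECT id, amount, updated_at FROM source_orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    if not rows:
        return 0

    with conn:  # one transaction: the load and the watermark move together
        conn.executemany(
            "INSERT OR REPLACE INTO target_orders (id, amount, updated_at) "
            "VALUES (?, ?, ?)",
            rows,
        )
        conn.execute(
            "UPDATE etl_watermark SET last_loaded_at = ? WHERE table_name = ?",
            (rows[-1][2], name),  # newest updated_at just loaded
        )
    return len(rows)
```

The strict `>` comparison can miss a row that arrives later with the same timestamp as the watermark; some pipelines reload a small overlap window to cover that case.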

#sql #incremental-load

sql · intermediate

SQL Data Deduplication Techniques

Remove duplicate records using ROW_NUMBER, DISTINCT ON, and self-join deduplication strategies.
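
The `ROW_NUMBER` variant can be sketched as below, executed through Python's built-in `sqlite3` module. The `customers` table and its columns are hypothetical, and the query assumes a SQLite build with window function support (3.25 or later). `DISTINCT ON` is PostgreSQL syntax and is not shown here.

```python
import sqlite3

# Keep the most recently updated row per email; delete the rest.
DEDUP_SQL = """
DELETE FROM customers
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY email
                   ORDER BY updated_at DESC, id DESC
               ) AS rn
        FROM customers
    )
    WHERE rn > 1
)
"""


def deduplicate(conn):
    """Run the dedup statement and return how many rows were removed."""
    with conn:
        cursor = conn.execute(DEDUP_SQL)
    return cursor.rowcount
```

Adding `id DESC` as a tiebreaker makes the choice of surviving row deterministic when two duplicates share the same `updated_at`.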

#sql #deduplication

python · advanced

Databricks Notebook Data Pipeline

Databricks notebook with Delta Lake reads, transformations, merge operations, and table optimization.

#databricks #delta-lake

python · intermediate

Python Streaming Data Processing

Process streaming data with generators, windowed aggregation, and memory-efficient line-by-line reading.
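
A generator pipeline of this kind might look like the sketch below. The function names `read_values` and `moving_average` are illustrative, and the input is assumed to be one number per line. Only the current window is held in memory, regardless of how long the stream is.

```python
from collections import deque


def read_values(lines):
    """Lazily parse one float per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def moving_average(values, window_size):
    """Yield the mean of the last `window_size` values after each new value."""
    window = deque(maxlen=window_size)
    total = 0.0
    for value in values:
        if len(window) == window_size:
            total -= window[0]  # this value is about to be evicted
        window.append(value)
        total += value
        yield total / len(window)


stream = ["1\n", "2\n", "\n", "3\n", "4\n"]
print(list(moving_average(read_values(stream), window_size=2)))
# [1.0, 1.5, 2.5, 3.5]
```

An open file object can be passed in place of `stream`, since iterating over a file yields one line at a time.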

#streaming #python

sql · advanced

SQL Window Functions for Analytics

Advanced SQL window functions for running totals, rankings, moving averages, and gap analysis.
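
The four techniques named here can be combined in one query. The sketch below runs it through Python's built-in `sqlite3` module; the `daily_sales` table, its columns, and the function `sales_report` are hypothetical, and a SQLite build with window function support (3.25 or later) is assumed. The `julianday` date arithmetic is SQLite-specific.

```python
import sqlite3

REPORT_SQL = """
SELECT day,
       amount,
       -- running total up to and including this day
       SUM(amount) OVER (ORDER BY day) AS running_total,
       -- rank by amount, largest first
       RANK() OVER (ORDER BY amount DESC) AS amount_rank,
       -- average over this row and the two before it
       AVG(amount) OVER (
           ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3,
       -- days since the previous row (NULL for the first row)
       julianday(day) - julianday(LAG(day) OVER (ORDER BY day)) AS gap_days
FROM daily_sales
ORDER BY day
"""


def sales_report(conn):
    """Return one tuple per day with the window-function columns above."""
    return conn.execute(REPORT_SQL).fetchall()
```

A `gap_days` value greater than 1 marks a break in an otherwise daily series, which is the basis of gap analysis.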

#sql #window-functions