python · advanced
Delta Lake MERGE with Python
Perform ACID upserts (update matching rows, insert new ones) on a Delta Lake table using the delta-rs Python bindings (the deltalake package).
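import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Create (or overwrite) the target table with two initial rows.
df_initial = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})
write_deltalake('./my_table', df_initial, mode='overwrite')

# Source rows: id 2 already exists in the target (update), id 3 is new (insert).
df_updates = pd.DataFrame({'id': [2, 3], 'value': ['B_updated', 'c']})

# Upsert the source into the target, matching rows on id.
dt = DeltaTable('./my_table')
(
    dt.merge(
        source=df_updates,
        predicate='t.id = s.id',
        source_alias='s',
        target_alias='t',
    )
    .when_matched_update_all()      # matched rows: overwrite every column
    .when_not_matched_insert_all()  # unmatched source rows: insert as new
    .execute()                      # commit the merge as a single transaction
)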
print(DeltaTable('./my_table').to_pandas())

Use Cases
- lakehouse upserts
- ACID data lakes
- incremental loads
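For readers new to MERGE, the row-level effect of the two clauses used in the snippet (update all columns on a key match, insert when there is no match) can be sketched in plain Python. This is only an illustration of the semantics, not how delta-rs implements a merge; the upsert helper and its key parameter are hypothetical names introduced here. It also assumes each key appears at most once in the source.

```python
def upsert(target, source, key="id"):
    """Illustrative update-all / insert-all merge on `key` (hypothetical helper)."""
    # Index the target rows by key; copy each row so the inputs stay untouched.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched key: every column is replaced by the source row.
        # Unmatched key: the source row is inserted as a new row.
        merged[row[key]] = dict(row)
    # Sort by key only to make the result deterministic for display.
    return sorted(merged.values(), key=lambda r: r[key])


target = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
source = [{"id": 2, "value": "B_updated"}, {"id": 3, "value": "c"}]

result = upsert(target, source)
for row in result:
    print(row)
```

With the same data as the snippet, id 1 is left unchanged, id 2 takes the source value, and id 3 is added. If the source contains duplicate keys, this sketch silently keeps the last one; a real merge may behave differently, so deduplicating the source before an incremental load is the safer habit.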