
python · advanced

Delta Lake MERGE with Python

Perform ACID upserts (MERGE) on a Delta Lake table using the deltalake package, the Python binding for delta-rs. No Spark runtime is required.


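from deltalake import DeltaTable, write_deltalake
import pandas as pd

# Create (or replace) the target table with two initial rows.
df_initial = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})
write_deltalake('./my_table', df_initial, mode='overwrite')

# Incoming changes: id 2 already exists (update), id 3 is new (insert).
df_updates = pd.DataFrame({'id': [2, 3], 'value': ['B_updated', 'c']})
dt = DeltaTable('./my_table')
(
    # 't' aliases the existing table (target), 's' the incoming data (source).
    dt.merge(source=df_updates, predicate='t.id = s.id', source_alias='s', target_alias='t')
    .when_matched_update_all()       # matched ids: overwrite every column from the source
    .when_not_matched_insert_all()   # unmatched source ids: insert as new rows
    .execute()
)
# Reload the table to read the merged result; row order is not guaranteed.
print(DeltaTable('./my_table').to_pandas())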
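
To make the expected result of the merge concrete, the following is a minimal sketch of the same upsert semantics in plain Python, with no Delta Lake involved. The helper name `merge_upsert` is hypothetical and only illustrates the matched-update / not-matched-insert logic; it does not model transactions, schema handling, or how delta-rs executes the merge.

```python
def merge_upsert(target, source, key="id"):
    """Illustrative upsert: rows are dicts, matched on `key`.

    Hypothetical helper, not part of the deltalake API.
    """
    # Index the existing rows by key (copies, so the inputs are not mutated).
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched key: replace every column (like when_matched_update_all).
        # Unmatched key: add the row (like when_not_matched_insert_all).
        merged[row[key]] = dict(row)
    # Sorting is only for readable output; a Delta table has no inherent row order.
    return [merged[k] for k in sorted(merged)]


target = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
source = [{"id": 2, "value": "B_updated"}, {"id": 3, "value": "c"}]

result = merge_upsert(target, source)
for row in result:
    print(row)
```

With the same data as the snippet above, id 1 is left untouched, id 2 takes the value from the updates, and id 3 is added as a new row.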

Use Cases

lakehouse upserts
ACID data lakes
incremental loads

Tags

#delta-lake #upsert #lakehouse #delta-rs

Related Snippets

Similar patterns you can reuse in the same workflow.

Spark SQL Query Example

PySpark DataFrame operations with SQL queries, window functions, and aggregations for big data.

Best for: Processing large-scale datasets with Spark

Databricks Notebook Data Pipeline

Databricks notebook with Delta Lake reads, transformations, merge operations, and table optimization.

Best for: Medallion architecture data pipelines on Databricks

#databricks #delta-lake

PySpark DataFrame — Filter and Aggregate

Common PySpark DataFrame operations: filter, group by, window functions, and write to Parquet.

Best for: Large-scale data aggregation on distributed clusters

Spark Submit — Job Launcher Script

Launch PySpark jobs with spark-submit including cluster configuration, dependencies, and monitoring.

Best for: Launching PySpark batch jobs on YARN clusters