Delta Lake MERGE with Python

Perform ACID upserts on a Delta Lake table with MERGE, using deltalake, the Python binding for delta-rs.

python
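import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Create (or overwrite) the target table with two initial rows.
df_initial = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})
write_deltalake('./my_table', df_initial, mode='overwrite')

# Incoming batch: id=2 already exists (update), id=3 is new (insert).
df_updates = pd.DataFrame({'id': [2, 3], 'value': ['B_updated', 'c']})

# Upsert: match target rows (t) to source rows (s) on id.
dt = DeltaTable('./my_table')
(
    dt.merge(source=df_updates, predicate='t.id = s.id', source_alias='s', target_alias='t')
    .when_matched_update_all()      # matched rows take every column from the source
    .when_not_matched_insert_all()  # unmatched source rows are appended
    .execute()                      # commits the merge as a single transaction
)

# Reload the table to read the committed result.
print(DeltaTable('./my_table').to_pandas())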
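
To make the row-level effect of the two clauses concrete, here is a minimal plain-Python sketch of the same upsert logic on lists of dicts. It does not use delta-rs at all: the upsert helper and its key parameter are hypothetical names introduced for illustration, and the sketch ignores transactions, schemas, and storage. It only mirrors the rule "matched rows are overwritten, unmatched source rows are inserted".

```python
def upsert(target, source, key="id"):
    """Illustrative only: mimic update-all on match, insert-all otherwise.

    `upsert` is a hypothetical helper, not part of deltalake.
    Returns a new list sorted by key; the inputs are not modified.
    """
    # Index a copy of every target row by its key.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched key: replace all columns. Unmatched key: insert the row.
        merged[row[key]] = dict(row)
    return [merged[k] for k in sorted(merged)]


target = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
updates = [{"id": 2, "value": "B_updated"}, {"id": 3, "value": "c"}]

result = upsert(target, updates)
print(result)
# [{'id': 1, 'value': 'a'}, {'id': 2, 'value': 'B_updated'}, {'id': 3, 'value': 'c'}]
```

With the same data as the snippet above, id 1 is left untouched, id 2 is updated, and id 3 is inserted.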

Use Cases

  • lakehouse upserts
  • ACID data lakes
  • incremental loads
