Spark Streaming with Delta Lake – Tricks and Treats

Hen Ben Hemo

Delta Lake is a smart storage & metadata layer designed to expand the capabilities of the modern file-based data lake. From data deletion to indexing and ACID transactions, Delta enriches the data lake with actual database capabilities.
This is why it was obvious to us, the Riskified Data Engineering team, that Delta should be at the center of our data lake infrastructure implementation.
In this talk, we would like to share with you the challenges we faced building a Spark Streaming platform incorporating Delta Lake.
You’ll hear about using Delta Lake as both a streaming source and a destination, how we implemented automated schema evolution, a number of hacks for tuning Spark Streaming on Kubernetes for both cost and performance, and more!

Talk language: Hebrew

October 19, 2021
Hall A
Hen Ben Hemo

Big Data Tech Lead at Riskified. Hen has been a key player in the design and development of Riskified's next-gen big data infrastructure using Delta Lake, Airflow, Snowflake, and Spark on Kubernetes. Before Riskified, Hen worked at various tech companies, facing different scaling & big data challenges.
Hen is also an amateur pilot and foodie who loves to travel with his wife and kids.

Cancellation Policy

The conference will be held in person, not virtually, and will take place according to the COVID-19 regulations in effect at the time of the conference. If we are unable to hold an in-person event, we will postpone it rather than make it virtual. If the event is cancelled or postponed to another date, we will offer a full refund to all attendees and sponsors.

Attendee cancellations:

Up to 30 days prior to the event – 100% Refund.
30-14 days prior to the event – 50% Refund.
No refund will be offered later than that.