Data warehousing has come a long way in the past few years, solving challenges such as the cost of storing and computing over huge amounts of data. At the same time it introduced a new set of challenges, because the optimal data warehouse is one whose data is immutable. Some organizations found modern data warehousing technologies harder to adopt because their data still needs to change, whether due to business requirements (reference data updates, for example), compliance (e.g. GDPR), or other reasons.
Today there are three leading technology solutions to this problem, each unique in its own way: Delta Lake, Apache Hudi and Apache Iceberg. All three are meant to add mutability to data warehousing. In this talk we will examine each one, understand how it works, and look at its strengths and weaknesses. There are real differences between them, and by the end of this talk you will be able to tell whether one of them can help you in your data journey and, if so, which one fits your use case best.