Schedule - PGConf.DE 2025
Building a Data Lakehouse with PostgreSQL: Dive into Formats, Tools, Techniques, and Strategies
Date: 2025-05-08
Time: 10:25–11:10
Room: Berlin 1
Level: Intermediate
The evolution of Data Warehouses, Data Lakes, and Data Lakehouses has been marked by many buzzwords, fluctuating trends, and tools that often over-promised but under-delivered. While there are numerous materials on these topics, most of them offer only introductory overviews and focus narrowly on a single technology. There are also many differing opinions about what exactly a Data Lakehouse is.
This talk discusses different ways of understanding the topic. It explores data formats and frameworks such as Parquet, Apache Iceberg, Delta Lake, and Apache Hudi, and discusses different architectures of Data Lakehouse solutions. It also addresses key challenges, such as effective Data Governance, compliance with privacy and security standards, and comprehensive data quality checks.
The last part of the talk addresses the current AI hype with its many promises and offers a realistic overview of the actual capabilities of current Large Language Models and their use cases in Data Lakehouses.
PostgreSQL is extremely well equipped to play a major role in the current Data Lakehouse and AI boom.
Key Takeaways:
- A comprehensive overview of Data Lakehouse architecture
- Insights into key data formats and frameworks in modern Data Lakehouses
- Practical ideas for implementing Data Governance practices
- A realistic view of the actual capabilities of current LLMs in the context of Data Lakehouses
Slides
The following slides have been made available for this session:
- https://github.com/josmac69/conferences_slides/blob/main/2025/pg_conf_de_2025/migration_to_postgresq
- https://github.com/josmac69/conferences_slides/blob/main/2025/pg_conf_de_2025/postgresql_data_lakeho
