Schedule - pgDay Paris 2024
Beyond B-trees looking at Columnar Storage and LSM trees
Many of us, both Devs and DBAs, may have spent most of our careers working traditional storage engines i.e. b-tree or b+trees. However over the last decade with the growing popularity of cloud scale (very large and distributed) databases there are many new alternatives, some of which are growing in popularity. On the postgresql.org wiki there is a “Future Of Storage” page (https://wiki.postgresql.org/wiki/Future_of_storage). I’m pretty excited about columnar storage and lsm-trees.
Columnar storage tech has been in data warehouses for a while and is now available as a regular extension, available in the postgres community edition. While there are some limitations with this extension, they are not too difficult to understand and well documented. I want to give a quick demo of Columnar storage use cases and potential gotchas.
While the concepts of columnar storage are relatively intuitive, LSM trees work and behave fundamentally very differently from B-trees and this can lead to some really surprising behavior. Devs can still use regular SQL commands, but the performance can go from very good to very bad, based on the “use case”. LSM tree storage is also available in the community postgres edition via foreign data wrappers. I want to give a slightly longer demo of some of the more surprising behavior i.e. to highlight one of the pitfalls of working with LSM trees and possible application design changes for Devs.
Aim of this talk is not to go in depth into database storage internal structures, rather to give a general overview of both b-tree and lsm-tree storage principles. Plus to help both Devs & DBAs with design and debugging as they start working with Columnar storage and LSM-trees.