Schedule - FOSDEM PGDay 2024

Collation Challenges - Sorting it Out

Date: 2024-02-02
Time: 09:20–10:10
Room: Ballroom

Background: "libc" is common shorthand for the standard C library, a library of standard functions available to all C programs. glibc is the GNU C Library implementation, which is used on all major Linux distributions (e.g. AL, RHEL, Debian/Ubuntu, SuSE). glibc provides most of the foundational C routines, such as open, read, write, malloc, printf, and thousands more. It also provides the interface to the Linux kernel via syscalls.

For the purposes of this talk, the facility of interest is the locale functionality, and more specifically the functions that sort strings according to localized collation rules. For PostgreSQL to work durably and correctly, sort order must be deterministic and immutable. Because glibc implements the sort order, a change in sort order from one glibc version to the next breaks that contract with PostgreSQL and thereby causes data corruption: indexes that were persisted to storage may now hold the data in the wrong order according to the currently installed version of glibc.
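To make the failure mode concrete, here is a minimal sketch of how a sorted structure built under one ordering misbehaves when it is searched under another. The two key functions, collate_old and collate_new, are hypothetical stand-ins invented for this illustration; they are not the actual rules of any glibc version, and the list-plus-binary-search "index" is only a rough analogy for a B-tree.

```python
import bisect


def collate_old(s):
    # Hypothetical "old" rule: plain code point order.
    return s


def collate_new(s):
    # Hypothetical "new" rule: ignore hyphens and case first,
    # then fall back to the raw string as a tie-breaker.
    return (s.replace("-", "").lower(), s)


def build_index(values, key):
    # The "index" is just the values persisted in sorted order.
    return sorted(values, key=key)


def index_lookup(index, value, key):
    # Binary search, which is only correct if the index is
    # sorted according to the same key used for the search.
    keys = [key(v) for v in index]
    pos = bisect.bisect_left(keys, key(value))
    return pos < len(index) and index[pos] == value


rows = ["a-b", "ab", "B", "a", "Ac"]

# Index built while the "old" rule was in effect.
index = build_index(rows, collate_old)

# Rebuilding under the "new" rule stands in for a reindex.
rebuilt = build_index(rows, collate_new)
```

With the old rule, every row is found in index. After switching to collate_new without rebuilding, index_lookup(index, "a", collate_new) returns False even though "a" is stored, because the binary search walks a list that is no longer in the order it assumes. Looking up the same value in rebuilt succeeds.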

Proposed Solution: This talk outlines a method for building a collation compatibility library on a system with a very specific glibc base version. That library may then be used on another Linux system to provide stable collation, and thus avoid breakage due to glibc and/or OS upgrades.

Summary: If a PostgreSQL database resides on, for example, a RHEL 7 system with glibc version 2.17, and the operating system (OS) is upgraded to RHEL 9 with glibc version 2.34, the majority of indexes built on collatable columns will be broken. This talk will walk through examples of the types of breakage that can occur, the proposed solution at a high level, and a demonstration of the solution in action.
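As a rough illustration of the upgrade scenario, the sketch below compares a glibc version recorded when the indexes were built against the version now running. The helper names are hypothetical and this is not the method presented in the talk; a version mismatch is treated only as a conservative signal that collation may have changed, not as proof that it did.

```python
import platform


def parse_version(text):
    # "2.17" -> (2, 17). Comparing integers avoids the string
    # comparison trap where "2.9" would sort after "2.17".
    return tuple(int(part) for part in text.split("."))


def collation_may_have_changed(recorded, current):
    # Conservative check: any difference in glibc version is
    # treated as a reason to verify or rebuild affected indexes.
    return parse_version(recorded) != parse_version(current)


def running_glibc_version():
    # platform.libc_ver() returns a (lib, version) tuple; on
    # systems that do not use glibc the values may be empty.
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None
```

For the example above, collation_may_have_changed("2.17", "2.34") returns True. On a live system, the second argument could come from running_glibc_version(), assuming it does not return None.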


The following slides have been made available for this session:


Joe Conway