Anonymizing Data in Postgres Without Losing Your Mind (or Your Schema)
October 21–24
Scrubbing sensitive data in PostgreSQL can feel like a chore, especially when you're trying to do it safely, flexibly, and without rewriting your entire schema. Many teams need column-level anonymization when working with production clones, feeding analytics pipelines, or complying with regulations like GDPR, and doing it well is harder than it looks.
In this talk, we’ll walk through modern, practical ways to anonymize data in Postgres. We’ll cover how to mask or hash specific columns, handle nested fields (like JSON), and even apply different rules based on data values, all without touching the original database structure. You’ll see examples of redaction, pseudonymization, and other transformations that keep your data useful while protecting what matters. We’ll also compare popular open-source tools: pg_anonymize, known for in-database anonymization, and platforms like Greenmask and NeoSync, which focus on masking and secure data syncing.
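To make those ideas concrete, here is a minimal, tool-agnostic sketch in plain Python of three of the techniques mentioned above: masking a column, pseudonymizing a column with a keyed hash, and redacting a nested JSON field, plus one rule that depends on a data value. It is not the API of any tool named in this abstract. The column names (`email`, `customer_ref`, `profile`, `country`, `full_name`), the `SECRET_KEY` value, and the country list are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical secret; in practice this would come from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical, deliberately incomplete list used for the conditional rule.
EU_COUNTRIES = {"DE", "FR", "ES", "IT", "NL"}


def pseudonymize(value: str) -> str:
    """Keyed hash: the same input always yields the same token,
    so joins across tables keep working, but the raw value is hidden."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_email(email: str) -> str:
    """Hide the local part but keep the domain, which is often
    still useful for analytics."""
    _local, sep, domain = email.partition("@")
    if not sep:
        return "***"
    return "***@" + domain


def anonymize_row(row: dict) -> dict:
    """Return an anonymized copy of a row; the input row is not modified."""
    out = dict(row)

    # Column-level masking and pseudonymization.
    out["email"] = mask_email(row["email"])
    out["customer_ref"] = pseudonymize(row["customer_ref"])

    # Nested field: redact one key inside a JSON document.
    profile = json.loads(row["profile"])
    if "phone" in profile:
        profile["phone"] = "REDACTED"
    out["profile"] = json.dumps(profile)

    # Value-based rule: redact names only for rows from listed countries.
    if row["country"] in EU_COUNTRIES:
        out["full_name"] = "REDACTED"

    return out


if __name__ == "__main__":
    sample = {
        "email": "ada@example.com",
        "customer_ref": "CUST-0001",
        "profile": json.dumps({"phone": "+49 30 000000", "plan": "pro"}),
        "country": "DE",
        "full_name": "Ada Example",
    }
    print(anonymize_row(sample))
```

Using a keyed hash (HMAC) rather than a bare hash is a design choice in this sketch: without the key, low-entropy values such as customer numbers could be recovered by hashing every candidate and comparing.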
To bring it all together, we’ll look at how pgstream, an open-source tool that uses logical replication and supports plug-and-play data transformers, can help apply these anonymization techniques in real time. pgstream integrates with several of the open-source tools mentioned earlier, allowing you to set up custom or built-in anonymization rules with simple configuration and without requiring major changes to your database.
If you’ve ever needed to clean up sensitive data before sending it downstream (or just wanted a cleaner way to do it), this talk will give you the tools and ideas to make it happen.