More Datacenters, Less Problems
April 21–22
As Datadog continues to grow, datacenter expansion has become a priority. Unfortunately, our Postgres architecture, originally built to support a handful of datacenters, had become a painful liability for operators and service owners. Hidden coupling, operational toil, and reliance on components like PgBouncer created major coordination challenges for datacenter expansion.
To understand what needed to change, we used AI-assisted analysis to examine how Postgres was actually being used across hundreds of services. By analyzing real production workloads, queries, and traffic patterns, we identified hidden dependencies and unsafe assumptions that individual teams could not have investigated on their own. That visibility let us deliver architectural and service-level changes with confidence.
In this talk, I’ll share how our team simplified a production Postgres architecture to enable safe, repeatable, and hands-off datacenter expansion. I’ll walk through the original design, the failure modes that forced change, and the deliberate tradeoffs we made. I’ll demonstrate how we used Temporal to automate previously manual workflows, removed redundant dependencies, and ultimately deprecated PgBouncer in favor of a homegrown Postgres proxy.
This is a practical, experience-driven talk about simplifying Postgres at scale: using automation to tame complexity, using AI to untangle existing workloads, and building database architectures that can grow without becoming an operator's nightmare.