Distributed computing on PostgreSQL Thursday 14:40 Baltic II+III
Twitter: @marcoslot GitHub: marcocitus LinkedIn: marcoslot
I'm a principal engineer at Citus Data, working on the core distributed database technology.
I was a speaker at pgconf.eu in Vienna 2 years ago and really enjoyed the breadth of the conference and the organisers' attention to detail. I felt like I got several conferences' worth of interesting talks and conversations out of one conference.
I'll talk about using PostgreSQL as a distributed computing platform. Companies dealing with high data volumes tend to deploy a lot of different systems nowadays. When you look at all those systems, it makes you wonder: Why not use postgres? It can do all of that! Of course, the answer is usually that these systems are distributed and can scale out to handle higher volumes.
Funny thing about postgres though, it's not just a great database, but thanks to its extensibility it's also becoming a great building block for distributed systems. By combining postgres' own features with a number of extensions such as Citus and dblink, you can string together reliable, scalable distributed systems in the same way you would create a data model. In the talk, I'll discuss the tools you have available for distributed computing in postgres and give an example of how you could use them to build a Kafka replacement.
The talk explores the possibilities that the PostgreSQL ecosystem gives developers for building distributed data pipelines, so the primary audience is developers who use PostgreSQL.
I'll show how to build advanced distributed systems using relatively new PostgreSQL features such as logical decoding, so some awareness of recent PostgreSQL developments is recommended.
There is a big move towards consolidation right now: Fewer protocols, fewer systems. Kafka is consolidating messaging between different systems. Kubernetes is creating a single platform for software deployment, management, and operations across different types of infrastructure. However, the data storage, processing, and retrieval space is still wide open with many different tools, most of which are functionally limited or not production-ready, creating the need for yet more tools. PostgreSQL has the right combination of flexibility, robustness, and - with extensions - scalability to become a major point of consolidation. You can start on a single postgres server, and if you can scale out its different functions, then you can keep growing your business without worrying about having to rearchitect all the time.