PostgreSQL Conference Europe 2017 - Speaker Interview - Marco Slot

Speaker Interview: Marco Slot

Distributed computing on PostgreSQL, Thursday 14:40, Baltic II+III

Twitter: @marcoslot GitHub: marcocitus LinkedIn: marcoslot

Could you briefly introduce yourself?

I'm a principal engineer at Citus Data, working on the core distributed database technology.

Have you enjoyed previous pgconf.eu conferences, either as attendee or as speaker?

I was a speaker at pgconf.eu in Vienna two years ago and really enjoyed the breadth of the conference and the organisers' attention to detail. I felt like I got several conferences' worth of interesting talks and conversations out of one conference.

What will your talk be about, exactly? Why this topic?

I'll talk about using PostgreSQL as a distributed computing platform. Companies dealing with high data volumes tend to deploy a lot of different systems nowadays. When you look at all those systems, it makes you wonder: Why not use postgres? It can do all of that! Of course, the answer is usually that these systems are distributed and can scale out to handle higher volumes.

Funny thing about postgres though, it's not just a great database, but thanks to its extensibility it's also becoming a great building block for distributed systems. By combining postgres' own features with a number of extensions such as Citus and dblink, you can string together reliable, scalable distributed systems in the same way you would create a data model. In the talk, I'll discuss the tools you have available for distributed computing in postgres and give an example of how you could use them to build a Kafka replacement.

What is the audience for your talk?

The talk explores the possibilities that the PostgreSQL ecosystem gives developers for building distributed data pipelines, so the primary audience is developers who use PostgreSQL.

What existing knowledge should the attendee have?

I'll show how to build advanced distributed systems using relatively new PostgreSQL features such as logical decoding, so some awareness of recent PostgreSQL developments is recommended.

What is next in distributed computing? How is PostgreSQL doing in this area?

There is a big move towards consolidation right now: fewer protocols, fewer systems. Kafka is consolidating messaging between different systems. Kubernetes is creating a single platform for software deployment, management, and operations across different types of infrastructure. However, the data storage, processing, and retrieval space is still wide open with many different tools, most of which are functionally limited or not production-ready, creating the need for yet more tools. PostgreSQL has the right combination of flexibility, robustness, and - with extensions - scalability to become a major point of consolidation. You can start on a single postgres server, and if you can scale out its different functions, then you can keep growing your business without worrying about having to rearchitect all the time.