Citus Architecture: Extending Postgres to Build a Distributed Database
Presented by:
Ozgun Erdogan
Ozgun is a co-founder and the CTO at Citus Data. Prior to Citus, Ozgun worked as a software developer for four years in the Distributed Systems Engineering team at Amazon. There, he proposed, designed, and implemented novel algorithms on distributed caching and consistency; and also worked on building systems for scalable data analytics. Ozgun earned his M.S. in Computer Science from Stanford University, and his B.S. from Galatasaray University. He also holds patents on distributed cache consistency and load balancing.
No video of the event yet, sorry!
Citus is a distributed database that scales out Postgres. By using the extension APIs, Citus distributes your tables across a cluster of machines and parallelizes SQL queires. This talk describes the Citus architecture by focusing on our learnings in distributed systems. We first describe how Citus leverages PostgreSQL's extension APIs. These APIs are rich enough to store distributed metadata, add new commands to Postgres to help with sharding, parallelize and execute queries in a distributed cluster, and handle automatic failover of machines. Second, we show the architecture of a distributed query planner. We first describe the join order planner and describe how it chooses between broadcast, co-located, and repartition joins to minimize network I/O. We then show how we map SQL queries into distributed relational algebra, and optimize these plans for parallel execution. Third, we note a primary challenge in distributed systems. No single executor works great for all workloads. We show how Citus chooses between three executors, each one optimized for a different workload: NoSQL, operational analytics, and data warehousing. We then conclude with a demo that shows Citus running on a large cluster.
- Date:
- Duration:
- 30 min
- Room:
- Conference:
- PGConf US 2016 [PgConf.US]
- Language:
- Track:
- Internals
- Difficulty:
- Medium