cstore_fdw: Columnar store for PostgreSQL
Presented by:
Ozgun Erdogan
Ozgun is a co-founder and the CTO at Citus Data. Prior to Citus, Ozgun worked as a software developer for four years in the Distributed Systems Engineering team at Amazon. There, he proposed, designed, and implemented novel algorithms for distributed caching and consistency, and also worked on building systems for scalable data analytics. Ozgun earned his M.S. in Computer Science from Stanford University and his B.S. from Galatasaray University. He also holds patents on distributed cache consistency and load balancing.
Column-oriented data stores bring notable performance advantages for analytic workloads, and they have gained popularity as part of proprietary database solutions in the past few years. cstore_fdw is an open source columnar store for PostgreSQL. The extension follows the same data layout as Facebook's Optimized Row Columnar (ORC) format, and brings the following benefits:

* Compression: Reduces the on-disk data size by 2-4x.
* Column Projections: Reads only the column data relevant to the query, which improves performance for I/O-bound queries.
* Skip Indexes: Keeps min/max statistics for row groups and uses them to skip over unrelated rows.

cstore_fdw uses PostgreSQL’s binary data format for storing values, and works with every data type supported by PostgreSQL. You can use the same SQL syntax that PostgreSQL provides to query cstore_fdw tables.

In this talk, we first motivate the need for columnar stores. We then summarize the architectural decisions that factor into building a columnar store for PostgreSQL, and how they apply to the columnar store foreign data wrapper. Next, we provide performance benchmarks for in-memory and on-disk workloads, and summarize customer use cases. We conclude by running a demo that shows the columnar store and its benefits in action.
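The skip-index idea mentioned in the abstract can be illustrated with a small sketch. This is not cstore_fdw's actual implementation; it is a toy model in Python, and the names `RowGroup`, `build_row_groups`, and `scan_greater_than` are hypothetical. It only shows how per-group min/max statistics let a scan avoid reading row groups that cannot contain matching values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """One block of values from a single column, plus its min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split a column into fixed-size row groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # If the largest value in the group fails the predicate, no row in
        # the group can match, so the whole group is skipped.
        if group.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


# Hypothetical data: 100 sorted values stored in row groups of 10.
column = list(range(100))
row_groups = build_row_groups(column, group_size=10)
result, groups_read = scan_greater_than(row_groups, threshold=84)
print(len(result), "matching rows;", groups_read, "of", len(row_groups), "row groups read")
```

In this toy model the statistics help most when the data is sorted or clustered on the filtered column, because then most row groups have a min/max range that falls entirely outside the predicate.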
- Date:
- Duration: 30 min
- Room:
- Conference: PGConf US 2015 [PgConf.US]
- Language:
- Track: General
- Difficulty: Medium