Data lifecycle management with SPQR and Apache Cloudberry
Presented by:
Andrey Borodin
Software engineer at Yandex, computer scientist, Ph.D., associate professor at Ural Federal University, co-founder of Octonica. Researching data indexing since 2008. Teaching at the Yandex School of Data Analysis and UrFU. Contributing to PostgreSQL since 2016.
OLTP-CDC-OLAP is a common data architecture pattern: the operational database (OLTP) is continuously rebuilt in an analytical database (OLAP) through change data capture (CDC). CDC processes can be fragile. Engineers need to scale them whenever they scale the databases, update them during database migrations, and monitor them constantly to make sure the analytical copy stays up to date.
SPQR (Stateless Postgres Query Router) is a Postgres sharding system that focuses on shard rebalancing. It is designed to handle operational workloads of millions of small transactions per second. Its data movement processes can also be used to maintain an analytical copy.
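To make the idea of routing and rebalancing concrete, here is a minimal sketch of range-based routing. It is an illustration only, not SPQR's actual implementation or API: the `KeyRangeRouter` class, its methods, and the shard names are all hypothetical, and it assumes each shard owns a contiguous range of sharding keys.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class KeyRangeRouter:
    """Hypothetical router: range i covers [bounds[i], bounds[i + 1])."""

    bounds: list = field(default_factory=list)  # sorted lower bounds
    shards: list = field(default_factory=list)  # shard owning each range

    def add_range(self, lower_bound, shard):
        # Keep the lower bounds sorted so lookups can use binary search.
        i = bisect.bisect_left(self.bounds, lower_bound)
        self.bounds.insert(i, lower_bound)
        self.shards.insert(i, shard)

    def route(self, key):
        # Find the last range whose lower bound is <= key.
        i = bisect.bisect_right(self.bounds, key) - 1
        if i < 0:
            raise KeyError(f"no key range covers {key!r}")
        return self.shards[i]

    def move_range(self, lower_bound, new_shard):
        # Rebalancing in this sketch only reassigns ownership; a real
        # system would also have to copy the rows to the new shard.
        i = self.bounds.index(lower_bound)
        self.shards[i] = new_shard


router = KeyRangeRouter()
router.add_range(0, "shard-1")
router.add_range(1000, "shard-2")
```

In this sketch the router keeps no per-row state: moving a range changes one entry in the routing table, which is why rebalancing and routing can be reasoned about separately.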
Cloudberry is a PostgreSQL-based analytics system that can be easily integrated with SPQR because, from SPQR's point of view, it is just another PostgreSQL. This integration also expands the range of storage models that can be used simultaneously. Hot operational data is available for high-throughput transactions, as it is distributed across many Postgres nodes. Recent historical data can be accessed for near-real-time analytics on a columnar MPP system. Older historical data is stored in S3 and can still be quickly queried alongside the other storage models. In this system, scaling operational data and replicating analytical data is a seamless process!
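The three storage tiers described above can be sketched as a simple age-based placement rule. This is a minimal illustration under stated assumptions: the abstract does not say how data is assigned to tiers, so the age thresholds, the tier labels, and the `storage_tier` function are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; the talk abstract does not specify any.
HOT_WINDOW = timedelta(days=7)
RECENT_WINDOW = timedelta(days=365)


def storage_tier(row_timestamp, now):
    """Pick a storage tier for a row based on its age (illustrative only)."""
    age = now - row_timestamp
    if age < HOT_WINDOW:
        return "postgres_shards"      # hot operational data
    if age < RECENT_WINDOW:
        return "cloudberry_columnar"  # recent history on columnar MPP
    return "s3"                       # older history in object storage


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
tiers = [storage_tier(now - timedelta(days=d), now) for d in (1, 30, 800)]
```

A real deployment might place data by key range or partition rather than by row age; the point of the sketch is only that each row lives in exactly one tier at a time.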
I'm an engineer, so in the talk I'll focus on the caveats and limitations of this approach.
- Date:
- Duration: 50 min
- Room:
- Conference: Postgres Conference 2025
- Language:
- Track: Variants and Cloud
- Difficulty: Medium