Performant time-series data management and analytics with Postgres
Presented by:
Matvey Arye
Mat has been working on data infrastructure in both academia (Princeton, PhD) and industry. As one of TimescaleDB's core architects he works on performance, scalability, and query power. Previously, he attended Stuyvesant, The Cooper Union, and Princeton.
No video of the event yet, sorry!
Time-series databases are one of the fastest-growing segments of the database market, spreading across industries and use cases. Common requirements include ingesting high volumes of structured data; answering complex queries quickly over both recent and historical time intervals; and performing specialized time-centric analysis and data management.
Today, many developers working with time-series data turn to polyglot solutions: a NoSQL database to store their time-series data (for scale) and a relational database for associated metadata and key business data. Yet this leads to engineering complexity, operational challenges, and even referential integrity concerns.
I explain how one can avoid these operational problems by re-engineering Postgres to serve as a general data platform, including for high-volume time-series workloads. In particular, TimescaleDB is an open-source time-series database, implemented as a Postgres plugin, that improves insert rates by 20x over vanilla Postgres and delivers much faster queries, all while offering full SQL (including JOINs). TimescaleDB achieves this by storing data on an individual server in a manner more common to distributed systems: heavily partitioning (sharding) data into chunks so that the hot chunks corresponding to recent time records stay in memory.
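To make the chunking idea concrete, here is a minimal Python sketch of time-based partitioning. It is an illustration only, not TimescaleDB's implementation: the `ChunkedTable` class, the `chunk_start` helper, and the one-hour `CHUNK_INTERVAL` are all hypothetical names and values chosen for the example. Rows are routed to a chunk by timestamp, and a time-range query scans only the chunks that overlap the requested range.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical fixed chunk width; a real system would tune this value.
CHUNK_INTERVAL = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts, interval=CHUNK_INTERVAL):
    """Return the start of the chunk interval that contains ts."""
    n = (ts - EPOCH) // interval  # whole intervals elapsed since the epoch
    return EPOCH + n * interval


class ChunkedTable:
    """Toy in-memory table partitioned into time chunks (illustrative only)."""

    def __init__(self, interval=CHUNK_INTERVAL):
        self.interval = interval
        self.chunks = defaultdict(list)  # chunk start -> list of (ts, value)

    def insert(self, ts, value):
        # Recent rows land in the same few "hot" chunks.
        self.chunks[chunk_start(ts, self.interval)].append((ts, value))

    def query(self, start, end):
        """Return rows with start <= ts < end, skipping non-overlapping chunks."""
        rows = []
        for cstart, chunk in self.chunks.items():
            if cstart < end and cstart + self.interval > start:
                rows.extend(r for r in chunk if start <= r[0] < end)
        return sorted(rows)
```

Because recent inserts touch only the newest chunk or two, the working set for writes stays small in this toy model, which is the property the abstract attributes to keeping hot chunks in memory.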
In this talk, I focus on two newly released features of TimescaleDB, and discuss how these capabilities ease time-series data management: (1) the automated adaptation of time-partitioning intervals, which the database learns by observing data volumes; (2) continuous aggregations in near-real-time, in a manner robust to late-arriving data and transparently supporting queries across different aggregation levels. I discuss how these capabilities have been leveraged across several different use cases.
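The second feature, continuous aggregation, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not TimescaleDB's actual mechanism: the `ContinuousAverage` class, its method names, and the use of integer second timestamps are all hypothetical. The sketch keeps a running sum and count per time bucket, so a late-arriving row simply updates its own bucket, and coarser aggregation levels are derived by merging fine-grained buckets.

```python
from collections import defaultdict


class ContinuousAverage:
    """Incrementally maintained per-bucket averages (illustrative only)."""

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        # bucket start (seconds) -> [running sum, row count]
        self.state = defaultdict(lambda: [0.0, 0])

    def ingest(self, ts_seconds, value):
        # A late row maps to its original bucket, so no recomputation is needed.
        bucket = ts_seconds - ts_seconds % self.bucket_seconds
        entry = self.state[bucket]
        entry[0] += value
        entry[1] += 1

    def averages(self, level_seconds=None):
        """Return {bucket start: average} at the base level or a coarser one."""
        level = level_seconds or self.bucket_seconds
        if level % self.bucket_seconds:
            raise ValueError("level must be a multiple of the base bucket width")
        merged = defaultdict(lambda: [0.0, 0])
        for bucket, (total, count) in self.state.items():
            coarse = bucket - bucket % level
            merged[coarse][0] += total
            merged[coarse][1] += count
        return {b: total / count for b, (total, count) in sorted(merged.items())}
```

Storing sums and counts rather than finished averages is the design choice that makes both properties cheap in this sketch: partial states can be updated in place and combined across levels, whereas averages alone cannot.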
- Date:
- 2018 October 16 09:00 PDT
- Duration:
- 50 min
- Room:
- Market
- Conference:
- Silicon Valley
- Language:
- Track:
- Data
- Difficulty:
- Easy