Data processing at more than a billion rows per second
Presented by:
Kohei KaiGai
KaiGai Kohei has contributed to PostgreSQL and the Linux kernel for over 15 years, especially in the areas of security, database federation (FDW), and the extensible executor, among others. He has also developed PG-Strom, an extension module for GPU-accelerated PostgreSQL, since 2012. It enables large-scale data processing on a simple single-node PostgreSQL server using GPU, NVMe, or RoCE networks.
He founded HeteroDB, Inc. in 2017 to focus on the development and productization of the PG-Strom technology.
Nowadays, GPUs are used not only for compute-intensive workloads but also for I/O-intensive big-data workloads.
This talk introduces how SSD-to-GPU Direct SQL, implemented as a PostgreSQL extension, optimizes the data flow from storage to processors over the PCIe bus for efficient execution of analytic and reporting workloads.
Combining this technology with comprehensive database features (e.g., columnar store, partitioned tables) draws out the full capability of the latest hardware, achieving data processing at more than a billion rows per second on a single-node PostgreSQL server.
Its main focus is log-data processing in the IoT/M2M area, where tons of data are generated every day. Our approach simplifies the system landscape and makes use of engineers' existing knowledge of and experience with PostgreSQL.
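To give a rough sense of what a billion rows per second implies for the data path from storage to processor, the arithmetic can be sketched as below. This is only a back-of-envelope illustration: the per-row byte counts are hypothetical values chosen for the example, not figures from the talk or from PG-Strom, and the function name is made up for this sketch.

```python
def required_bandwidth_gb_per_s(rows_per_second: float, bytes_read_per_row: float) -> float:
    """Bandwidth (GB/s) that must be read to sustain the given row rate."""
    return rows_per_second * bytes_read_per_row / 1e9


# Hypothetical inputs for illustration only; not measurements.
ROWS_PER_SECOND = 1_000_000_000   # the "billion rows per second" target
ROW_STORE_BYTES_PER_ROW = 200     # assumed: a row store reads the whole row
COLUMNAR_BYTES_PER_ROW = 16       # assumed: the query references two 8-byte columns

row_store_gbps = required_bandwidth_gb_per_s(ROWS_PER_SECOND, ROW_STORE_BYTES_PER_ROW)
columnar_gbps = required_bandwidth_gb_per_s(ROWS_PER_SECOND, COLUMNAR_BYTES_PER_ROW)

print(f"row store: {row_store_gbps:.0f} GB/s")
print(f"columnar:  {columnar_gbps:.0f} GB/s")
```

Under these assumed numbers, reading only the referenced columns cuts the required bandwidth by the ratio of the two byte counts, which is the general reason a columnar store and an optimized PCIe data path matter for this kind of target.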
In short, this talk covers the following items from a technology viewpoint.
- SSD-to-GPU Direct SQL
- Columnar-store (Arrow_Fdw)
- PCIe-bus level optimization using table partitioning
- Benchmark results
- Customer case (under negotiation)
For your reference:
- Date:
- Duration: 50 min
- Room:
- Conference: Postgres Conference 2020
- Language:
- Track: Data Science
- Difficulty: Medium