Speeding Up Analytic Queries by Sharing Commonalities
Presented by:
Yuya Watari
Yuya Watari works for NTT Software Innovation Center / NTT Open Source Software Center in Japan.
Data analysis plays an important role in business, and fast query processing is key to it. This talk presents a method that significantly speeds up analytic queries by sharing the commonalities among them.
Analyzing the large amounts of data generated by corporate business activities and feeding the results back into the next action has gained importance. This type of analysis is referred to as Business Intelligence (BI). BI dashboards, one type of user interface for it, are widely used in many situations.
Queries often have commonalities among them. According to a survey of BI, many users repeatedly issue incomplete queries because they need to refine them. Such a series of queries contains a large number of common operations. Caching the results of these operations significantly reduces the execution time of subsequent queries.
In our caching methodology, the execution results of plan nodes that have appeared in previous queries are cached. If an incoming query matches the cache, the cached result is reused instead of executing the matching part again. This method is not simple result caching. While simple result caching works only when the same query arrives again, our method can accelerate queries that partially match past ones. Additionally, since our method is transparent, users do not need to manage the cache explicitly, which distinguishes it from materialized views.
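The idea of caching at the plan-node level can be illustrated with a minimal Python sketch. This is not the PostgreSQL implementation presented in the talk; the `PlanNode` and `PlanCache` classes, the `fingerprint` method, the toy `sales` table, and the operator names are all hypothetical and chosen only for illustration. The sketch also assumes the underlying data never changes, so it ignores cache invalidation and eviction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical toy data and predicates; a real system would read from storage.
TABLES = {
    "sales": [
        {"region": 1, "amount": 10},
        {"region": 2, "amount": 20},
        {"region": 1, "amount": 5},
    ]
}
PREDICATES = {"region=1": lambda row: row["region"] == 1}


@dataclass(frozen=True)
class PlanNode:
    op: str                             # "scan", "filter", "sum", or "count"
    arg: str                            # table name, predicate key, or column
    child: Optional["PlanNode"] = None  # single-input plans only, for brevity

    def fingerprint(self) -> Tuple:
        """Structural key: equal subtrees yield equal fingerprints."""
        child_fp = self.child.fingerprint() if self.child else None
        return (self.op, self.arg, child_fp)


class PlanCache:
    """Caches the result of every executed plan node by its fingerprint."""

    def __init__(self) -> None:
        self.store = {}
        self.hits = 0      # nodes answered from the cache
        self.executed = 0  # nodes actually computed

    def execute(self, node: PlanNode):
        key = node.fingerprint()
        if key in self.store:
            # The whole subtree was seen before: reuse it, skip the children.
            self.hits += 1
            return self.store[key]
        self.executed += 1
        if node.op == "scan":
            result = list(TABLES[node.arg])
        elif node.op == "filter":
            rows = self.execute(node.child)
            result = [r for r in rows if PREDICATES[node.arg](r)]
        elif node.op == "sum":
            result = sum(r[node.arg] for r in self.execute(node.child))
        elif node.op == "count":
            result = len(self.execute(node.child))
        else:
            raise ValueError(f"unknown operator: {node.op}")
        self.store[key] = result
        return result


# Two different queries that share the same filtered scan.
shared = PlanNode("filter", "region=1", PlanNode("scan", "sales"))
q1 = PlanNode("sum", "amount", shared)
q2 = PlanNode("count", "", shared)

cache = PlanCache()
total = cache.execute(q1)  # computes sum, filter, and scan
count = cache.execute(q2)  # computes only count; the filter subtree is a hit
```

The two queries are not identical, so a cache keyed on the whole query would miss on `q2`. Because the key here is derived from each subtree, the second query recomputes only its top node, which is the partial-match behavior the paragraph above describes.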
We implemented this caching technique on PostgreSQL and conducted experiments using the Public BI benchmark, which contains real data and queries from Tableau, and TPC-DS. The experiments showed substantial improvements in query performance, which indicates that caching the results of common operations is very effective.
In this talk, we will first present how the shared execution methodology, including the caching method, drastically improves query performance. We will then describe the implementation on PostgreSQL and the experimental results obtained with the Public BI benchmark and TPC-DS. Lastly, we will briefly outline future work.
- Date:
- 2020 November 19 14:50 CST
- Duration:
- 40 min
- Room:
- Virtual - English Sub-Conference A
- Conference:
- CHINA 2020 And PGConf.Asia 2020
- Language:
- Track:
- Performance
- Difficulty:
- Medium