Designing The Right Schema To Power Heap
Dan is CTO at Heap, where he uses PostgreSQL, Kafka, Flink, Redis, and CitusDB to build distributed analytics infrastructure. He works in Scala and Node.js day-to-day, though he's been known to get a little too much satisfaction out of solving problems with PL/pgSQL. Dan earned B.S. degrees in Computer Science and Mathematics from Stanford, where he spent most of his time studying machine learning. He likes hiking and building physical things.
Heap's analytics infrastructure is built around PostgreSQL. The most important choice to make when building a system this way is the schema you'll use to represent your data. This foundation will determine which read queries will be fast, how much write throughput you can sustain, which indexing strategies will be available to you, and which data inconsistencies will be possible. With the wrong choice, you won't be able to leverage PostgreSQL's most powerful features. This talk will walk through the different schemas we've used to power Heap over the last three years, their relative strengths and weaknesses, and some mistakes we've made.
- 50 min
- PGConf US 2017
- Use Cases