Presented by:

Dan Robinson

Heap

Dan is CTO at Heap, where he uses PostgreSQL, Kafka, Flink, Redis, and CitusDB to build distributed analytics infrastructure. He works in Scala and Node.js day-to-day, though he's been known to get a little too much satisfaction out of solving problems with PL/pgSQL. Dan earned B.S. degrees in Computer Science and Mathematics from Stanford, where he spent most of his time studying machine learning. He likes hiking and building physical things.

Heap's analytics infrastructure is built around PostgreSQL. The most important choice to make when building a system this way is the schema you'll use to represent your data. This foundation will determine what sorts of read queries will be fast, your write throughput, what indexing strategies will be available to you, and what data inconsistencies will be possible. With the wrong choice, you won't be able to leverage PostgreSQL's most powerful features. This talk will walk through the different schemas we've used to power Heap over the last three years, their relative strengths and weaknesses, and some mistakes we've made.

Date:
2017 March 29 09:30
Duration:
50 min
Room:
Liberty III
Conference:
PGConf US 2017
Language:
Track:
Use Cases
Difficulty:
Medium