ETL Confessions
Presented by:
Corey Huinker
Corey Huinker is a database programmer and consultant based in New York City. He specializes in database query optimization and ETLs.
He is the author of the PGXN modules Poor Man's Parallel Processing and range_partitioning.
His past hobbies have included improvisational theater and refereeing roller derby.
ETL (Extract, Transform, Load) is the industry term for importing data from external sources into a database. However, more often the pattern is ELT - Extract, Load, Transform. This talk covers methods of loading external data into PostgreSQL and reshaping it to fit local needs. The talk addresses popular commercial tools, but focuses mostly on custom coding, specifically:
*) Identifying bottlenecks
*) Tuning for specific optimization goals (speed, lower resource usage, etc.)
*) temp tables
*) foreign data wrappers
*) COPY from PROGRAM
*) index management
*) filtration techniques
*) data validation and error reporting
*) importing highly variant data sources
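As a rough illustration of the ELT pattern named in the abstract (load first, transform afterwards), here is a minimal sketch. It is not taken from the talk. To stay runnable without a server it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, where the load step would more typically be a COPY into the staging table. The table names (staging, sales, rejects), the sample feed, and the validation rules are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw feed: every value arrives as text, and some rows are bad.
RAW_CSV = """id,amount,country
1,10.50,us
2,not-a-number,ca
3,7.25,US
,3.00,mx
"""

# Assumed validation rule: id must be all digits, amount digits and dots only.
BAD_ID = "(id = '' OR id GLOB '*[^0-9]*')"
BAD_AMOUNT = "(amount = '' OR amount GLOB '*[^0-9.]*')"


def run_elt(raw_csv):
    """Load raw text into a temp staging table, then transform with SQL."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL

    # Load: land everything as untyped text so that no row fails on insert.
    conn.execute("CREATE TEMP TABLE staging (id TEXT, amount TEXT, country TEXT)")
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    conn.executemany(
        "INSERT INTO staging (id, amount, country) VALUES (:id, :amount, :country)",
        rows,
    )

    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL NOT NULL, country TEXT NOT NULL)"
    )
    conn.execute("CREATE TABLE rejects (id TEXT, amount TEXT, country TEXT, reason TEXT)")

    # Error reporting: record every row that fails validation, with a reason.
    conn.execute(
        f"""
        INSERT INTO rejects
        SELECT id, amount, country,
               CASE WHEN {BAD_ID} THEN 'bad id' ELSE 'bad amount' END
        FROM staging
        WHERE {BAD_ID} OR {BAD_AMOUNT}
        """
    )

    # Transform: cast and normalize only the rows that passed validation.
    conn.execute(
        f"""
        INSERT INTO sales
        SELECT CAST(id AS INTEGER), CAST(amount AS REAL), upper(country)
        FROM staging
        WHERE NOT ({BAD_ID} OR {BAD_AMOUNT})
        """
    )

    sales = conn.execute("SELECT id, amount, country FROM sales ORDER BY id").fetchall()
    rejects = conn.execute(
        "SELECT id, amount, country, reason FROM rejects ORDER BY reason"
    ).fetchall()
    conn.close()
    return sales, rejects


if __name__ == "__main__":
    good, bad = run_elt(RAW_CSV)
    print("loaded:", good)
    print("rejected:", bad)
```

Landing the data as plain text first means a malformed value cannot abort the load; the set-based SQL statements afterwards decide what is kept and what is reported.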
- Date:
- Duration: 50 min
- Room:
- Conference: PGConf US 2017 [PgConf.US]
- Language:
- Track: Development
- Difficulty: Medium