Recorded on 03/07/2016 10:00am PT, 1:00pm ET, 6:00pm UTC
The slides and notebooks for this session are available as attachments within the webinar itself. Please start the webinar, hover over the webinar, click [Attachments], and you will be able to download all the materials.
In this webcast, Jason Pohl, Solution Engineer from Databricks, will cover how to build a Just-in-Time Data Warehouse on Databricks, with a focus on performing Change Data Capture from a relational database and joining that data to a variety of data sources. Not only do Apache Spark and Databricks allow you to do this more easily and with less code, but the routine will also automatically ingest changes to the source schema.
Highlights of this webinar include:
- Starting with a Databricks notebook, Jason will build a classic Change Data Capture (CDC) ETL routine to extract data from an RDBMS.
- A deep dive into selecting a delta of changes from tables in an RDBMS, writing it to Parquet, and querying it using Spark SQL.
- A demonstration of how to apply a schema at read time rather than before write.
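
To make the "delta of changes" idea concrete, here is a minimal sketch of watermark-based change selection. It is not the notebook from the webinar: it uses Python's built-in `sqlite3` as a stand-in for the RDBMS, skips the Parquet and Spark SQL steps entirely, and the `customers` table, the `updated_at` column, and the `fetch_delta` helper are all hypothetical names chosen for illustration.

```python
import sqlite3


def fetch_delta(conn, table, watermark_col, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    Hypothetical helper: assumes the table has a monotonically increasing
    change marker (here an integer `updated_at`). Table and column names
    are interpolated directly, so they must come from trusted code.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {watermark_col} > ? ORDER BY {watermark_col}",
        (last_watermark,),
    )
    # Column names are read at query time, so new source columns show up
    # in the extracted rows without any change to this function.
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    new_watermark = rows[-1][watermark_col] if rows else last_watermark
    return rows, new_watermark


# In-memory database standing in for the source RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Grace", 105)],
)

# First run: everything is newer than the initial watermark of 0.
first_batch, watermark_1 = fetch_delta(conn, "customers", "updated_at", 0)

# The source schema changes, one row is updated, and one row is added.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
conn.execute(
    "UPDATE customers SET email = 'grace@example.com', updated_at = 110 WHERE id = 2"
)
conn.execute("INSERT INTO customers VALUES (3, 'Edsger', 120, 'edsger@example.com')")

# Second run: only rows changed since the last watermark are selected.
second_batch, watermark_2 = fetch_delta(conn, "customers", "updated_at", watermark_1)
```

The second run returns only the updated and newly inserted rows, and each row already carries the new `email` column because the column list is discovered when the query runs. In a Spark-based routine, each batch would then be written out (for example to Parquet) and queried downstream; that part is left out here.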