Recorded on March 7, 2016, at 10:00am PT / 1:00pm ET / 6:00pm UTC
 
The slides and notebooks for this session are available as attachments within the webinar itself. To download the materials, start the webinar, hover over it, and click [Attachments].
 
In this webcast, Jason Pohl, Data Solutions Engineer at Databricks, covers how to build a Just-in-Time Data Warehouse on Databricks, with a focus on performing Change Data Capture from a relational database and joining that data to a variety of other data sources. Not only do Apache Spark and Databricks let you do this more easily and with less code, the routine will also automatically absorb changes to the source schema.
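
To make the schema-evolution point concrete, here is a minimal sketch in PySpark. It is not taken from the webinar notebooks: the path and table name are illustrative, and it uses the modern SparkSession entry point (the session predates Spark 2.0, whose notebooks would have used sqlContext). Parquet's mergeSchema option is what lets a read reconcile batches written with different schemas, for example after a column is added to the source table.

```python
# A minimal sketch, assuming CDC batches for a hypothetical "customers"
# table have already been landed as Parquet under /mnt/cdc/customers/.
# Parquet schema merging is what lets the routine absorb source-schema
# changes: batches written before and after a column was added are
# reconciled into one DataFrame at read time.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jit-dw").getOrCreate()

customers = (
    spark.read
         .option("mergeSchema", "true")   # merge schemas across all batches
         .parquet("/mnt/cdc/customers/")
)
customers.printSchema()  # shows the union of all batch schemas
```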

Highlights of this webinar include:
  1. Starting from a Databricks notebook, Jason will build a classic Change Data Capture (CDC) ETL routine to extract data from an RDBMS.
  2. He will take a deep dive into selecting a delta of changes from tables in an RDBMS, writing it to Parquet, and querying it with Spark SQL (see the first sketch after this list).
  3. He will demonstrate how to apply a schema at read time rather than before write (see the second sketch after this list).
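
The first sketch below is a hedged illustration of the CDC delta step (items 1 and 2), not the webinar's actual code. The table name (orders), its last_modified timestamp column, and the connection placeholders jdbc_url, db_user, and db_password are all assumptions for illustration; the JDBC read, Parquet append, and temp-view query are standard Spark APIs.

```python
# A hedged sketch of the CDC delta step, not the webinar's actual code.
# Assumptions: the source table ("orders") carries a last_modified
# timestamp, and jdbc_url / db_user / db_password are illustrative
# placeholders for real connection details.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cdc-etl").getOrCreate()

jdbc_url = "jdbc:mysql://dbhost:3306/sales"   # illustrative connection string
db_user, db_password = "etl_user", "secret"   # placeholders
last_watermark = "2016-03-01 00:00:00"        # persisted from the previous run

# Wrap the delta predicate in a subquery so the RDBMS ships only changed rows.
delta_query = (
    "(SELECT * FROM orders "
    f"WHERE last_modified > '{last_watermark}') AS delta"
)

delta = (
    spark.read.format("jdbc")
         .option("url", jdbc_url)
         .option("dbtable", delta_query)
         .option("user", db_user)
         .option("password", db_password)
         .load()
)

# Append this batch of changes to Parquet...
delta.write.mode("append").parquet("/mnt/cdc/orders/")

# ...and expose the accumulated history to Spark SQL.
spark.read.parquet("/mnt/cdc/orders/").createOrReplaceTempView("orders_cdc")
spark.sql("SELECT COUNT(*) AS changed_rows FROM orders_cdc").show()
```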
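
The second sketch illustrates schema-on-read (item 3), again with illustrative names and paths rather than the webinar's own. The raw files are written with no schema enforced; the structure is declared only when the data is queried, so the write path never waits on upfront DDL.

```python
# A minimal schema-on-read sketch with hypothetical names.
# The schema is applied at read time, not when the JSON files were written.
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, TimestampType)

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# Declared at query time; changing it requires no rewrite of the raw data.
orders_schema = StructType([
    StructField("order_id",      IntegerType()),
    StructField("customer_id",   IntegerType()),
    StructField("status",        StringType()),
    StructField("last_modified", TimestampType()),
])

orders = spark.read.schema(orders_schema).json("/mnt/raw/orders/")
orders.createOrReplaceTempView("orders")
spark.sql("SELECT status, COUNT(*) AS n FROM orders GROUP BY status").show()
```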

 

Presenters


Jason Pohl

Data Solutions Engineer - Databricks

Jason Pohl is a solutions engineer with Databricks, focused on helping customers succeed with their data initiatives. Jason has spent his career building data-driven products and solutions.

Denny Lee

Technology Evangelist - Databricks

Denny Lee is a Technology Evangelist with Databricks; he is a hands-on data sciences engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems in both on-premises and cloud environments. Prior to joining Databricks, Denny was a Senior Director of Data Sciences Engineering at Concur and was part of the incubation team that built Hadoop on Windows and Azure (now known as HDInsight).