Recorded on 02/11/2016 10:00am PT, 1:00pm ET, 6:00pm UTC

The slides and notebooks for this session are available as attachments within the webinar itself. To download them, start the webinar, hover over the webinar window, and click [Attachments].

You can also see the answers to questions from the live broadcast audience here. 

Denny Lee, Technology Evangelist with Databricks, will provide a jump start into Apache Spark and Databricks. Spark is a fast, easy-to-use, unified engine for solving many data science and big data (and many not-so-big data) problems. Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning, and graph processing. We will use Databricks to demonstrate, visualize, and debug our code samples; the notebooks will be available for you to download. This introductory-level jump start will focus on the following scenarios:

  • Quick Start on Spark: Provides an introductory quick start to Spark using Python and Resilient Distributed Datasets (RDDs). We will review RDD transformations and actions and how they affect your Spark workflow.
  • A Primer on RDDs to DataFrames to Datasets: This will provide a high-level overview of our journey from RDDs (2011) to DataFrames (2013) to the newly introduced (as of Spark 1.6) Datasets (2015).  
  • Just-in-Time Data Warehousing with Spark SQL: We will demonstrate a Just-in-Time Data Warehousing (JIT-DW) example using Spark SQL on an AdTech scenario. We will start with weblogs, create an external table with RegEx, make an external web service call via a Mapper, join DataFrames and register a temp table, add columns to DataFrames with UDFs, use Python UDFs with Spark SQL, and visualize the output, all in the same notebook.
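
To give a feel for the transformation/action distinction mentioned in the first scenario, here is a minimal pure-Python sketch. It is not Spark and not the code from the session notebooks: `ToyRDD` is a hypothetical stand-in written for illustration, and only its method names (`parallelize`, `map`, `filter`, `collect`, `count`) mirror the real PySpark RDD API. The point it illustrates is that transformations only describe work, while actions actually run it.

```python
from typing import Any, Callable, Iterable, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: lazy transformations, eager actions."""

    def __init__(self, source: Callable[[], Iterable[Any]]) -> None:
        # Store a recipe for producing the data, not the data itself.
        self._source = source

    @classmethod
    def parallelize(cls, items: List[Any]) -> "ToyRDD":
        return cls(lambda: iter(items))

    # Transformations: return a new ToyRDD describing more work; nothing runs yet.
    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: walk the whole recipe and produce a concrete result.
    def collect(self) -> List[Any]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)  # record every invocation so laziness is observable
    return x * x


rdd = ToyRDD.parallelize([1, 2, 3, 4, 5])
evens = rdd.map(square).filter(lambda x: x % 2 == 0)  # transformations only

calls_before_action = len(calls)  # square has not been called yet
result = evens.collect()          # the action triggers the computation
calls_after_action = len(calls)   # square has now run once per element

print(calls_before_action, result, calls_after_action)
```

In this toy model, calling another action such as `evens.count()` replays the whole recipe from the start, which is a rough analogy for why real Spark workflows cache RDDs that are reused across several actions.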

Presenters

Denny Lee

Technology Evangelist - Databricks

Denny Lee is a Technology Evangelist with Databricks. He is a hands-on data sciences engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems for both on-premises and cloud environments. Prior to joining Databricks, Denny worked as a Senior Director of Data Sciences Engineering at Concur and was part of the incubation team that built Hadoop on Windows and Azure (currently known as HDInsight).