Mastering Advanced Analytics with Apache Spark eBook

We are excited to announce that the second eBook in our technical blog book series, Mastering Advanced Analytics with Apache Spark, has been released today!

You can download the eBook here.

We focused on the topic of “Advanced Analytics” because of the challenges created by the continued growth in data. This growth, coupled with increasingly complex use cases, demands much more than running queries against a data set. Whether you’re scrutinizing the clickstream from millions of visitors to optimize online ad placements or sifting through billions of transactions to identify signs of fraud, more sophisticated approaches that automatically glean insights from enormous volumes of data, such as machine learning and graph processing, are more important than ever.

This eBook offers a collection of the most popular technical blog posts that provide an introduction to machine learning and other advanced techniques on Spark, including:

  • An introduction to machine learning in Apache Spark
  • Using Spark for advanced topics such as clustering, trees, and graph processing
  • How you can use SparkR to analyze data at scale with the R language

Screenshot from the Mastering Advanced Analytics with Apache Spark eBook

We’ve also augmented the blog posts with new code examples in Databricks notebooks, which are freely available with the eBook download. A sample of the new notebooks includes:

  • Scalable Decision Trees with MLlib
  • ML Import, Export, and Simple Operations
  • Generalized Linear Models in SparkR
  • Random Forests and Boosting in MLlib

Download the eBook to get started on your next advanced analytics project today. To try out the code examples, get on the waitlist for the Databricks Community Edition. If you have not read the first eBook in the series, be sure to check out Apache Spark Analytics Made Simple for technical content and code examples geared toward an introduction to data analytics with Apache Spark.