Memory management is at the heart of any data-intensive system. Apache Spark, in particular, must arbitrate memory allocation between two main use cases: buffering intermediate data for processing (execution) and caching user data (storage). This talk takes a deep dive into the memory management designs adopted in Spark since its inception and discusses their performance and usability implications for the end user.

Presenters

Andrew Or

Software Engineer - Databricks

Andrew is an Apache Spark PMC member. He has contributed several large features to the project, including event logging, external spilling, the history server, dynamic allocation, and DAG visualization in the Spark UI. He is an active maintainer of the Spark on YARN integration component.