CMP306: Apache Spark on EC2 History, Best Practices with Customer Use Cases

AWS re:Invent 2016 - A podcast by AWS

Apache Spark is well known across industries, use cases, and businesses of all sizes for its speed and ease of use in sophisticated analysis of large datasets. In this session, learn from Ion Stoica, who co-led the Apache Spark project at the AMPLab (UC Berkeley) and co-founded Databricks, about some of the latest innovations in Spark 2.0, a new open source tool, Ernest, for choosing the optimal cluster configuration for your job, and how and why Databricks chose EC2 to run Spark. We’ll also look at how Amazon EC2 and its latest enhancements support Spark as a data processing platform, along with best practices and cost optimization techniques for using Spark with AWS.
