Apache Spark

Categories: Spark Scala Data Machine Learning Data Engineering FP Functional Programming

Course Length: 3 Days

Apache Spark is a distributed engine for data analysis and aggregation, written largely in Scala. It spreads computation across multiple worker machines in a cluster. What makes the pairing of Spark and Scala special is that the same analysis can be expressed with functional programming or with SQL.

This course is tailored for data analysts and engineers who want to run their data workloads on Spark and build solutions with it.

Apache Spark Introduction
- Why Adopt Spark
- Benefits of Spark
- Spark Architecture
- Spark Data Sources
Spark Components
- Driver
- Workers
- Stages
- Tasks
- Partitions
Spark Processing
- Transformations
- Actions
Cluster Managers
- Standalone (Spark's built-in cluster manager)
- Mesos
- YARN
- Kubernetes
Spark Shell
Spark UI
Running a Spark Job
Value Types
Programming Spark
- Enough Scala to get you through the day
- DataFrame
- Dataset
- Spark SQL
- RDD
Spark Streaming
- Structured Streaming
- Unstructured (DStreams)
GraphX
Spark MLlib