Apache Spark

Categories: Spark, Scala, Data, Machine Learning, Data Engineering, FP, Functional Programming
Course Length: 3 Days

Apache Spark is a data analysis and aggregation engine written in Scala. It distributes computation across multiple worker machines in a cluster. What makes the pairing of Spark and Scala special is that the same analysis can be expressed with functional programming or with SQL.
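
As a rough illustration of those two styles, here is a minimal sketch that answers one question twice: once with Scala lambdas on a typed Dataset and once with Spark SQL. The `Sale` case class, the sample rows, the `sales` view name, and the helper names are all hypothetical and exist only for this example. It assumes a Spark 3.x dependency on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type used only for this illustration.
case class Sale(category: String, amount: Double)

object FunctionalVsSql {

  // Build a tiny in-memory Dataset (sample data invented for the example).
  def sampleSales(spark: SparkSession): Dataset[Sale] = {
    import spark.implicits._
    Seq(Sale("books", 12.0), Sale("books", 8.0), Sale("games", 30.0)).toDS()
  }

  // Functional style: plain Scala lambdas over typed records.
  def bigCategoriesFunctional(sales: Dataset[Sale]): Set[String] = {
    import sales.sparkSession.implicits._
    sales.filter(_.amount >= 10.0).map(_.category).distinct().collect().toSet
  }

  // SQL style: register the same data as a temporary view and query it.
  def bigCategoriesSql(sales: Dataset[Sale]): Set[String] = {
    sales.createOrReplaceTempView("sales")
    sales.sparkSession
      .sql("SELECT DISTINCT category FROM sales WHERE amount >= 10")
      .collect()
      .map(_.getString(0))
      .toSet
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; a real cluster would use another master URL.
    val spark = SparkSession.builder()
      .appName("functional-vs-sql")
      .master("local[*]")
      .getOrCreate()

    val sales = sampleSales(spark)
    println(bigCategoriesFunctional(sales))
    println(bigCategoriesSql(sales))

    spark.stop()
  }
}
```

Both helpers should return the same set of categories. Which style to use is largely a matter of team preference and how much compile-time type checking is wanted.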

This course is tailored for data analysts and engineers who want to get their data workloads under control and build solutions with Spark.

  • Apache Spark Introduction

    • Why Adopt Spark

    • Benefits of Spark

    • Spark Architecture

    • Spark Datasources

  • Spark Components

    • Driver

    • Workers

    • Stages

    • Tasks

    • Partitions

  • Spark Processing

    • Transformations

    • Actions

  • Cluster Managers

    • Standalone (Spark's Built-in Cluster Manager)

    • Mesos

    • YARN

    • Kubernetes

  • Spark Shell

  • Spark UI

  • Running a Spark Job

  • Value Types

  • Programming Spark

    • Enough Scala to get you through the day

    • DataFrame

    • Dataset

    • Spark SQL

    • RDD

  • Spark Streaming

    • Structured Streaming

    • Unstructured (DStreams)

  • GraphX

  • Spark MLlib