Apache Spark & Scala Online Training - Acutesoft Solutions

Apache Spark & Scala ONLINE TRAINING

Hands-on Apache Spark and Scala training in a real-time production environment.

Who should take this course

People with some software development background who want to learn the hottest technology in big data analysis will want to check this out. This course focuses on Spark from a software development standpoint. If you want to learn how to use Spark to carve up huge datasets and extract meaning from them, then this course is for you.

If your software development job involves, or will involve, processing large amounts of data, you need to know about Spark.

The course also suits data engineers and data analysts who have a basic understanding of big data processing.

What you will learn

  • Real-time tools and technologies with real-time datasets
  • Programming languages: Scala and the basics of Python
  • Frameworks: Apache Spark
  • Code repository: Bitbucket/GitHub
  • Build tools: Maven and SBT
  • IDEs: Eclipse, IntelliJ, and Zeppelin
  • Real-time cluster: EMR/Cloudera
  • Number of nodes in cluster: 20

    Course Content:

    You will learn about the various infrastructure layouts and understand both Development and Operations (DevOps).

    Module 1: Scala for Apache Spark

    In this module we will learn the fundamentals of the Scala programming language.


    • Brief introduction to programming languages
    • Brief comparison of Scala and Java
    • Brief overview of the Scala language
    • How to compile a Scala program
    • The Scala shell (interpreter)
    • Brief overview of tooling: developing Scala in an IDE
    • SBT, the Scala Build Tool
    • Basic Scala syntax
    • Scala variables, including mutable vs. immutable values
    • Basic Scala types (primitives, tuples)
    • Control flow (loops, conditionals)
    • Functions and lambdas
    • Scoping
    • Object-oriented programming in Scala: classes, traits, inheritance, and methods
    • Scala collections and the common operations on them (the basis of the Spark RDD API)
    • Type inference
    • Imports
    • Overview of functional vs. imperative programming
    • Case classes
    • Pattern matching
    • For-comprehensions
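
    As a rough illustration of several Module 1 topics (immutable vs. mutable values, case classes, pattern matching, collection operations, and for-comprehensions), here is a minimal sketch in plain Scala. It is not course material: the names Shape, Circle, Rect, ScalaBasics and the sample strings are made up for this example.

```scala
// Illustrative sketch only: all names and sample data here are hypothetical.
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rect(width: Double, height: Double) extends Shape

object ScalaBasics {
  // Pattern matching on case classes; because Shape is sealed,
  // the compiler can warn when a subtype is not handled.
  def area(shape: Shape): Double = shape match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // Collection operations with the same names as their RDD counterparts
  // (flatMap, filter, map), followed by a per-word count.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(_.toLowerCase)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  // For-comprehension: syntactic sugar for flatMap / withFilter / map.
  def pairsSummingTo(n: Int, limit: Int): Seq[(Int, Int)] =
    for {
      a <- 1 to limit
      b <- a to limit
      if a + b == n
    } yield (a, b)

  def main(args: Array[String]): Unit = {
    val step = 10     // val: immutable, cannot be reassigned
    var counter = 0   // var: mutable
    counter += step
    println(counter)
    println(area(Rect(2.0, 3.0)))
    println(wordCounts(Seq("spark and scala", "Spark streaming")))
    println(pairsSummingTo(5, 4))
  }
}
```

    The type annotations on the local values are left out on purpose, so the sketch also shows type inference at work.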

    Module 2: Real-Time Environment

    In this module we learn about a real-time production environment, Amazon Web Services (AWS).


    • Brief introduction to AWS
    • Account creation
    • Virtual server creation (EC2)
    • Installation of Apache Spark on EC2
    • Brief overview of the AWS services most used by data engineers and data scientists

    Module 3: Apache Spark Core

    In this module we will understand why Apache Spark is used, how Spark differs from MapReduce, and the fundamentals of Apache Spark, with real-time examples.


    • Spark Capabilities and Ecosystem
    • Basic Spark Components
    • Resilient Distributed Datasets (RDD) fundamentals
    • Purpose and structure of RDDs
    • Operations on Resilient Distributed Datasets (RDDs)
    • Transformations and actions (map, flatMap, filter, reduce, reduceByKey, etc.), with in-depth coverage of the RDD programming API
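
    To make the distinction between transformations and actions concrete, here is a minimal sketch using the RDD operations named above. It assumes Spark is on the classpath and runs in local mode; the object name RddSketch, the helper countsByWord, and the sample lines are hypothetical and not taken from the course datasets.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Illustrative sketch only: names and sample data are hypothetical.
object RddSketch {
  def countsByWord(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val rdd = sc.parallelize(lines)

    // Transformations are lazy: these lines only describe the computation.
    val words  = rdd.flatMap(_.split(" ")).filter(_.nonEmpty)
    val pairs  = words.map(word => (word, 1))
    val counts = pairs.reduceByKey(_ + _)

    // collect() is an action: it triggers execution and returns results to the driver.
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, suitable for experimenting on one machine.
    val spark = SparkSession.builder().appName("rdd-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val sample = Seq("theft downtown", "burglary uptown", "theft uptown")
    countsByWord(sc, sample).foreach(println)

    // reduce() is also an action: here it sums the length of every line.
    val totalChars = sc.parallelize(sample).map(_.length).reduce(_ + _)
    println(totalChars)

    spark.stop()
  }
}
```

    Nothing runs on the cluster until an action such as collect() or reduce() is called, which is why the transformations can be chained cheaply.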

    Solving real-time case studies using the topics above:

    a) Analysis of a crime dataset

    b) Analysis of the adult dataset

    c) Getting insight from financial data

    There will be 4 case study assignments, each with a clear problem statement and dataset.

    Module 4: Spark SQL and DataFrames

    In this module we will learn in detail how to use Apache Spark with SQL.


    • Spark SQL and DataFrame Uses
    • Creating DataFrames
    • Reading CSV, text, Parquet, and JSON datasets
    • Querying with the DataFrame API and SQL
    • Joining DataFrames
    • Caching and re-using DataFrames
    • Solving a case study using the topics above

    There will be many case studies based on the domains students are interested in (banking, insurance, etc.), using available open-source datasets.
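
    The DataFrame topics in this module (creating DataFrames, joining, caching, and querying through both the DataFrame API and SQL) can be sketched as follows. This assumes Spark SQL is on the classpath and runs in local mode; the case classes Account and Txn, the view name account_txns, and all values are invented for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

// Illustrative sketch only: schema, names and data are hypothetical.
object DataFrameSketch {
  final case class Account(id: Int, owner: String)
  final case class Txn(accountId: Int, amount: Double)

  // Builds two small DataFrames, joins them and caches the result for re-use.
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val accounts = Seq(Account(1, "asha"), Account(2, "ben")).toDF()
    val txns     = Seq(Txn(1, 50.0), Txn(1, 25.0), Txn(2, 10.0)).toDF()
    accounts.join(txns, accounts("id") === txns("accountId")).cache()
  }

  // Query with the DataFrame API.
  def totalsViaApi(df: DataFrame): DataFrame =
    df.groupBy("owner").agg(sum("amount").as("total"))

  // The same query expressed in SQL against a temporary view.
  def totalsViaSql(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("account_txns")
    spark.sql("SELECT owner, SUM(amount) AS total FROM account_txns GROUP BY owner")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("df-sketch").master("local[*]").getOrCreate()
    val df = joined(spark)
    totalsViaApi(df).show()
    totalsViaSql(spark, df).show()
    spark.stop()
  }
}
```

    In practice the input would usually come from files rather than in-memory sequences, through readers such as spark.read.csv, spark.read.json, spark.read.parquet, or spark.read.text.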

    Module 5: Apache Spark Internals

    This module covers the advanced parts of Apache Spark.


    • In-depth discussion of Spark architecture
    • Jobs, Stages, and Tasks
    • Partitions and Shuffles
    • Data Locality
    • Job Performance
    • Visualizing DAG Execution
    • Observing Task Scheduling
    • Analyzing Spark jobs using the administration UIs

    Module 6: Apache Spark Streaming

    This module covers handling streaming data with Apache Spark.


    • Creating DStreams from Sources
    • Operating on DStream Data
    • Reading from TCP
    • Reading from Kafka
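
    As a hedged sketch of creating a DStream from a TCP source and operating on it, the following counts words per batch. It assumes the spark-streaming module is on the classpath and that some process is writing text lines to localhost:9999; the host, port, batch interval, and the name StreamingSketch are assumptions made for this example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// Illustrative sketch only: host, port and batch interval are assumptions.
object StreamingSketch {
  // Operating on DStream data: the same flatMap / map / reduceByKey
  // chain used with RDDs, applied to every micro-batch.
  def wordCounts(lines: DStream[String]): DStream[(String, Int)] =
    lines
      .flatMap(_.split(" "))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // At least two local threads: one receives data, one processes it.
    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-sketch")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Creating a DStream from a TCP source.
    val lines = ssc.socketTextStream("localhost", 9999)
    wordCounts(lines).print()

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until the job is stopped
  }
}
```

    Reading from Kafka follows the same pattern once the input DStream is created, but it needs a separate Kafka integration dependency and is not shown here.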

    Module 7:

    Best practices and optimisation, based on students' solutions to the assignments.

    Module 8: Machine Learning

    In this module we will learn how to leverage Apache Spark for machine learning.


    • Basic Principles of Machine Learning
    • Spark ML API patterns
    • Built-in featurization and algorithm APIs
    • There will be 10 real-time machine learning case studies with real-time datasets.

    Module 9: Integration with other tools

    In this module we will learn how to integrate Apache Spark with other tools.


    • Integration with Kafka
    • Integration with Cassandra
    • Integration with Hive

    Module 10:

    Real-time Amazon EMR cluster creation.

    Module 11:

    There will be 2 projects based on the domains students are interested in.

    Module 12:

    Resume and interview preparation.