Big Data Hadoop Online Training | Big Data Analytics Training | Acutesoft
India:+91 (0)8885575549, 040-42627705 | USA: +1 973-619-0109 | UK: +44 207-993-2319

Hadoop / Big Data

What is Big data?

Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it is not the amount of data that matters; it is what organizations do with it. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

What is Hadoop?

Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is a project of the Apache Software Foundation.
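
To give a feel for the processing model Hadoop is known for, here is a minimal, single-machine sketch of a MapReduce-style word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster the framework would run many mappers and reducers on different machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data needs big storage", "hadoop stores big data"]
    print(word_count(sample))
```

The point of the split into three functions is that the map and reduce steps only ever see one line or one key at a time, which is what lets a framework such as Hadoop spread them across many machines.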

Who should take this course?

Big Data career opportunities are on the rise, and Hadoop is quickly becoming a must-know technology for the following professionals:

  • Software Developers and Architects
  • Analytics Professionals
  • Senior IT professionals
  • Testing and Mainframe professionals
  • Data Management Professionals
  • Business Intelligence Professionals
  • Project Managers
  • Aspiring Data Scientists
  • Graduates looking to build a career in Big Data Analytics

Because knowledge of Java is necessary for this course, we provide complimentary access to the “Java Essentials for Hadoop” course. For Spark we use Python and Scala, and an ebook is provided to help you with both. Knowledge of an operating system such as Linux is also useful for the course.

Big Data History and Current Considerations

While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs:

Volume: Organizations collect data from a variety of sources, including business transactions, social media and information from sensor or machine-to-machine data. In the past, storing it would’ve been a problem – but new technologies (such as Hadoop) have eased the burden.

Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.

Variety: Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.

SAS considers two additional dimensions when it comes to big data:

Variability: In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage. Even more so with unstructured data.

Complexity: Today’s data comes from multiple sources, which makes it difficult to link, match, cleanse and transform data across systems. However, it’s necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control.

Course Objectives

1. Explain the need for Big Data and list its applications.
2. Demonstrate mastery of HDFS concepts and the MapReduce framework.
3. Use Sqoop and Flume to load data into the Hadoop Distributed File System (HDFS).
4. Run queries using Pig and Hive.
5. Install and configure HBase.
6. Discuss and differentiate commercial Hadoop distributions such as Cloudera and Hortonworks.

Hadoop Course


  • Hadoop Developer/Admin Training Course Content
  • Hadoop Architecture
      Introduction to Hadoop
      Parallel Computing vs. Distributed Computing
      How to install Hadoop on your system
      How to install Hadoop cluster on multiple machines
      Introduction to Hadoop daemons: NameNode, DataNode, JobTracker, TaskTracker
      Exploring HDFS (Hadoop Distributed File System)
      Exploring the HDFS Apache Web UI
      NameNode architecture (EditLog, FsImage, location of replicas)
      Secondary NameNode architecture
      DataNode architecture
  • MapReduce Architecture
      Exploring JobTracker/TaskTracker
      How to run a MapReduce job
      Exploring Mapper/Reducer/Combiner
      Shuffle: Sort & Partition
      Input/output formats
      Exploring the Apache MapReduce Web UI
  • Hadoop Developer Tasks
      Writing a MapReduce program
      Reading and writing data using Java
      Hadoop Eclipse integration
      Mapper in detail
      Reducer in detail
      Using Combiners
      Reducing Intermediate Data with Combiners
      Writing Partitioners for Better Load Balancing
      Sorting in HDFS
      Searching in HDFS
      Hands-On Exercise
  • Hadoop Administrative Tasks
      Routine Administrative Procedures
      Understanding dfsadmin and mradmin
      Block Scanner, Balancer
      Health Check & Safe mode
      Monitoring and Debugging on a production cluster
      NameNode Backup and Recovery
      DataNode commissioning/decommissioning
      ACL (Access Control List)
      Upgrading Hadoop
  • HBase Architecture
  • Hive Architecture
      Introduction to Hive
      HBase vs Hive
      Installation of Hive on your system
      HQL (Hive Query Language)
      Basic Hive commands
  • Pig Architecture
      Introduction to Pig
      Installation of Pig on your system
      Basic Pig commands
      Hands-On Exercise
  • Sqoop Architecture
      Introduction to Sqoop
      Installation of Sqoop on your system
      Import/Export data from RDBMS to HDFS
      Import/Export data from RDBMS to HBase
      Import/Export data from RDBMS to Hive
      Hands-On Exercise
  • Mini Project / POC (Proof of Concept)
      Facebook-Hive POC
      Uses of Hadoop/Hive at Facebook
      Static & dynamic partitioning
      UDF (User-Defined Functions)
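
Two of the MapReduce topics in the outline above, reducing intermediate data with combiners and writing partitioners for load balancing, can be previewed with a small sketch. This is plain Python on one machine, not Hadoop code: the names combine, partition and run_job are hypothetical, and zlib.crc32 is used here only as an assumed stand-in for a hash-based partitioner, because it gives the same value on every run.

```python
import zlib
from collections import Counter


def combine(words):
    """Combiner: pre-aggregate counts on the map side to shrink intermediate data."""
    return Counter(words)


def partition(key, num_reducers):
    """Partitioner: send a key to one reducer index using a stable hash (assumption: crc32)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers):
    """Simulate map + combine + partition + reduce over a list of input splits."""
    # One bucket of running totals per simulated reducer.
    buckets = [Counter() for _ in range(num_reducers)]
    records_without_combiner = 0
    records_with_combiner = 0
    for split in splits:
        words = split.split()
        # Without a combiner, every word would be sent as its own (word, 1) record.
        records_without_combiner += len(words)
        partial = combine(words)
        # With a combiner, only one record per distinct word leaves each split.
        records_with_combiner += len(partial)
        for word, count in partial.items():
            buckets[partition(word, num_reducers)][word] += count
    # Present each reducer's output with its keys in sorted order.
    results = [dict(sorted(bucket.items())) for bucket in buckets]
    return results, records_without_combiner, records_with_combiner


if __name__ == "__main__":
    outputs, raw, combined = run_job(["a b a a", "b b c"], num_reducers=2)
    print(outputs, raw, combined)
```

In this sketch the combiner cuts the number of intermediate records, while the partitioner guarantees that all counts for a given word reach the same reducer. How evenly keys spread across reducers depends on the hash and on the data, which is why custom partitioners are sometimes written.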