HADOOP / Bigdata - Online Training for SAP|Oracle|JAVA|Microsoft|Mobile Apps|Testing|SAS|Hadoop - Acutesoft Solutions
India:+91 (0)9848346149, USA: +1 973-619-0109, UK: +44 207-993-2319 Santosh@acutesoft.com

HADOOP / Bigdata

HADOOP ONLINE TRAINING


COURSE CURRICULUM

Download [PDF]

  • Hadoop Developer/Admin Training Course Content
  • Hadoop Architecture
      Introduction to Hadoop
      Parallel Computing vs. Distributed Computing
      How to install Hadoop on your system
      How to install Hadoop cluster on multiple machines
      Hadoop Daemons introduction:
      NameNode, DataNode, JobTracker, TaskTracker
      Exploring HDFS (Hadoop Distributed File System)
      Exploring the HDFS Apache Web UI
      NameNode architecture
      (EditLog, FsImage, location of replicas)
      Secondary NameNode architecture
      DataNode architecture
  • MapReduce Architecture
      Exploring JobTracker/TaskTracker
      How to run a MapReduce job
      Exploring Mapper/Reducer/Combiner
      Shuffle: Sort & Partition
      Input/output formats
      Exploring the Apache MapReduce Web UI
  • Hadoop Developer Tasks
      Writing a MapReduce program
      Reading and writing data using Java
      Hadoop Eclipse integration
      Mapper in detail
      Reducer in detail
      Using Combiners
      Reducing Intermediate Data with Combiners
      Writing Partitioners for Better Load Balancing
      Sorting in HDFS
      Searching in HDFS
      Hands-On Exercise
  • Hadoop Administrative Tasks
      Routine Administrative Procedures
      Understanding dfsadmin and mradmin
      Block Scanner, Balancer
      Health Check & Safe mode
      Monitoring and Debugging on a production cluster
      NameNode Backup and Recovery
      DataNode commissioning/decommissioning
      ACL (Access Control List)
      Upgrading Hadoop
  • HBase Architecture
  • Hive Architecture
      Introduction to Hive
      HBase vs Hive
      Installation of Hive on your system
      HQL (Hive Query Language)
      Basic Hive commands
      Hands-on-Exercise
  • Pig Architecture
      Introduction to Pig
      Installation of Pig on your system
      Basic Pig commands
      Hands-On Exercise
  • Sqoop Architecture
      Introduction to Sqoop
      Installation of Sqoop on your system
      Import/Export data from RDBMS to HDFS
      Import/Export data from RDBMS to HBase
      Import/Export data from RDBMS to Hive
      Hands-On Exercise
  • Mini Project / POC (Proof of Concept)
      Facebook-Hive POC
      Usage of Hadoop/Hive @ Facebook
      Static & dynamic partitioning
      UDF (User-Defined Functions)
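The developer-track topics above (Mapper, Combiner, partitioner, shuffle/sort, Reducer) can be sketched outside Hadoop as a plain-Python word-count simulation. This is an illustrative sketch of the data flow only, not Hadoop's actual Java API; the function names and two-reducer setup are assumptions for the example.

```python
from collections import defaultdict
from itertools import groupby

def mapper(line):
    # Emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def combiner(pairs):
    # Pre-aggregate map output to reduce intermediate data.
    # (In Hadoop a combiner runs per map task; here we aggregate once.)
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return counts.items()

def partition(word, num_reducers):
    # Decide which reducer receives this key (Hadoop defaults to hash partitioning).
    return hash(word) % num_reducers

def reducer(word, values):
    # Sum all counts for one key.
    return (word, sum(values))

def run_job(lines, num_reducers=2):
    # Map + combine phase.
    intermediate = combiner(pair for line in lines for pair in mapper(line))
    # Shuffle: route each key to a reducer partition, then sort within it.
    partitions = defaultdict(list)
    for word, n in intermediate:
        partitions[partition(word, num_reducers)].append((word, n))
    # Reduce phase: group the sorted pairs by key and aggregate.
    result = {}
    for part in partitions.values():
        part.sort()
        for word, group in groupby(part, key=lambda kv: kv[0]):
            k, v = reducer(word, (n for _, n in group))
            result[k] = v
    return result

counts = run_job(["the quick brown fox", "the lazy dog", "the fox"])
```

The same map/shuffle/reduce shape carries over to the Java API covered in the course, where `Mapper`, `Combiner`, and `Reducer` are classes rather than functions.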

HADOOP

A wide variety of companies and organizations use Hadoop for both research and production.

Hadoop consists of the Hadoop Common package, which provides filesystem and OS level abstractions, a MapReduce engine (either MapReduce/MR1 or YARN/MR2)[9] and the Hadoop Distributed File System (HDFS). The Hadoop Common package contains the necessary Java ARchive (JAR) files and scripts needed to start Hadoop. The package also provides source code, documentation and a contribution section that includes projects from the Hadoop Community.

For effective scheduling of work, every Hadoop-compatible file system should provide location awareness: the name of the rack (more precisely, of the network switch) where a worker node is. Hadoop applications can use this information to run work on the node where the data is, and, failing that, on the same rack/switch, reducing backbone traffic. HDFS uses this method when replicating data to try to keep different copies of the data on different racks. The goal is to reduce the impact of a rack power outage or switch failure, so that even if these events occur, the data may still be readable.
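The rack-aware placement described above can be sketched as: first replica on the writer's node, second on a node in a different rack, third on another node in that second rack. The following plain-Python sketch is illustrative only; `choose_replica_nodes` and the topology format are assumptions, not the actual HDFS API.

```python
def choose_replica_nodes(writer, topology):
    """Simplified sketch of HDFS's default rack-aware replica placement.

    topology maps rack name -> list of node names.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own node (cheapest write).
    first = writer
    # Replica 2: a node on a different rack, to survive a rack failure.
    remote_rack = next(r for r in topology if r != rack_of[writer])
    second = topology[remote_rack][0]
    # Replica 3: a different node on that same remote rack.
    third = next(n for n in topology[remote_rack] if n != second)
    return [first, second, third]

topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = choose_replica_nodes("n1", topology)
# The three replicas span two racks, so a single rack power outage
# or switch failure still leaves at least one copy readable.
```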

Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
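That fault-tolerance claim can be made concrete with a toy availability check: a block remains readable as long as at least one of its replicas sits on a live node. This is an illustrative sketch under assumed names, not HDFS code.

```python
def readable_blocks(block_replicas, failed_nodes):
    """Return the blocks that still have at least one live replica.

    block_replicas: block id -> set of node names holding a copy.
    """
    failed = set(failed_nodes)
    return {b for b, nodes in block_replicas.items() if nodes - failed}

blocks = {
    "blk_1": {"n1", "n3", "n4"},
    "blk_2": {"n2", "n3", "n4"},
}
# Even with nodes n3 and n4 down, both blocks stay readable from n1 / n2.
alive = readable_blocks(blocks, ["n3", "n4"])
```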

Hadoop enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.
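Handling failures "at the application layer" works roughly like the JobTracker/TaskTracker heartbeats covered in the curriculum: a worker that stops reporting within a timeout is declared dead and its tasks are rescheduled onto surviving workers. A minimal sketch, with illustrative names and an assumed 30-second timeout:

```python
HEARTBEAT_TIMEOUT = 30  # seconds; illustrative value, not Hadoop's default

def detect_failures(last_heartbeat, now):
    """Workers whose last heartbeat is older than the timeout are dead."""
    return {w for w, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT}

def reschedule(assignments, dead, workers):
    """Move tasks off dead workers onto surviving ones (round-robin)."""
    live = [w for w in workers if w not in dead]
    moved = {}
    for i, (task, worker) in enumerate(sorted(assignments.items())):
        moved[task] = live[i % len(live)] if worker in dead else worker
    return moved

heartbeats = {"w1": 100, "w2": 95, "w3": 40}   # last report time per worker
dead = detect_failures(heartbeats, now=100)     # w3 has been silent for 60s
new_plan = reschedule({"t1": "w3", "t2": "w1"}, dead, ["w1", "w2", "w3"])
```

The real JobTracker also re-runs tasks whose output was lost with the failed node, which is why MapReduce jobs survive mid-run hardware failures.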

UPCOMING DEMOS

For updated schedules, please contact us.