+91 95057 44455 | +1 214 447 7927

Hadoop Development Course

A complete, practice-oriented, hands-on training on

Big Data & Hadoop Development

Designed to get you started on projects right away!

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a very large number of concurrent tasks or jobs. It is designed to scale from a single server to thousands of machines, each providing computation and storage. Its distributed file system supports rapid data transfer among nodes and lets the system keep operating when a node fails, which minimizes the risk of catastrophic system failure even if a significant number of nodes go out of action.

Course topics are carefully crafted by industry experts to cover what is required to become a skilled Hadoop developer. The course aims to give attendees sufficient conceptual knowledge as well as practice with practical, real-time situations, so that they can start working on their company projects right after the training.

  • Complete hands-on effective training with individual focus
  • Flexible schedules – weekend or weekday; classroom or online or corporate
  • On-demand one-to-one and batch training at mutually convenient schedules
  • Practices and exercises for each topic
  • Interactive approach to understand concepts thoroughly
  • Speak to our coach before the workshop
  • Discussion of the concepts behind Hadoop's features
  • Guidance, tips and techniques for the certification examination
  • Quiz at the end of each major topic
  • Linux concepts and basic commands
  • Customized topic training for corporate clients
  • On-demand services (as needed) – SQL basics, Core Java concepts, resume preparation and guidance, interview questions

With the number of Big Data career opportunities on the rise, Hadoop is fast becoming a must-know technology for the following professionals:

  • Software Developers and Architects
  • Analytics Professionals
  • Data Management Professionals
  • Business Intelligence Professionals
  • Project Managers
  • Aspiring Data Scientists
  • Anyone with a genuine interest in Big Data Analytics

Classroom Course:

A 60-hour regular course delivered in multiple duration options

Online Instructor led Course:

A 60-hour regular course delivered in multiple duration options

Corporate Course:

– A 60-hour regular course with multiple duration options
– Flexible schedules based on customized course topics to meet the business needs of corporate clients
– A dedicated one-week course with a complete hand-holding, consulting approach

Course Curriculum

Outline for the complete, full-length course. The outline varies for customized courses and shorter foundation courses.

1. Introduction
1.1 Big Data Introduction

* What is Big Data
* Data Analytics
* Big Data Challenges
* Technologies that support Big Data

1.2 Hadoop Introduction

* What is Hadoop?
* History of Hadoop
* Basic Concepts
* Future of Hadoop
* The Hadoop Distributed File System
* Anatomy of a Hadoop Cluster
* Breakthroughs of Hadoop
* Hadoop Distributions: Apache Hadoop, Cloudera Hadoop, Hortonworks Hadoop, MapR Hadoop

2. Hadoop Daemon Processes

* NameNode
* DataNode
* Secondary NameNode/High Availability
* JobTracker/ResourceManager
* TaskTracker/NodeManager

3. HDFS (Hadoop Distributed File System)

* Blocks and Input Splits
* Data Replication
* Hadoop Rack Awareness
* Cluster Architecture and Block Placement
* Accessing HDFS: JAVA Approach, CLI Approach

4. Hadoop Installation Modes and HDFS

* Local Mode
* Pseudo-distributed Mode
* Fully distributed mode
* Pseudo Mode installation and configurations
* HDFS basic file operations

5. Hadoop Developer Tasks
5.1 Writing a MapReduce Program

* Basic API Concepts
* The Driver Class
* The Mapper Class
* The Reducer Class
* The Combiner Class
* The Partitioner Class
* Examining a Sample MapReduce Program with several examples
* Hadoop’s Streaming API
* Running your MapReduce program on Hadoop 1.0
* Running your MapReduce Program on Hadoop 2.0

5.2 Performing Several Hadoop Jobs

* Sequence Files
* Record Reader
* Record Writer
* Role of Reporter
* Output Collector
* Processing XML files
* Counters
* Directly Accessing HDFS
* ToolRunner
* Using The Distributed Cache

5.3 Advanced MapReduce Programming

* A Recap of the MapReduce Flow
* The Secondary Sort
* Customized Input Formats and Output Formats
* Map-Side Joins
* Reduce-Side Joins

5.4 Practical Development Tips and Techniques

* Strategies for Debugging MapReduce Code
* Testing MapReduce Code Locally by Using LocalJobRunner
* Testing with MRUnit
* Writing and Viewing Log Files
* Retrieving Job Information with Counters
* Reusing Objects

5.5 Data Input and Output

* Creating Custom Writable and Writable-Comparable Implementations
* Saving Binary Data Using SequenceFile and Avro Data Files
* Issues to Consider When Using File Compression

5.6 Tuning for Performance in MapReduce

* Reducing network traffic with Combiner, Partitioner classes
* Reducing the amount of input data using compression
* Reusing the JVM
* Running with speculative execution
* Input Formatters
* Output Formatters
* Schedulers: FIFO Scheduler, Fair Scheduler, Capacity Scheduler

5.7 YARN

* What is YARN
* How YARN Works
* Advantages of YARN

6. Hadoop Ecosystems
6.1 PIG

* PIG concepts
* Install and configure PIG on a cluster
* PIG vs. MapReduce and SQL
* Write sample PIG Latin scripts
* Modes of running PIG
* Programming in Eclipse
* Running as Java program
* PIG Macros
* Accessing Hive from PIG

6.2 HIVE

* Hive concepts
* Hive architecture
* Installing and configuring HIVE
* Managed tables and external tables
* Partitioned tables
* Bucketed tables
* Complex data types
* Joins in HIVE
* Multiple ways of inserting data in HIVE tables
* CTAS, views, alter tables
* User defined functions in HIVE: Hive UDF, Hive UDAF, Hive UDTF


6.3 SQOOP

* SQOOP concepts
* SQOOP architecture
* Install and configure SQOOP
* Connecting to RDBMS
* Internal mechanism of import/export
* Import data from Oracle/Mysql to HIVE
* Export data to Oracle/Mysql
* Other SQOOP commands


6.4 HBASE

* HBASE concepts
* ZOOKEEPER concepts
* HBASE and Region server architecture
* File storage architecture
* NoSQL vs SQL
* Defining Schema and basic operations: DDLs, DMLs
* HBASE use cases
* Accessing data stored in HBASE using clients such as the CLI and Java
* MapReduce client to access HBASE data
* HBASE admin tasks


6.5 OOZIE

* OOZIE concepts
* OOZIE architecture: Workflow engine, Job coordinator
* Installing and configuring OOZIE
* hPDL and XML for creating workflows
* Nodes in OOZIE: Action nodes, Control nodes
* Accessing OOZIE jobs through the CLI and web console
* Develop sample workflows in OOZIE on various Hadoop distributions
– Run HDFS file operations
– Run MapReduce programs
– Run PIG scripts
– Run HIVE jobs
– Run SQOOP imports/exports


6.6 FLUME

* FLUME concepts
* FLUME architecture
* Installation and configurations
* Executing FLUME jobs


6.7 Impala

* What is Impala
* How Impala Works
* Impala vs. Hive
* Impala’s shortcomings
* Impala Hands on


6.8 ZOOKEEPER

* ZOOKEEPER concepts
* Zookeeper as a service
* Zookeeper in production

7. Integrations

* MapReduce and HIVE integration
* MapReduce and HBASE integration
* Java and HIVE integration
* HIVE – HBASE Integration

8. Spark

* Introduction to Scala
* Functional Programming in Scala
* Working with Spark RDDs

9. Hadoop Administrative Tasks

Setting up a Hadoop cluster: Apache, Cloudera and VMware
* Install and configure Apache Hadoop on a multi node cluster
* Install and configure Cloudera Hadoop distribution in fully distributed mode
* Install and configure different ecosystems
* Basic Administrative tasks


24x7Coach.com, an education brand of Mahtia Business Solutions Pvt. Ltd., offers Classroom, Online/Virtual and Corporate training on Project Management, Technology and Soft Skills


24x7Coach.com © 2019. All rights reserved. 24x7Coach is an education brand of Mahtia Business Solutions Pvt. Ltd.

24x7Coach (Mahtia Business Solutions Pvt. Ltd.) is a PMI® Registered Education Provider (R.E.P. Id 4619).

24x7Coach.com is an Authorized Training Partner of VMEdu, Inc.

PMI, the PMI Registered Education Provider logo, PMP, PgMP, CAPM, PMI-SP, PMI-RMP, PMI-ACP, PMI-PBA, and PMBOK are marks of the Project Management Institute, Inc.

Microsoft is a Registered Trade Mark of Microsoft Corporation in the United States and/or other countries.

SMC™ is a Registered Trade Mark of ScrumStudy.com / VMEdu, Inc.

PSM, and Professional Scrum Master are trademarks of scrum.org. Content on our site is not affiliated with nor endorsed by Scrum.org.
