Hadoop Online Training

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
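The "simple programming model" behind Hadoop's batch processing is MapReduce. The sketch below is a minimal, single-process Java illustration of the map, shuffle/sort and reduce flow for counting words. It does not use the Hadoop API (`org.apache.hadoop.mapreduce`). The class and method names (`WordCountSketch`, `map`, `reduce`, `run`) are hypothetical. In a real Hadoop job, the framework distributes map and reduce tasks across the cluster and performs the shuffle itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the MapReduce model; not Hadoop code.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) { // split can yield a leading empty token
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group values by key (standing in for the
    // shuffle/sort step; TreeMap keeps keys sorted), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The end");
        System.out.println(run(input));
    }
}
```

Running `main` prints `{brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}`. Words are lower-cased in the map step so that "The" and "the" fall under the same key. This is a design choice of the sketch, not something Hadoop does for you.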

  • 1. Introduction


    1.1 Big Data Introduction
    * What is Big Data
    * Data Analytics
    * Big Data Challenges
    * Technologies used for Big Data

    1.2 Hadoop Introduction
    * What is Hadoop?
    * History of Hadoop
    * Basic Concepts
    * Future of Hadoop
    * The Hadoop Distributed File System
    * Anatomy of a Hadoop Cluster
    * Breakthroughs of Hadoop
    * Hadoop Distributions:
    - Apache Hadoop
    - Cloudera Hadoop
    - Hortonworks Hadoop
    - MapR Hadoop

  • 2. Hadoop Daemon Processes


    * NameNode
    * DataNode
    * Secondary NameNode
    * JobTracker
    * TaskTracker

  • 3. HDFS (Hadoop Distributed File System)


    * Blocks and Input Splits
    * Data Replication
    * Hadoop Rack Awareness
    * Cluster Architecture and Block Placement
    * Accessing HDFS
    - Java Approach
    - CLI Approach

  • 4. Hadoop Installation Modes and HDFS


    * Local Mode
    * Pseudo-distributed Mode
    * Fully Distributed Mode
    * Pseudo-distributed mode installation and configuration
    * HDFS basic file operations

  • 5. Hadoop Developer Tasks


    5.1 Writing a MapReduce Program

    * Basic API Concepts
    * The Driver Class
    * The Mapper Class
    * The Reducer Class
    * The Combiner Class
    * The Partitioner Class
    * Examining sample MapReduce programs
    * Hadoop's Streaming API

    5.2 Performing Several Hadoop Jobs

    * Sequence Files
    * Record Reader
    * Record Writer
    * Role of Reporter
    * Output Collector
    * Processing XML files
    * Counters
    * Directly Accessing HDFS
    * ToolRunner
    * Using the Distributed Cache

    5.3 Advanced MapReduce Programming

    * A Recap of the MapReduce Flow
    * The Secondary Sort
    * Customized Input Formats and Output Formats
    * Map-Side Joins
    * Reduce-Side Joins

    5.4 Monitoring and Debugging on a Production Cluster

    * Counters
    * Skipping Bad Records
    * Rerunning failed tasks with IsolationRunner

    5.5 Tuning for Performance in MapReduce

    * Reducing network traffic with the Combiner and Partitioner classes
    * Reducing the amount of input data using compression
    * Reusing the JVM
    * Running with speculative execution
    * Input Formats
    * Output Formats
    * Schedulers
    - FIFO Scheduler
    - Fair Scheduler
    - Capacity Scheduler

    5.6 Debugging MapReduce Programs

    * Testing with MRUnit
    * Logging
    * Other Debugging Strategies

  • 6. Hadoop Ecosystem


    6.1 Pig
    * Pig concepts
    * Installing and configuring Pig on a cluster
    * Pig vs. MapReduce and SQL
    * Pig vs. Hive
    * Writing sample Pig Latin scripts
    * Modes of running Pig
    * Programming in Eclipse
    * Running Pig as a Java program
    * Pig UDFs
    * Pig macros

    6.2 Hive

    * Hive concepts
    * Hive architecture
    * Installing and configuring Hive
    * Managed tables and external tables
    * Partitioned tables
    * Bucketed tables
    * Joins in Hive
    * Multiple ways of inserting data into Hive tables
    * CTAS, views and ALTER TABLE
    * User-defined functions in Hive
    - Hive UDF
    - Hive UDAF
    - Hive UDTF

    6.3 Sqoop

    * Sqoop concepts
    * Sqoop architecture
    * Installing and configuring Sqoop
    * Connecting to an RDBMS
    * Internal mechanism of import/export
    * Importing data from Oracle/MySQL into Hive
    * Exporting data to Oracle/MySQL
    * Other Sqoop commands

    6.4 HBase

    * HBase concepts
    * ZooKeeper concepts
    * HBase and RegionServer architecture
    * File storage architecture
    * NoSQL vs. SQL
    * Defining schema and basic operations
    - DDLs
    - DMLs
    * HBase use cases
    * Accessing data stored in HBase using clients such as the CLI and Java
    * MapReduce client to access HBase data
    * HBase admin tasks

    6.5 Oozie

    * Oozie concepts
    * Oozie architecture
    - Workflow engine
    - Job coordinator
    * Installing and configuring Oozie
    * hPDL and XML for creating workflows
    * Nodes in Oozie
    - Action nodes
    - Control nodes
    * Accessing Oozie jobs through the CLI and web console
    * Developing sample Oozie workflows on various Hadoop distributions
    - Run HDFS file operations
    - Run MapReduce programs
    - Run Pig scripts
    - Run Hive jobs
    - Run Sqoop imports/exports

    6.6 Flume

    * Flume concepts
    * Flume architecture
    * Installation and configuration
    * Executing Flume jobs

  • 7. Integrations


    * MapReduce and Hive integration
    * MapReduce and HBase integration
    * Java and Hive integration
    * Hive and HBase integration

  • 8. Hadoop Administrative Tasks


    Setting up a Hadoop cluster: Apache, Cloudera and VMware
    * Install and configure Apache Hadoop on a multi-node cluster
    * Install and configure the Cloudera Hadoop distribution in fully distributed mode
    * Install and configure different ecosystem components
    * Monitoring the cluster
    * NameNode in safe mode
    * Metadata backup
    * Integrating Kerberos security in Hadoop
    * Ganglia and Nagios – Cluster monitoring

 

Course Deliverables:

* Workshop-style coaching
* Interactive approach
* Course material
* Hands on practice exercises for each topic
* Quiz at the end of each major topic
* Tips and techniques for the Cloudera certification examination
* Linux concepts and basic commands
* On-demand services
* SQL basics, as needed
* Core Java concepts, as needed
* Resume preparation and guidance
* Interview questions

Know (Y)Our Coach

Mr. Nagaraju has over 20 years of diversified IT experience in the areas of project/program management, service management, application development and maintenance, ETL and data warehousing, and education/training.

He is a seasoned trainer, mentor and consultant in Hadoop and Oracle technologies. He has around 6 years of teaching experience, including 3 years on Hadoop. He has trained more than 1,000 professionals on Hadoop through both classroom and online training. He provides mentoring and consulting services on Hadoop for individuals and companies. He has also conducted many workshops on project management topics such as PMP certification and Microsoft Office Project.

Nagaraju specializes in corporate Hadoop training in India and abroad, and has delivered Hadoop training for IT companies such as IBM, HP, CTS, Capgemini and Unicel.

He has handled large, medium and small mission-critical projects in domains such as public transportation, finance, banking/credit card processing, e-commerce, content management, and healthcare and health insurance. He has served world-class clients and delivered solutions using Hadoop, Java, PHP, ASP, Oracle, SQL Server, Sybase, MySQL, ETL and data warehousing.

Mr. Nagaraju worked in the USA for 9 years, serving Fortune 500 companies.

* 20+ years of diversified IT experience in the areas of project/program management, service management, application development and maintenance, ETL and data warehousing, and education/training
* 6+ years of experience in conducting training on various topics
* 3+ years of experience in conducting training on Hadoop
* 1,000+ professionals trained on Hadoop by our coach
* 4.5+ out of 5: feedback rating received for the workshops facilitated by our coach
* 100% of participants answered 'Yes' when asked whether they would recommend the workshops conducted by our coach

* Project Management Professional (PMP®) from Project Management Institute
* Microsoft Certified Technology Specialist in 'Microsoft Office Project 2007, Managing Projects'
* ITIL V3 Foundation certified by EXIN
* Master's degree in Computer Science from Columbus University, USA

 

© Copyright 2017 www.24x7Coach.com. All rights reserved. 24x7Coach.com is a venture of Mahtia Business Solutions Pvt. Ltd.

24x7Coach.com is Authorized Training Partner of VMEdu, Inc. (ScrumStudy.com, SMStudy.com)

Registered Office: Mahtia Business Solutions Pvt. Ltd. West Marredpally, Secunderabad, Telangana, India
Classes held (based on convenience): S R Nagar, Madhapur, Kukatpally

Microsoft is a Registered Trade Mark of Microsoft Corporation in the United States and/or other countries.
PSM and Professional Scrum Master are trademarks of Scrum.org. Content on our site is not affiliated with or endorsed by Scrum.org.
Please report any IP violation or plagiarism to info@24x7coach.com and we will act promptly and accordingly.
