Hadoop

HADOOP ONLINE TRAINING COURSE CONTENT
VirtualBox/VMware
a. Basics
b. Installations
c. Backups
d. Snapshots

Linux
a. Basics
b. Installations
c. Commands

Hadoop
a. Why Hadoop?
b. Scaling
c. Distributed Framework
d. Hadoop vs. RDBMS
e. Brief history of Hadoop

Setup Hadoop
a. Pseudo mode
b. Cluster mode
c. IPv6
d. SSH
e. Installation of Java and Hadoop
f. Configuration of Hadoop
g. Hadoop Processes (NN, SNN, JT, DN, TT)
h. Temporary directory
i. UI
j. Common errors when running a Hadoop cluster, and their solutions

HDFS - Hadoop Distributed File System
a. HDFS Design and Architecture
b. HDFS Concepts
c. Interacting with HDFS using the command line
d. Interacting with HDFS using Java APIs
e. Dataflow
f. Blocks
g. Replicas

Hadoop Processes
a. NameNode
b. Secondary NameNode
c. JobTracker
d. TaskTracker
e. DataNode

MapReduce
a. Developing a MapReduce Application
b. Phases in the MapReduce Framework
c. MapReduce Input and Output Formats
d. Advanced Concepts
e. Sample Applications
f. Combiner
g. HAR

Joining data sets in MapReduce jobs
a. Map-side join
b. Reduce-Side join

MapReduce – customization
a. Custom InputFormat class
b. Hash Partitioner
c. Custom Partitioner
d. Sorting techniques
e. Custom OutputFormat class

Hadoop Programming Languages:
Pig
a. Introduction
b. Installation and Configuration
c. Interacting with HDFS using Pig
d. MapReduce Programs through Pig
e. Pig Commands
f. Loading, Filtering, Grouping…
g. Data types, Operators…
h. Joins, Groups…
i. Sample programs in Pig

Hive
a. Basics
b. Installation and Configurations
c. Commands…

NoSQL Database Concepts
Specialties:
ETL tool (PDI) (Data Warehousing BI Tools)
a. Introduction
b. Creating RDBMS database
c. Establishing a connection between PDI and the RDBMS database
d. Creating data in Hadoop
e. Establishing a connection between PDI and Hadoop data
f. Summarization

OVERVIEW HADOOP DEVELOPER
Introduction
The Motivation for Hadoop
• Problems with traditional large-scale systems
• Requirements for a new approach

Hadoop: Basic Concepts
• An Overview of Hadoop
• The Hadoop Distributed File System
• Hands-On Exercise
• How MapReduce Works
• Hands-On Exercise
• Anatomy of a Hadoop Cluster
• Other Hadoop Ecosystem Components

Writing a MapReduce Program
• The MapReduce Flow
• Examining a Sample MapReduce Program
• Basic MapReduce API Concepts
• The Driver Code
• The Mapper
• The Reducer
• Hadoop’s Streaming API
• Using Eclipse for Rapid Development
• Hands-On Exercise
• The New MapReduce API

Delving Deeper Into The Hadoop API
• More about ToolRunner
• Testing with MRUnit
• Reducing Intermediate Data With Combiners
• The configure and close methods for Map/Reduce Setup and Teardown
• Writing Partitioners for Better Load Balancing
• Hands-On Exercise
• Directly Accessing HDFS
• Using the Distributed Cache
• Hands-On Exercise

Common Map Reduce Algorithms
• Sorting and Searching
• Indexing
• Machine Learning With Mahout
• Term Frequency – Inverse Document Frequency
• Word Co-Occurrence
• Hands-On Exercise

Using HBase
• What is HBase?
• HBase Architecture
• HBase API
• Managing large data sets with HBase
• Using HBase in Hadoop applications
• Hands-On Exercise

Using Hive and Pig
• Hive Basics
• Pig Basics
• Hands-On Exercise

Practical Development Tips and Techniques
• Debugging MapReduce Code
• Using LocalJobRunner Mode For Easier Debugging
• Retrieving Job Information with Counters
• Logging
• Splittable File Formats
• Determining the Optimal Number of Reducers
• Map-Only MapReduce Jobs
• Hands-On Exercise

More Advanced MapReduce Programming
• Custom Writable and WritableComparable
• Saving Binary Data using SequenceFiles and Avro Files
• Creating InputFormats and OutputFormats
• Hands-On Exercise

Joining Data Sets in MapReduce
• Map-Side Joins
• The Secondary Sort
• Reduce-Side Joins

Hadoop Ecosystem Overview
Oozie
HBase
Pig
Sqoop
Cassandra
Chukwa
Mahout
ZooKeeper
Flume

• Case Studies Discussions
• Certification Guidance
• Real-Time Certification and Interview Questions and Answers
• Resume Preparation
• Providing all Materials and Links
• Real-Time Project Explanation and Practice