Hadoop 2.0 Introduction
  • This training introduces attendees to the core concepts of Hadoop: a deep dive into the critical architecture paths of HDFS, MapReduce, and HBase; the basics of writing effective Pig and Hive scripts; and how to choose the correct use cases for Hadoop.
Intended Audience:
  • Engineers, Programmers, Networking specialists, Managers, Executives
Key Skills:
  • Advanced Map Reduce Concepts & Algorithms
  • Big Data & Hadoop Ecosystem
  • Hadoop Best Practices, Tips and Techniques
  • Importing and exporting data
  • Hadoop Distributed File System – HDFS
  • Using the MapReduce API and writing common algorithms
  • Best practices for developing and debugging map reduce programs
  • Explore a dataset of products, reviews and images
  • Managing and Monitoring a Hadoop Cluster
Prerequisites:
  • Participants should have a basic understanding of Java and Linux.
Instructional Method:
  • This is an instructor-led course combining lectures with the practical application of Hadoop and its underlying technologies. Most concepts are presented pictorially, and a detailed case study ties together the technologies, patterns and design.
Hadoop Introduction
  • Move computation not data.
  • Volunteer Computing
  • Hadoop Releases
  • Hadoop performance and data scale facts.
  • The Apache Hadoop Project.
  • Grid Computing
  • Hadoop in the context of other data stores.
  • The Hadoop Ecosystem.
  • Apache Hadoop and the Hadoop Ecosystem
  • A Brief History of Hadoop
  • Hadoop – an inside view: MapReduce and HDFS.
  • What about NoSQL?
  • RDBMS
  • Comparison with Other Systems
MapReduce
  • Constructing the basic template of a MapReduce program
  • Running a Distributed MapReduce Job
  • Data Flow
  • Combiner Functions
  • Java MapReduce
  • Scaling Out
  • Counting things
  • Analyzing the Data with Hadoop
  • Map and Reduce
  • Hadoop Pipes
  • Adapting for Hadoop’s API changes
  • Improving performance with combiners
  • Hadoop Streaming (Ruby and Python)
  • Streaming in Hadoop
  • Streaming with key/value pairs
  • Streaming with Unix commands
  • Streaming with the Aggregate package
  • Streaming with scripts
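The Streaming topics above boil down to a simple contract: the mapper and reducer read lines on stdin and write tab-separated key/value lines on stdout, with the framework sorting by key between the two phases. A minimal word-count sketch in Python (the hadoop-streaming invocation in the comment is only indicative; run locally, the same pipeline is mapper | sort | reducer):

```python
def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word, as Streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum counts per key; relies on input being sorted by key (the shuffle)."""
    current, total = None, 0
    for line in lines:
        key, value = line.split("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"

if __name__ == "__main__":
    # On a cluster each half runs as its own script, roughly:
    #   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py \
    #       -input in -output out
    # Locally, sorted() stands in for the framework's shuffle-and-sort.
    for out in reducer(iter(sorted(mapper(["the cat sat", "the dog sat"])))):
        print(out)
```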
Distributing Data with HDFS
  • Interfaces
  • Hadoop Filesystems
  • The Design of HDFS
  • Using Hadoop Archives
    • Limitations
  • Parallel Copying with distcp
    • Keeping an HDFS Cluster Balanced
    • Hadoop Archives
  • Data Flow
    • Anatomy of a File Write
    • Anatomy of a File Read
    • Coherency Model
  • The Command-Line Interface - Basic Filesystem Operations
    • The Java Interface
    • Querying the Filesystem
    • Reading Data Using the FileSystem API
    • Directories
    • Deleting Data
    • Reading Data from a Hadoop URL
    • Writing Data
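One detail from the anatomy of a file write is worth sketching: HDFS's default policy for three replicas places the first on the writer's node, the second on a node in a different rack, and the third on a different node in that same remote rack. A toy simulation of that policy (node and rack names are invented; this is not Hadoop's actual BlockPlacementPolicy code):

```python
def place_replicas(writer, nodes_by_rack):
    """Toy version of the default HDFS 3-replica placement: replica 1 on the
    writer's node, replica 2 on a node in another rack, replica 3 on a
    different node in that same remote rack."""
    writer_rack = next(rack for rack, nodes in nodes_by_rack.items() if writer in nodes)
    remote_rack = next(rack for rack in nodes_by_rack if rack != writer_rack)
    second = nodes_by_rack[remote_rack][0]
    third = next(node for node in nodes_by_rack[remote_rack] if node != second)
    return [writer, second, third]

# Two invented racks of two datanodes each:
racks = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}
```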
Understanding Hadoop I/O
  • File-Based Data Structures
    • MapFile
    • SequenceFile
  • Serialization
  • Implementing a Custom Writable
  • Serialization Frameworks
  • The Writable Interface
  • Writable Classes
  • Avro
  • Compression
    • Codecs
    • Using Compression in MapReduce
    • Compression and Input Splits
  • Data Integrity
    • ChecksumFileSystem
    • LocalFileSystem
    • Data Integrity in HDFS
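HDFS's data integrity rests on checksumming: a CRC is computed for every io.bytes.per.checksum bytes of data (512 by default) when a file is written, then recomputed and compared whenever the data is read. A minimal sketch of the idea, not Hadoop's actual ChecksumFileSystem implementation:

```python
import zlib

CHUNK = 512  # HDFS checksums each io.bytes.per.checksum bytes (512 by default)

def checksums(data, chunk=CHUNK):
    """CRC-32 per chunk, as a datanode stores alongside the block data."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]

def verify(data, stored, chunk=CHUNK):
    """Recompute on read and compare; a mismatch signals silent corruption."""
    return checksums(data, chunk) == stored

original = b"x" * 1024
stored_sums = checksums(original)
# Flip one byte in the second chunk to simulate bit rot on disk:
corrupted = original[:600] + b"!" + original[601:]
```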
Advanced MapReduce
  • Chaining MapReduce jobs
    • Chaining preprocessing and postprocessing steps
    • Chaining MapReduce jobs in a sequence
    • Chaining MapReduce jobs with complex dependency
  • Creating a Bloom filter
    • What does a Bloom filter do?
    • Bloom filter in Hadoop version 0.20+
    • Implementing a Bloom filter
  • Joining data from different sources
    • Reduce-side joining
    • Replicated joins using DistributedCache
    • Semijoin: reduce-side join with map-side filtering
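A Bloom filter answers "possibly present" or "definitely absent", which is what makes the semijoin's map-side filtering cheap: mappers drop records whose join key is definitely absent from the other dataset. A minimal sketch (Hadoop ships its own implementation in org.apache.hadoop.util.bloom; the bit-array size and hashing scheme here are illustrative):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.
    Lookups may return false positives but never false negatives."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = 0  # one big int as the bit array

    def _positions(self, item):
        # Derive k positions by salting a cryptographic hash (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```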
Writing Map-Reduce Applications
  • Hadoop in the Cloud
  • Cluster Setup and Installation
  • Hadoop Configuration
  • YARN Configuration
  • The Configuration API
  • Running Locally on Test Data
  • Configuring the Development Environment
  • Cluster Specs
  • Tuning
  • MapReduce Workflows
  • Monitoring and debugging on a production cluster
  • Tuning for performance
  • Benchmarking a Hadoop Cluster
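The key behavior of the Configuration API is resource stacking: resources are read in order (e.g. core-default.xml, then core-site.xml), with later resources overriding earlier ones except for properties marked final. A toy model of that resolution order (dfs.replication is a real Hadoop property; the data structures are invented):

```python
def resolve(*resources):
    """Toy model of Hadoop Configuration resource stacking: resources apply
    in order, later ones overriding earlier ones, unless a property was
    already marked final by an earlier resource."""
    merged, finals = {}, set()
    for resource in resources:
        for key, (value, is_final) in resource.items():
            if key in finals:
                continue  # a final property cannot be overridden downstream
            merged[key] = value
            if is_final:
                finals.add(key)
    return merged

defaults = {"dfs.replication": ("3", False)}   # shipped default
site = {"dfs.replication": ("2", True)}        # admin pins it in *-site.xml
job = {"dfs.replication": ("1", False)}        # a job tries to override
```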
Map-Reduce Internals
  • Failures
    • Failures in YARN
    • Failures in Classic MapReduce
  • Anatomy of a MapReduce Job Run
    • Classic MapReduce (MapReduce 1)
    • YARN (MapReduce 2)
  • Shuffle and Sort
    • The Reduce Side
    • The Map Side
    • Configuration Tuning
  • Task Execution
    • Skipping Bad Records
    • Output Committers
    • The Task Execution Environment
    • Speculative Execution
    • Task JVM Reuse
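The shuffle-and-sort topics above can be condensed to two steps: map outputs are hash-partitioned across reducers, and each partition is sorted by key so a reducer sees every key's values grouped together. A toy single-process simulation (not the real spill-and-merge implementation):

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def shuffle(map_outputs, num_reducers):
    """Toy shuffle: hash-partition (key, value) pairs across reducers, then
    sort each partition by key, mimicking the map side's partitioned sort."""
    partitions = defaultdict(list)
    for key, value in map_outputs:
        partitions[hash(key) % num_reducers].append((key, value))
    return {r: sorted(pairs, key=itemgetter(0)) for r, pairs in partitions.items()}

def reduce_partition(pairs):
    """Sum values per key, mirroring the reduce side of a word count."""
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=itemgetter(0))}
```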
Managing Hadoop
  • Setting permissions
  • Enabling trash
  • Adding DataNodes
  • Managing NameNode and Secondary NameNode
  • Designing network layout and rack awareness
  • Checking the system’s health
  • Managing quotas
  • Setting up parameter values for practical use
  • Removing DataNodes
  • Recovering from a failed NameNode
Map-Reduce Features
  • Counters
  • Sorting
  • Map-Reduce Library
  • Side Data Distribution
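Counters are the standard way to gather job-wide statistics without a side channel: each task increments named counters locally, and the framework sums them into job totals. A Python sketch of that aggregation (the counter names and record format are invented):

```python
from collections import Counter

def parse_task(records):
    """One map task: parse integer records, counting malformed ones instead
    of failing, the way a Hadoop job bumps a custom counter for bad input."""
    counters = Counter()
    values = []
    for record in records:
        try:
            values.append(int(record))
            counters["VALID_RECORDS"] += 1
        except ValueError:
            counters["MALFORMED_RECORDS"] += 1
    return values, counters

# The framework sums each task's counters into job-wide totals:
_, task1_counters = parse_task(["1", "2", "oops"])
_, task2_counters = parse_task(["3", "bad", "worse"])
job_totals = task1_counters + task2_counters
```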
Map-Reduce Ecosystem
  • Hive
    • Installing and configuring Hive
    • HiveQL in detail
    • Example queries
    • Hive summary
  • HBase
    • Introduction
    • Clients
    • Concepts
    • HBase vs RDBMS
  • Installing Pig
  • Running Pig
  • Thinking like a Pig
    • Data flow language
    • User-defined functions
    • Data types
  • Speaking Pig Latin
    • Execution optimization
    • Expressions and functions
    • Relational operators
    • Data types and schemas
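Pig Latin's relational operators describe a dataflow, and that dataflow can be mirrored step by step over in-memory Python tuples, which is a useful way to reason about a script before running it (the 'sales' relation, its schema, and the values are invented):

```python
from itertools import groupby
from operator import itemgetter

# records = LOAD 'sales' AS (product, amount);  -- toy in-memory stand-in
records = [("widget", 30), ("gadget", 5), ("widget", 12), ("gadget", 40)]

# big = FILTER records BY amount > 10;
big = [r for r in records if r[1] > 10]

# grouped = GROUP big BY product;  (groupby needs its input sorted by the key)
grouped = groupby(sorted(big, key=itemgetter(0)), key=itemgetter(0))

# totals = FOREACH grouped GENERATE group, SUM(big.amount);
totals = {product: sum(amount for _, amount in rows) for product, rows in grouped}
```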