Big Data Solution – Hadoop Development

Duration: 20 Hours


Introduction to Big Data

All about Data!
Data Storage and Analysis
Comparison with Other Systems
Relational Database Management System
Grid Computing
Volunteer Computing
A Brief History of Hadoop

Installing Single-Node Hadoop

Prerequisites
Installation
Configuration
Standalone Mode
Pseudo-Distributed Mode
Configuring SSH
Formatting the HDFS Filesystem
Starting and Stopping MapReduce
Fully Distributed Mode

Creating an Eclipse Plugin for Hadoop-2.x.0

Download and install Eclipse
Install Git
Download the source code for the Hadoop Eclipse plugin from Git
Compile and create the JAR
Install the plugin into Eclipse

Developing a MapReduce Application

The Configuration API
Combining Resources
Variable Expansion
Setting Up the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test with MRUnit
Running Locally on Test Data
Running a Job in a Local Job Runner
Testing the Driver
Running on a Cluster
Packaging a Job
Launching a Job
The MapReduce Web UI
Retrieving the Results
Debugging a Job
Hadoop Logs
Remote Debugging
Tuning a Job
Profiling Tasks

MapReduce Workflows
Decomposing a Problem into MapReduce Jobs
Apache Oozie
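The following sketch simulates, in plain Python, the map, shuffle/sort and reduce phases that a Hadoop MapReduce job goes through. It is meant for orientation in the MapReduce development module and is not Hadoop's Java API. The names map_fn, reduce_fn and run_job are hypothetical. The shuffle here is a single in-memory dictionary rather than a distributed sort across reducers.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical mapper: emits (word, 1) for every whitespace-separated token.
def map_fn(offset: int, line: str) -> Iterator[tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1

# Hypothetical reducer: receives one key with all of its values and sums them.
def reduce_fn(word: str, counts: list[int]) -> Iterator[tuple[str, int]]:
    yield word, sum(counts)

Mapper = Callable[[int, str], Iterable[tuple[str, int]]]
Reducer = Callable[[str, list[int]], Iterable[tuple[str, int]]]

def run_job(lines: Iterable[str], mapper: Mapper, reducer: Reducer) -> list[tuple[str, int]]:
    # Map phase: each input record is (character offset, line), loosely
    # mirroring the (offset, line) records produced by TextInputFormat.
    groups = defaultdict(list)
    offset = 0
    for line in lines:
        for key, value in mapper(offset, line):
            # Shuffle: collect every value emitted for the same key.
            groups[key].append(value)
        offset += len(line) + 1
    # Sort + reduce phase: keys are handed to the reducer in sorted order.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

if __name__ == "__main__":
    for word, count in run_job(["the quick fox", "The lazy dog"], map_fn, reduce_fn):
        print(f"{word}\t{count}")
```

In a real cluster, the grouping by key happens across machines and each reducer receives only its own partition of keys. The contract is the same: one reduce call per key, with keys arriving in sorted order.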

MapReduce Features

Built-in Counters
User-Defined Java Counters
User-Defined Streaming Counters

Sorting
Preparation
Partial Sort
Total Sort
Secondary Sort
Joins

Map-Side Joins
Reduce-Side Joins
Side Data Distribution

Using the Job Configuration
Distributed Cache
MapReduce Library Classes
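The secondary sort topic above rests on three cooperating pieces: a composite key, a partitioner and grouping comparator on the natural key only, and a sort comparator on the full key. The plain-Python sketch below illustrates this using a max-temperature-per-year example. The records and the function names partition and secondary_sort are hypothetical. In Hadoop's Java API, the three roles are configured with Job.setPartitionerClass, Job.setSortComparatorClass and Job.setGroupingComparatorClass.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: (year, temperature) records in arbitrary order.
RECORDS = [(1950, 0), (1949, 111), (1950, 22), (1949, 78), (1950, -11)]

def partition(year: int, num_reducers: int) -> int:
    # Partitioner: uses the natural key (year) only, so every record for a
    # given year reaches the same reducer, loosely like a hash partitioner.
    return hash(year) % num_reducers

def secondary_sort(records, num_reducers: int = 2) -> dict[int, list[int]]:
    # Map: the composite key is the whole (year, temperature) pair.
    partitions = [[] for _ in range(num_reducers)]
    for year, temp in records:
        partitions[partition(year, num_reducers)].append((year, temp))

    grouped = {}
    for part in partitions:
        # Sort comparator: year ascending, then temperature descending.
        part.sort(key=lambda kv: (kv[0], -kv[1]))
        # Grouping comparator: group on year alone, so a single reduce call
        # sees all of that year's temperatures, already in sorted order.
        for year, group in groupby(part, key=itemgetter(0)):
            grouped[year] = [temp for _, temp in group]
    return grouped

if __name__ == "__main__":
    for year, temps in sorted(secondary_sort(RECORDS).items()):
        # With temperatures sorted descending, the maximum is the first value.
        print(year, temps[0])
```

Because the framework does the sorting, the reducer never has to buffer and sort a year's values itself. This is the main reason to prefer secondary sort over sorting inside reduce() when a key has many values.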

Setting Up a Hadoop Cluster

Cluster Specification
Network Topology
Cluster Setup and Installation
Installing Java

Creating a Hadoop User
Installing Hadoop
Testing the Installation
SSH Configuration

Hadoop Configuration
Configuration Management
Environment Settings

Important Hadoop Daemon Properties
Hadoop Daemon Addresses and Ports
Other Hadoop Properties

User Account Creation
YARN Configuration

Important YARN Daemon Properties
YARN Daemon Addresses and Ports
Security

Kerberos and Hadoop
Delegation Tokens

Other Security Enhancements
Benchmarking a Hadoop Cluster
Hadoop Benchmarks

User Jobs
Hadoop in the Cloud
Apache Whirr

Administering Hadoop

Persistent Data Structures

Safe Mode
Audit Logging
Tools
Monitoring
Logging
Metrics

Java Management Extensions

Routine Administration Procedures
Commissioning and Decommissioning Nodes
Upgrades

Installing and Running Pig
Execution Types
Running Pig Programs
Grunt
Pig Latin Editors
An Example
Generating Examples
Comparison with Databases

Pig Latin
Structure
Statements
Expressions
Types
Schemas
Functions
Macros

User-Defined Functions
A Filter UDF
An Eval UDF
A Load UDF

Data Processing Operators
Loading and Storing Data
Filtering Data

Grouping and Joining Data
Sorting Data
Combining and Splitting Data
Pig in Practice
Parameter Substitution


Installing Hive
The Hive Shell
An Example
Running Hive

Configuring Hive
Hive Services
The Metastore

Comparison with Traditional Databases
Schema on Read Versus Schema on Write
Updates, Transactions, and Indexes
HiveQL

Data Types
Operators and Functions
Managed Tables and External Tables
Partitions and Buckets
Storage Formats

Importing Data
Altering Tables
Dropping Tables
Querying Data

Sorting and Aggregating
MapReduce Scripts
Joins
Subqueries
Views
User-Defined Functions
Writing a UDF
Writing a UDAF


HBasics
Backdrop
Concepts
Whirlwind Tour of the Data Model
Installation
Test Drive
Clients
Avro, REST, and Thrift
Example
Schemas
Loading Data
Web Queries
HBase Versus RDBMS
Successful Service
Use Case: HBase at
Praxis
Versions
HDFS
UI
Metrics
Schema Design
Bulk Load

R and Hadoop

Introduction to the R Language
Introduction to RHadoop as a Big Data Solution
RHadoop data analysis
RHadoop machine learning

Python and Hadoop

Python Programming
Python and Hadoop
Hadoop Development with mrjob
Introduction to Spark
Machine Learning
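mrjob runs Python jobs on a Hadoop cluster through Hadoop Streaming. The underlying contract is simple:

- the mapper reads text lines on stdin and writes tab-separated key/value lines to stdout;
- the framework sorts those lines by key;
- the reducer reads the sorted lines and must detect key boundaries itself.

The sketch below is a hand-written word count against that contract, not mrjob's own API. The file name wordcount.py and the "map"/"reduce" command-line argument are hypothetical conventions.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator

def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit one "word<TAB>1" line per token, the key/value format Streaming expects.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Streaming delivers reducer input sorted by key but one pair per line,
    # so consecutive lines sharing a key are grouped here.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: "map" selects the mapper, anything else the reducer.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The same flow can be approximated locally, before touching a cluster, with a shell pipeline such as `cat input.txt | python3 wordcount.py map | LC_ALL=C sort | python3 wordcount.py reduce` (input.txt is a hypothetical file). Here `sort` stands in for Hadoop's shuffle.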

Advanced Administration and Monitoring

Multiple Nodes

Adding nodes
Decommissioning nodes
Recovering from NameNode failure

Monitoring cluster health using Ganglia (monitoring only)

Installing Ambari for management and monitoring

Installing Hue, with emphasis on using and managing the Hadoop environment

Cloudera Hadoop Certification

CCAH: Hadoop Administrator
CCDH: Hadoop Developer

Case Studies

Hadoop Usage at The Social Music Revolution
Hadoop at

Generating Charts with Hadoop
The Track Statistics Program
Summary
