Big Data Solution – Hadoop Development


Duration: 20 Hours

 

Introduction to Big Data

All about Data!
Data Storage and Analysis
Comparison with Other Systems
Relational Database Management System
Grid Computing
Volunteer Computing
A Brief History of Hadoop
Compatibility

Installing Single-Node Hadoop

Prerequisites
Installation
Configuration
Standalone Mode

Pseudo-distributed Mode
Configuring SSH
Formatting the HDFS filesystem

Starting and stopping MapReduce
Fully Distributed Mode

Creating an Eclipse Plugin for Hadoop-2.x.0

Contents
Download and install Eclipse
Install git

Download the source code for the Hadoop Plugin for Eclipse from git

Compile and create the jar
Install the plugin in Eclipse

Developing a MapReduce Application

The Configuration
Combining Resources
Variable Expansion

Setting Up the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner

Writing a Unit Test with MRUnit
Mapper
Reducer

Running Locally on Test Data
Running a Job in a Local Job Runner
Testing the Driver

Running on a Cluster
Packaging a Job
Launching a Job

The MapReduce Web UI
Retrieving the Results
Debugging a Job

Hadoop Logs
Remote Debugging
Tuning a Job
Profiling Tasks

MapReduce Workflows
Decomposing a Problem into MapReduce Jobs
JobControl
Apache Oozie

MapReduce Features

Counters
Built-in Counters
User-Defined Java Counters
User-Defined Streaming Counters

Sorting
Preparation
Partial Sort
Total Sort
Secondary Sort
Joins

Map-Side Joins
Reduce-Side Joins
Side Data Distribution

Using the Job Configuration
Distributed Cache
MapReduce Library Classes

Setting Up a Hadoop Cluster

Cluster Specification
Network Topology
Cluster Setup and Installation
Installing Java

Creating a Hadoop User
Installing Hadoop
Testing the Installation
SSH Configuration

Hadoop Configuration
Configuration Management
Environment Settings

Important Hadoop Daemon Properties
Hadoop Daemon Addresses and Ports
Other Hadoop Properties

User Account Creation
YARN Configuration

Important YARN Daemon Properties
YARN Daemon Addresses and Ports
Security

Kerberos and Hadoop
Delegation Tokens

Other Security Enhancements
Benchmarking a Hadoop Cluster
Hadoop Benchmarks

User Jobs
Hadoop in the Cloud
Apache Whirr

Administering Hadoop

HDFS
Persistent Data Structures

Safe Mode
Audit Logging
Tools
Monitoring
Logging
Metrics

Java Management Extensions
Maintenance

Routine Administration Procedures
Commissioning and Decommissioning Nodes
Upgrades

Pig
Installing and Running Pig
Execution Types
Running Pig Programs
Grunt
Pig Latin Editors
An Example
Generating Examples
Comparison with Databases

Pig Latin
Structure
Statements
Expressions
Types
Schemas
Functions
Macros

User-Defined Functions
A Filter UDF
An Eval UDF
A Load UDF

Data Processing Operators
Loading and Storing Data
Filtering Data

Grouping and Joining Data
Sorting Data
Combining and Splitting Data
Pig in Practice
Parallelism
Parameter Substitution

Hive

Installing Hive
The Hive Shell
An Example
Running Hive

Configuring Hive
Hive Services
The Metastore

Comparison with Traditional Databases
Schema on Read Versus Schema on Write
Updates, Transactions, and Indexes
HiveQL

Data Types
Operators and Functions
Tables
Managed Tables and External Tables
Partitions and Buckets
Storage Formats

Importing Data
Altering Tables
Dropping Tables
Querying Data

Sorting and Aggregating
MapReduce Scripts
Joins
Subqueries
Views
User-Defined Functions
Writing a UDF
Writing a UDAF

HBase

HBasics
Backdrop
Concepts
Whirlwind Tour of the Data Model
Implementation
Installation
Test Drive
Clients
Java
Avro, REST, and Thrift
Example
Schemas
Loading Data
Web Queries
HBase Versus RDBMS
Successful Service
HBase
Use Case: HBase at Streamy.com
Praxis
Versions
HDFS
UI
Metrics
Schema Design
Counters
Bulk Load

R and Hadoop

Introduction to the R Language
Introduction to the RHadoop Big Data Solution
RHadoop
RHadoop data analysis
RHadoop machine learning

Python and Hadoop

Python Programming
Python and Hadoop
Hadoop - mrjob development
Spark
Introduction to Spark
PySpark
Machine Learning

Advanced Administration and Monitoring

Multiple nodes

Add nodes
Decommission nodes
Recovering from NameNode failure

Monitoring cluster health using Ganglia - Pure Monitoring

Install Ambari - Management and monitoring

Install Hue - Emphasis on use and management of the Hadoop environment

Cloudera Hadoop Certification

CCAH - Hadoop Administrator
CCDH - Hadoop Developer

Case Studies

Hadoop Usage at Last.fm
Last.fm: The Social Music Revolution
Hadoop at Last.fm

Generating Charts with Hadoop
The Track Statistics Program
Summary

 
 