Apache Hadoop Masterclass

Course Background

By attending, you will gain a good understanding of the Hadoop technology stack, including MapReduce, HDFS, Hive, Pig and HBase. The course also provides an initial introduction to Mahout and other common utilities. It is presented by Big Data Partnership and organised by UNICOM.

This informative class will help you better understand:

  • What Hadoop is
  • The essential components of a Hadoop-based data management solution
  • The pros and cons of implementing Hadoop
  • How Hadoop fits into your existing environment and architecture
  • The differences between the various Hadoop distributions
  • Case studies of how big data is influencing society and businesses

Core topics covered

  • Why Hadoop?
  • The Hadoop Platform
  • The future of Hadoop

Course outline

Why Hadoop?

  • History and Background
  • Real-world use cases and case studies

The Hadoop Platform

  • Introduction to MapReduce and the Hadoop Distributed File System (HDFS)
  • Data Warehousing with Hive
  • Parallel Processing with Pig
  • Data Mining with Mahout
  • Data storage with HBase
  • Common utilities – Sqoop, Flume, Hue, Scribe, ZooKeeper, HCatalog
  • Hadoop distributions – Apache Software Foundation, Cloudera, Hortonworks, MapR, IBM

The Future of Hadoop

  • YARN – next-generation MapReduce
  • Other programming paradigms on Hadoop

In-house option and related course

This course is also available as a closed in-house course and can be customised to meet the specific requirements of your organisation. The following related course is also available in-house:

  • Big Data Concepts