Big Data and Hadoop training

The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. The course covers in-depth concepts such as the Hadoop Distributed File System, Hadoop clusters, MapReduce, HBase, ZooKeeper, etc.

Course Objectives

After the completion of the Big Data and Hadoop Course at Spectramind, you should be able to:

Master the concepts of Hadoop Distributed File System 

Setup a Hadoop Cluster 

Write MapReduce Code in Java 

Perform Data Analytics using Pig and Hive 

Understand Data Loading Techniques using Sqoop and Flume 

Implement HBase, MapReduce Integration, Advanced Usage and Advanced Indexing 

Have a good understanding of ZooKeeper service 

Use Apache Oozie to Schedule and Manage Hadoop Jobs 

Implement Best Practices for Hadoop Development and Debugging 

Develop a working Hadoop Architecture 

Work on a Real Life Project on Big Data Analytics and gain Hands on Project Experience

Who should go for this course?

This course is designed for professionals aspiring to make a career in Big Data Analytics using the Hadoop framework. Software professionals, analytics professionals, ETL developers, project managers and testing professionals are the key beneficiaries of this course. Other professionals who are looking to acquire a solid foundation in Hadoop architecture can also opt for this course.


The prerequisites for learning Hadoop include hands-on experience in Core Java and good analytical skills to grasp and apply the concepts in Hadoop. We provide a complimentary course, "Java Essentials for Hadoop", to all participants who enroll for the Hadoop training. This course helps you brush up on the Java skills needed to write MapReduce programs.

Project Work

Towards the end of the two-day schedule, you will work on a live project: a large dataset on which you will use Pig, Hive, HBase and MapReduce to perform Big Data analytics. The final project is a real-life business case on an open dataset. Not just one but a large number of datasets are part of the Big Data and Hadoop program.

Here are some of the data sets on which you may work as a part of the project work:

Twitter Data Analysis : Twitter data analysis is used to understand the hottest trends by delving into Twitter data. Using Flume, data is fetched from Twitter into Hadoop in JSON format. Using a JSON SerDe, the Twitter data is read and loaded into Hive tables so that different analyses can be run with Hive queries, e.g. finding the top 10 most popular tweets.
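The Hive analysis described above is essentially a group-and-count followed by an ordered limit. As a rough illustration of that logic in plain Java (no Hadoop or Hive dependency; the sample tweets and the `topHashtags` method name are invented for this sketch):

```java
import java.util.*;
import java.util.stream.*;

public class HashtagCount {
    // Tally hashtag occurrences and return the top-N most frequent,
    // mimicking a "GROUP BY tag ORDER BY count DESC LIMIT n" Hive query.
    public static List<String> topHashtags(List<String> tweets, int n) {
        Map<String, Long> counts = tweets.stream()
                .flatMap(t -> Arrays.stream(t.split("\\s+")))
                .filter(w -> w.startsWith("#"))
                .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "Loving #Hadoop and #BigData",
                "#Hadoop cluster up and running",
                "Analytics with #Hive on #Hadoop");
        System.out.println(topHashtags(tweets, 2)); // "#hadoop" first; ties follow in arbitrary order
    }
}
```

In the actual project the counting runs as a distributed Hive query over the full JSON dataset; the grouping and ordering semantics are the same.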

Stack Exchange Ranking and Percentile data-set : Stack Exchange hosts enormous open-sourced data from the multiple websites of the Stack Exchange group (such as Stack Overflow). It is a gold mine for people who want to build proofs of concept and are searching for suitable datasets. There you can query out the data you are interested in, which can contain more than 50,000 records. For example, you can download the Stack Overflow rank and percentile data and find the top 10 rankers.

Loan Dataset : This project is designed to find good and bad URL links based on the reviews given by users. The primary data is highly unstructured; using MapReduce jobs it is transformed into structured form and then loaded into Hive tables, where Hive queries make the information easy to query out. In phase two, another dataset containing the corresponding cached web pages of the URLs is fed into HBase. Finally, the entire project is showcased in a UI where you can check the ranking of a URL and view its cached page.

Data-sets by Government : These datasets cover statistics such as the worker population ratio (per 1000) for persons of age 15-59 years, according to the current weekly status approach, for each state/UT.

Machine Learning Datasets such as the Badges dataset : This dataset is used by a system to encode names, for example a +/- label followed by a person's name.

NYC Data Set : The NYC dataset contains the day-to-day records of all the stocks, providing information such as the opening price, closing price, etc. for individual stocks. This data is highly valuable for people who have to make decisions based on market trends. One very popular analysis on this dataset is computing the Simple Moving Average, which helps traders find crossover signals.
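The Simple Moving Average mentioned above replaces each closing price with the mean of itself and the n-1 prices before it; a crossover signal occurs when a short-period SMA crosses a long-period one. A minimal plain-Java sketch (the sample prices are invented for illustration):

```java
import java.util.*;

public class MovingAverage {
    // n-period simple moving average over closing prices:
    // each output element is the mean of a sliding window of n prices.
    public static double[] sma(double[] close, int n) {
        double[] out = new double[close.length - n + 1];
        double windowSum = 0;
        for (int i = 0; i < close.length; i++) {
            windowSum += close[i];
            if (i >= n) windowSum -= close[i - n];        // drop the price leaving the window
            if (i >= n - 1) out[i - n + 1] = windowSum / n;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] close = {10, 11, 12, 13, 14, 15};
        System.out.println(Arrays.toString(sma(close, 3))); // [11.0, 12.0, 13.0, 14.0]
    }
}
```

In the project this computation is expressed as a MapReduce or Hive job over the full stock dataset, but the sliding-window arithmetic is the same.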

Weather Dataset : It has the details of weather over a period of time, from which you can find the highest, lowest or average temperature.
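Finding the highest temperature per year is a classic first MapReduce exercise, and its core logic can be sketched in plain Java before scaling it out to a cluster. The "year,temperature" line format and the `maxTempPerYear` name are assumptions for illustration, not the dataset's actual schema:

```java
import java.util.*;
import java.util.stream.*;

public class WeatherStats {
    // One reading per line, "year,temperature". Returns the highest temperature
    // recorded for each year -- the result a MapReduce job over the full
    // weather dataset would produce (year as key, max as reduced value).
    public static Map<String, Double> maxTempPerYear(List<String> lines) {
        return lines.stream()
                .map(l -> l.split(","))
                .collect(Collectors.toMap(
                        p -> p[0],                       // key: year
                        p -> Double.parseDouble(p[1]),   // value: temperature
                        Math::max,                       // keep the maximum on key collision
                        TreeMap::new));                  // sorted output for readability
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1950,22.1", "1950,34.8", "1951,28.3");
        System.out.println(maxTempPerYear(lines)); // {1950=34.8, 1951=28.3}
    }
}
```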

In addition, you can choose your own dataset and create a project around that as well.

Why Learn Hadoop?

Big Data! A Worldwide Problem?

According to Wikipedia, “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” In simpler terms, Big Data is a term for the large volumes of data that organizations store and process. It is becoming very difficult for companies to store, retrieve and process their ever-increasing data. If a company manages its data well, nothing can stop it from becoming the next BIG success!

The problem lies in using traditional systems to store enormous data. Though these systems were a success a few years ago, with the increasing amount and complexity of data they are fast becoming obsolete. The good news is that Hadoop, nothing less than a panacea for companies working with Big Data in a variety of applications, has become an integral part of storing, handling, evaluating and retrieving hundreds of terabytes or even petabytes of data.

Apache Hadoop! A Solution for Big Data!

Hadoop is an open-source software framework that supports data-intensive distributed applications. Hadoop is licensed under the Apache v2 license and is therefore generally known as Apache Hadoop. Hadoop was developed based on a paper originally published by Google on its MapReduce system, and it applies concepts of functional programming. Hadoop is written in the Java programming language and is a top-level Apache project, built and used by a global community of contributors. Hadoop was created by Doug Cutting and Michael J. Cafarella. And don’t overlook the charming yellow elephant: the project is named after Doug’s son’s toy elephant!

Some of the top companies using Hadoop:

The importance of Hadoop is evident from the fact that many global MNCs use Hadoop and consider it an integral part of their operations, including companies like Yahoo! and Facebook. On February 19, 2008, Yahoo! Inc. launched what was then the world's largest Hadoop production application. The Yahoo! Search Webmap is a Hadoop application that runs on a Linux cluster with more than 10,000 cores and generates data that is used in every Yahoo! Web search query.

Facebook, a $5.1 billion company, had over 1 billion active users in 2012, according to Wikipedia. Storing and managing data of such magnitude could be a problem even for a company like Facebook. But thanks to Apache Hadoop, Facebook keeps track of each and every profile it hosts, as well as all the related data such as images, posts, comments and videos.

Opportunities for Hadoopers!

Opportunities for Hadoopers are infinite - from Hadoop Developer to Hadoop Tester or Hadoop Architect, and so on. If cracking and managing Big Data is your passion, then think no more: join Spectramind’s Hadoop online course and carve a niche for yourself! Happy Hadooping!

Module 1
Introduction To Hadoop Distributed File System (HDFS)
Learning Objectives - In this module, you will understand what HDFS is, why it is required for running MapReduce and how it differs from other distributed file systems. You will also get a basic idea of how data is read from and written to HDFS.
Topics - Design of HDFS, HDFS Concepts, Command Line Interface, Hadoop File Systems, Java Interface, Data Flow (Anatomy of a File Read, Anatomy of a File Write, Coherency Model), Parallel Copying with DISTCP, Hadoop Archives.
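The block-oriented design covered in this module lends itself to a quick sizing exercise: HDFS stores a file as fixed-size blocks, and each block is replicated across the cluster. A small sketch, assuming a 128 MB block size and replication factor 3 (both are configurable; these are merely common defaults, not fixed properties of HDFS):

```java
public class HdfsBlocks {
    // Back-of-the-envelope HDFS sizing: a file is split into fixed-size
    // blocks and every block is replicated, so raw disk usage is roughly
    // file size multiplied by the replication factor.
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB, an assumed default

    public static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    public static long rawStorageBytes(long fileSizeBytes, int replication) {
        return fileSizeBytes * replication;
    }

    public static void main(String[] args) {
        long file = 300L * 1024 * 1024;                // a 300 MB file
        System.out.println(blockCount(file));          // 3 blocks (128 + 128 + 44 MB)
        System.out.println(rawStorageBytes(file, 3));  // bytes on disk with replication 3
    }
}
```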

Module 2
Understanding Pseudo Cluster Environment
Learning Objectives - After this module, you will understand the different components of a Hadoop Pseudo Cluster and about different configuration files to be used in the cluster setup.
Topics - Cluster Specification, Hadoop Configuration (Configuration Management, Environment Settings, Important Hadoop Daemon Properties, Hadoop Daemon Addresses and Ports, Other Hadoop Properties), Basic Linux and HDFS commands.

Module 3
Understanding - Map-Reduce Basics and Map-Reduce Types and Formats
Learning Objectives - After this module, you will understand how the MapReduce framework works and why MapReduce is tightly coupled with HDFS. You will also learn what the different types of input and output formats are and why they are required.
Topics - Hadoop Data Types, Functional - Concept of Mappers, Functional - Concept of Reducers, The Execution Framework, Concept of Partitioners, Functional - Concept of Combiners, Distributed File System, Hadoop Cluster Architecture, MapReduce Types, Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs), OutPut Formats (TextOutput, BinaryOutPut, Multiple Output).
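The mapper, combiner and reducer roles in this module can be simulated without a cluster: the map phase emits (word, 1) pairs, the shuffle groups them by key, and the reduce phase sums each group. A plain-Java sketch of the word-count flow (no Hadoop dependency; the class and method names are invented for illustration):

```java
import java.util.*;

public class WordCountSim {
    // Simulates MapReduce word count: map emits (word, 1) pairs, the
    // shuffle groups the emitted 1s under each word, and reduce sums each
    // group -- the same roles Mapper, Partitioner and Reducer play in a
    // real Hadoop job.
    public static Map<String, Integer> wordCount(List<String> lines) {
        // map + shuffle: group the emitted 1s under each word
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines)
            for (String word : line.toLowerCase().split("\\W+"))
                if (!word.isEmpty())
                    grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);

        // reduce: sum the grouped values for each key
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be")));
        // {be=2, not=1, or=1, to=2}
    }
}
```

In a real job the grouped lists never materialize on one machine; the shuffle distributes keys across reducers (the partitioner's job), and a combiner can pre-sum the 1s on the map side to cut network traffic.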

Module 4
Learning Objectives - In this module you will learn what Pig is, the types of use cases in which Pig can be used, and how Pig is tightly coupled with MapReduce, along with an example.
Topics - Installing and Running Pig, Grunt, Pig's Data Model, Pig Latin, Developing & Testing Pig Latin Scripts, Writing Evaluation, Filter, Load & Store Functions.

Module 5
Learning Objectives - This module will provide you with a clear understanding of what Hive is, how you can load data into Hive, how to query data from Hive, and so on.
Topics - Hive Architecture, Running Hive, Comparison with Traditional Database (Schema on Read Versus Schema on Write, Updates, Transactions and Indexes), HiveQL (Data Types, Operators and Functions), Tables (Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables), Querying Data (Sorting And Aggregating, Map Reduce Scripts, Joins & Subqueries & Views, Map and Reduce site Join to optimize Query), User Defined Functions, Appending Data into existing Hive Table, Custom Map/Reduce in Hive.

Module 6
Learning Objectives - In this module you will acquire in-depth knowledge of what HBase is, how you can load data into HBase, how to query data from HBase using a client, and so on.
Topics - Introduction, Client API - Basics, Client API - Advanced Features, Client API - Administrative Features, Available Clients, Architecture, MapReduce Integration, Advanced Usage, Advanced Indexing.

Module 7
Learning Objectives - At the end of this module, you will learn what ZooKeeper is all about, how it helps in monitoring a cluster and why HBase uses ZooKeeper. You will also know what Sqoop is, how you can import and export data to and from HDFS, and what the internal architecture of Sqoop is.
ZOOKEEPER Topics - The ZooKeeper Service (Data Model, Operations, Implementation, Consistency, Sessions, States), Building Applications with ZooKeeper (ZooKeeper in Production).
SQOOP Topics - Database Imports, Working with Imported Data, Importing Large Objects, Performing Exports, Exports - A Deeper Look.

Module 8
Learning Objectives - This week we will work on a real-life project. We will discuss the different datasets and the specifications of the project.
Some of the data sets on which you may work as a part of the project work:
Twitter Data Analysis : Download Twitter data, put it into HBase and use Pig, Hive and MapReduce to gauge the popularity of some hashtags.
Stack Exchange Ranking and Percentile data-set : A dataset from Stack Overflow containing the ranking and percentile details of users.
Loan Dataset : Deals with users who have taken loans, along with their EMI details, time period, etc.
Data-sets by Government : e.g. the worker population ratio (per 1000) for persons of age 15-59 years according to the current weekly status approach for each state/UT.
Machine Learning Datasets such as the Badges dataset : Used by a system to encode names, for example a +/- label followed by a person's name.
NYC Data Set : New York Stock Exchange data.
Weather Dataset : Details of weather over a period of time, from which you may find the hottest, coldest or average temperature.
In addition, you can choose your own dataset and create a project around that as well.

 Registration Details

Course Fee:
Single Nomination:
Training Fees : INR 15,000 (inclusive of 12.36% service tax).

Avail Special Discounts
5% discount for early-bird registrations (15 days in advance of the program date)
5% discount for a task force of 4 to 7
10% discount for a task force of 8 and above
10% discount applicable to BA/PMP/CSBA/IREB/CSTE/CSQA/CISSP/CFPS/CSPM/CAPM/CISA qualified professionals, and IIBA/PMI/SEG/CII/SPIN/CSI and NASSCOM members

NOTE: Only one discount option is applicable at any time

 Course Dates, Venue & Timings:

Sl.No. State City Batch1-Start Batch1-End Batch2-Start Batch2-End Batch3-Start Batch3-End Batch4-Start Batch4-End Venue Contact
01 AP Hyderabad 14-Oct’13 15-Oct’13 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14 Flat 617,Annapurna block, Aditya enclave, Ameerpet, Hyderabad-500016 Jason-91-40-64568797
02 AP Hyderabad 14-Oct’13 15-Oct’13 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14   Software Units layout , inside side of Raheja Mind Space , back of  Inorbit mall ,  Hightech City ,  Hyderabad-500081 Jason-91-40-64568797
03 Karnataka Bangalore - - 2-Nov’13 3-Nov'13 7-Dec’13 8-Dec’13 4-JAN’14 5-JAN’14 DBS center , Cunningham road , Bangalore Sundar Raju
04 Tamilnadu  Chennai - - 9-Nov’13 10-Nov’13 14-Dec’13 15-Dec’13 11-JAN’14 12-JAN’14 CHENNAI, CitiCentre , Level 6, 10/11 Dr.Radhakrishna Salai,Chennai,Tamil Nadu,600 004,India Mr.Balaji : 0 87545 11800
05 Maharashtra Mumbai 19-oct'13 20-Oct'13 9-Nov’13 10-Nov’13 14-Dec’13 15-Dec’13 18-JAN’14 19-JAN’14 DBS Heritage,Prescot Road,Opp. Cathedral Sr. School,Fort, Mumbai 400001. DBS Heritage (From Airport instruct the car / cab driver to drive to Fort, Fashion Street. It’s near Siddharth College, Budha Bhavan. Also there are schools like J. P. Pettit School & Cathedral Sr. School Mr.Vasudev
06 Delhi Delhi/Gurgaon/Noida 26-Oct'13 27-Oct'13 23-Nov'13 24-Nov'13 28-Dec’13 29-Dec’13 25-JAN’14 26-JAN’14 Paharpur Business Centre, 21, Nehru Place Greens, New Delhi - 110019   Arun
07 Maharashtra Pune - 9-Nov’13 10-Nov’13 9-Dec’13 10-Dec’13 9-JAN’14 10-JAN’14 Panchasheel tech park,Yerwada, Pune Mr.Manish
08 Westbengal Kolkata 19-Oct’13 20-Oct’13 19-Nov’13 20-Nov’13 19-Dec’13 20-Dec’13 20-JAN’14 21-JAN’14   Constantia, 6/F,Constantia, Dr. U. N. Brahmachari Marg, Kolkata Mr.Hamid : 9088159989
09 Gujarat    Ahmedabad 23-OCT’13 24-OCT’13 23-NOV'13 24-NOV'13 23-DEC'13 24-DEC'13 24-JAN’14 25-JAN’14  Aakruti Complex,Nr. Stadium Cross Road, Navrangpura,Ahmedabad-380009, Gujarat, INDIA Mr.Alok
10 AP Vishakhapatnam - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14 DBS center Vijay -94400 89341
11 Bihar Patna - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14  DBS Center, Patna Jason
12 Chhattisgarh Raipur - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14  DBS center ,Raipur Jason
13 Madhya Pradesh Indore - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14 Indore Arun :9755598333
14 Haryana Chandigarh - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14 Spectramind, 1708/1,  Sector – 39-B, Chandigarh- 160 036 Kavita
15 Kerala     Cochin   - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14 ThomasMount ,ICTA Building,Changampuzha Nagar P.O.,Cochin- 682 033 Mr.Manoj: 9995881093
16 Kerala Trivandrum - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14  Hotel Classic Avenue, Thampanoor, Trivandrum, Kerala. Mr.Manoj
17 Orissa Bhubaneshwar - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14 Vani Vihar, Bhubaneshwar Mr. Satya Deep : 95811 98770
18 Rajasthan Jaipur - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14  DBS center,Jaipur Mr.Manish
19 Tamilnadu    Coimbatore - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14  DBS Center Mr.Balaji
20 Uttar Pradesh Lucknow - - 4-Nov’13 8-Nov’13 2-Dec’13 6-Dec’13 6-JAN’14 10-JAN’14  DBS center,Lucknow Mr.Sandeep

Call/SMS : Vijay : 0-9440089341

Please send us your query; we will answer within 24 hours. Thanks in advance for contacting us.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License