Phone: 214-514-2753


Select Training type

  • Classroom Training

    Starting From $1000

  • Instructor-led Live Online Training

    Starting From $950

Classroom Training in Dallas, Texas

    17 Aug - 13 Oct | 8 Weeks | Sun, Wed, Sat | 09:00 AM - 01:00 PM | $1000


A turn-key training and placement program for aspiring Big Data IT professionals:

1. 8 weeks of Core Technology Training
An 8-week (65-hour) core competency development program to build foundation skills in the following technologies:

1. Hadoop Essentials
2. Apache Spark
3. Python
4. Big Data and the AWS Cloud (S3, EMR, Redshift integration)

2. 3 weeks of Core Project and Certification
In parallel, we will spend 3-4 weeks on intensive project work, lab exercises, and mock interview sessions. This will involve project scenario discussions, real-life problem solving, and one-on-one interview sessions to make the students feel confident and market-ready. During this exercise, all students will be encouraged to take the relevant Cloudera certification test.

3. Marketing and Placement
While our students develop core competency in Big Data technologies and get certified in the relevant subject area, our marketing team will focus on resume preparation, interview setup, and placement.


  • Module 1 – Introduction to Hadoop and its Ecosystem (Cloudera CDH-5), Map Reduce and HDFS
  •        Big Data, Factors constituting Big Data
           Hadoop and Hadoop Ecosystem
           Map Reduce – Concepts of Map, Reduce, Ordering, Concurrency, Shuffle
           Hadoop Distributed File System (HDFS) Concepts and its Importance
           Deep Dive in Map Reduce – Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs
           HDFS Deep Dive – Architecture, Data Replication, Name Node, Data Node, Data Flow
           Parallel Copying with DistCp, Hadoop Archives
  • Module 2 – Hands on Exercises with Hadoop Component Installation
  •        Understanding Important configuration files, their Properties and Daemon Threads
           Accessing HDFS from Command Line
           Map Reduce – Basic Exercises
           Understanding Hadoop Eco-system
           Introduction to Sqoop, use cases and Installation
           Introduction to Hive, use cases and Installation
           Introduction to Pig, use cases and Installation
           Introduction to Oozie, use cases and Installation
           Introduction to Flume, use cases and Installation
           Introduction to Yarn
  • Module 3 – Deep Dive in Map Reduce and Yarn
  •        How to develop Map Reduce Application, writing unit test
           Best Practices for developing and writing, Debugging Map Reduce applications
           Joining Data sets in Map Reduce
           Algorithms – Graph Traversal, etc.
           Hadoop APIs
           Introduction to Hadoop Yarn
           Difference between Hadoop 1.0 and 2.0
           Hands-on exercise – end-to-end PoC using YARN (Hadoop 2.0)
           Running MapReduce code on NASA weblogs and finding patterns of user visits
  • Module 4 - Apache Sqoop and Apache Flume
  •        Moving data using Sqoop to HDFS and Hive
           Incremental update of data to HDFS
           Exporting data back to an RDBMS (MySQL) from Hadoop
           Apache Flume Introduction and Architecture
           Analyzing the Flume config file for data loads into HDFS and HBase
           Hands-on exercise on a couple of use cases involving netcat and exec source types
  • Module 5 – Deep Dive in Pig
  •        Introduction to Pig
           What Is Pig?
           Pig’s Features
           Pig Use Cases
           Interacting with Pig
           Basic Data Analysis with Pig
           Pig Latin Syntax
           Loading Data
           Simple Data Types
           Field Definitions
           Data Output
           Viewing the Schema
           Filtering and Sorting Data
           Commonly-Used Functions
           Hands-On Exercise: Using Pig for ETL Processing
           Processing Complex Data with Pig
           Complex/Nested Data Types
           Grouping and Cogrouping
           Iterating Grouped Data
           Hands-On Exercise: Analyzing Data with Pig
  • Module 6 – Deep Dive in Hive
  •        Introduction to Hive
           What Is Hive?
           Hive Schema and Data Storage
           Comparing Hive to Traditional Databases
           Hive vs. Pig
           Hive Use Cases
           Interacting with Hive
           Relational Data Analysis with Hive
           Hive Databases and Tables
           Basic HiveQL Syntax
           Data Types
           Joining Data Sets
           Common Built-in Functions
           Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue
           Hive Data Management
           Hive Data Formats
           Creating Databases and Hive-Managed Tables
           Loading Data into Hive
           Altering Databases and Tables
           Self-Managed Tables
           Simplifying Queries with Views
           Storing Query Results
           Controlling Access to Data
           Hands-On Exercise: Data Management with Hive
           Hive Optimization
           Understanding Query Performance
           Indexing Data
  • Module 7 – Introduction to the HBase NoSQL Database
  •        Introduction to HBase, Architecture, Map Reduce Integration, Different Client API – Features and Administration
           HBase data model, concept of RegionServer and Region
           Integrating HBase with Hive and Pig
           Bulk loading data in HBase
           HBase Clients including Java, MapReduce and REST APIs
  • Module 8 – Advanced MapReduce
  •        Delving Deeper Into the Hadoop API
           More Advanced Map Reduce Programming, Joining Data Sets in Map Reduce
           Graph Manipulation in Hadoop
  • Module 9 – Job and certification support
  •        Major Project, Hadoop Development, Cloudera Certification Tips and Guidance, Mock Interview Preparation, and Practical Development Tips and Techniques
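
To make the MapReduce flow covered in Modules 1, 3, and 8 concrete, the map, shuffle, and reduce phases can be sketched in plain Python. This is a conceptual illustration only, not Hadoop API code; the function names are ours:

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(mapped):
    # Shuffle: group all values by key, as the framework
    # does between the map and reduce phases
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: sum the counts for each word
    return key, sum(values)

def word_count(lines):
    mapped = [kv for line in lines for kv in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

The same three-phase shape carries over to real MapReduce jobs; only the emit and grouping mechanics are handled by the framework instead of by these helper functions.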

    Advanced Program

    Apache Spark
    1. Getting Started with Apache Spark
           Installing Spark from binaries
           Building the Spark source code with Maven
           Deploying on a cluster in standalone mode
           Deploying on a cluster with Mesos
           Deploying on a cluster with YARN

    2. Developing Applications with Spark
           Exploring the Spark shell
           Developing Spark applications in Eclipse with Maven
           Developing Spark applications in Eclipse with SBT

    3. External Data Sources
           Loading data from the local filesystem
           Loading data from HDFS
           Loading data from HDFS using a custom InputFormat
           Loading data from Amazon S3
           Loading data from Apache Cassandra
           Merge strategies in sbt-assembly
           Loading data from relational databases

    4. Spark SQL
           Understanding the Catalyst optimizer
           Logical plan optimization
           Physical planning
           Creating HiveContext
           Inferring schema using case classes
           Programmatically specifying the schema
           Loading and saving data using the Parquet format
           Loading and saving data using the JSON format
           Loading and saving data from relational databases
           Loading and saving data from an arbitrary source
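
Spark SQL's schema inference (from case classes, or by sampling rows) can be illustrated with a small pure-Python sketch. The `infer_schema` helper below is hypothetical, not part of the Spark API:

```python
def infer_schema(rows):
    # Inspect the first row and map each field's Python type to a
    # Spark-style type name, roughly as Spark SQL samples rows to
    # infer a schema for an untyped data source
    type_names = {
        bool: "BooleanType",
        int: "IntegerType",
        float: "DoubleType",
        str: "StringType",
    }
    sample = rows[0]
    return {field: type_names.get(type(value), "StringType")
            for field, value in sample.items()}
```

In Spark itself the inferred schema becomes a `StructType`; this sketch just shows the idea of deriving field types from data rather than declaring them up front.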

    5. Spark Streaming
           Word count using Streaming
           Streaming Twitter data
           Streaming using Kafka
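
The streaming word count above amounts to keeping running totals across micro-batches. A minimal pure-Python sketch of that idea (not actual Spark Streaming API code; batches stand in for DStream micro-batches):

```python
from collections import Counter

def streaming_word_count(batches):
    # Maintain running word totals across micro-batches, as a
    # stateful streaming word count would; yield the state after
    # each batch is processed
    totals = Counter()
    for batch in batches:
        for line in batch:
            totals.update(line.split())
        yield dict(totals)
```

Each yielded snapshot corresponds to the updated state a stateful streaming operator would hold after one micro-batch interval.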

    6. Getting Started with Machine Learning Using MLlib
           Creating vectors
           Creating a labeled point
           Creating matrices
           Calculating summary statistics
           Calculating correlation
           Doing hypothesis testing
           Creating machine learning pipelines using ML
           Supervised Learning with MLlib – Regression
           Using linear regression
           Supervised Learning with MLlib – Classification
           Doing classification using logistic regression
           Doing binary classification using SVM
           Unsupervised Learning with MLlib
           Clustering using k-means
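
The k-means clustering covered here can be sketched in plain Python for 2-D points. This is a conceptual illustration of the algorithm, not MLlib code:

```python
import math

def kmeans(points, k, iterations=10):
    # Plain k-means on 2-D points: assign each point to its nearest
    # centroid, then move each centroid to the mean of its cluster
    centroids = points[:k]  # naive initialization: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids
```

MLlib's implementation adds smarter initialization (k-means||) and distributes the assignment step across the cluster, but the assign-then-recompute loop is the same.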

    7. Graph Processing Using GraphX
           Fundamental operations on graphs
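
Fundamental graph operations of the kind GraphX exposes (adjacency, vertex degrees) can be illustrated in plain Python over an edge list. The helpers below are our own sketch, not the GraphX API:

```python
from collections import defaultdict

def build_adjacency(edges):
    # Build an adjacency list from a directed edge list
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj

def out_degrees(edges):
    # Out-degree per vertex, one of the basic graph operators
    deg = defaultdict(int)
    for src, _ in edges:
        deg[src] += 1
    return dict(deg)
```

GraphX computes the same quantities (`Graph.outDegrees`, neighbor lookups) over a distributed edge RDD rather than an in-memory list.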


    • +  Hadoop FAQ


    Certification adds immense professional value to a student's profile. That is why our course is designed to get you certified by the end of the program. The training will prepare students for the "Cloudera Certified Associate (CCA)" certification test, and "CCA Spark and Hadoop Developer" is one of the top certification tests we encourage our students to opt for.




    "Great program for hands-on study and training for Hadoop developers. I got Hadoop certified after this training."


    "All the concepts of Hadoop were covered in detail along with Assignments. The instructor was very knowledgeable being an industry expert. I was able to crack the Cloudera Hadoop developer certification after this training."

    Sharath Gopalakrishna

    "A detailed course for beginners. The instructor is very knowledgeable and has divided the training hours aptly into theory and hands-on sessions."

    Sweta Rathi

    "I enjoyed the class. Good instructor. Very hands-on."


    "An awesome course for beginners. The instructor is very thorough, knowledgeable and has organized the course perfectly with excellent focus on concepts and hands on. The support and encouragement for taking the certification is very motivational."


    "One of the most valuable and interesting courses I ever took. Highly recommended for those who are looking for big data courses. Ratikant is a wonderful instructor. His expertise and passion for the Big Data field is amazing."


    "The course was very relevant to current Hadoop implementations and will help to get into the job market with confidence. The hands-on practical training sessions were very good. Ratikanth (instructor) knows a lot about practical applications. Certified."

    Ajay Penmatcha

    "This was a very good course and I learnt a lot. Ratikant (the instructor) works very hard to prepare good course material for the students, so we could all learn a lot from him! He was very helpful throughout and after the course was completed."


    "This is a great beginner course in big data and the Hadoop platform. Ratikant (instructor) has strong experience in industry and is very knowledgeable in the field. The course is a good mixture of theory and hands-on sessions. It was relevant to my current job."

    Saby Mandhata

    "Liked the course. Good coverage for the batch, both theory and practical. Would recommend for anyone as a starting point to Big Data career. Good luck !!"


    "The detailed curriculum really helped me to build up a good understanding and knowledge of Big Data and the Hadoop ecosystem. The hands-on session along with each class gave me the confidence to start working on a real project. The instructor is very knowledgeable."

    Sukanta Kundu


    Course Features:

    • +  Class Room: 65 Hrs
    • Live instructor-led sessions will cover the complete course curriculum, including in-class lab exercises. The class will also make extensive reference to real-world use cases to help solidify the technology learning. Instructors will conduct regular quiz sessions for each topic to reinforce understanding and prepare for the eventual certification path.

    • +  Assignments: 40 Hrs
    • Most Hadoop modules (HDFS/Hive/Pig/Sqoop/Flume/HBase) will have in-class and offline assignments given by the instructor after the relevant topic. Students can post their assignments back in the forum and can also post questions there, which will be answered by the instructor. Solutions to the assignments will be provided at the end for validation.

    • +  Project: 50 Hrs
    • Towards the end of the course (once most modules are covered in class), two distinct projects will be given to the students to execute in their respective Hadoop VMs. The requirements, scope, and architecture of the solution will be provided as a baseline to work on the actual solution.

      If you opt for the extended program, the project will be executed on a real Hadoop cluster, assisted by industry experts. The scope of projects in the extended program includes advanced frameworks and in-depth implementation experience.

    • +  Lifetime Access
    • Lifetime access to the forum and trainer material will be provided to all registered students.

    • +  24 x 7 Support
    • 24 x 7 online and phone support on assignments and questions for the instructor is available to all registered students. Students can post their queries on the forum, and the course instructor typically gets back to queries within 12-24 hrs, with answers posted back in the forum.

    • +  Get Certified
    • The Hadoop developer course is designed to help students achieve the Cloudera Hadoop Developer certification. Regular quizzes in class and resources for offline study will be provided so students can get certified right after course completion. We encourage all students to get certified within 15-20 days of completing the course.
