This Big Data online training course is designed to teach all the core concepts of Big Data step by step, covering topics from basic to advanced.

Big Data Course Content


 Big Data Introduction

 Distributed Systems

 Big Data Use Cases

 Various Solutions

 Overview of Hadoop Ecosystem

 Spark Ecosystem Walkthrough


 Foundation & Environment

 Understanding the CloudxLab

 Getting Started - Hands-on

 Hadoop & Spark Hands-on

 Assessment

 Basics of Linux - Quick Hands-On

 Understanding Regular Expressions

 Assessment

 Setting up VM (optional)


 ZooKeeper - Race Condition

 ZooKeeper - Deadlock


 Assessment

 How Leader Election Happens - Paxos Algorithm

 Use cases

 When not to use

 Assessment


 Why HDFS? Why Not Existing File Systems?

 HDFS - NameNode & DataNodes


 Advanced HDFS Concepts (HA, Federation)


 Hands-on with HDFS (Upload, Download, SetRep)

 Assessment

 Data Locality (Rack Awareness)


 YARN - Why Not Existing Tools?

 YARN - Evolution from MapReduce

 Resource Management: YARN Architecture

 Advanced Concepts - Speculative Execution


 MapReduce Basics

 MapReduce - Understanding Sorting

 MapReduce - Overview


 Example - Word Frequency Problem - Without MR

 Example - Only Mapper - Image Resizing

 Example - Word Frequency Problem

 Example - Temperature Problem

 Example - Multiple Reducers

 Example - Java MapReduce Walkthrough


 MapReduce Advanced

 Writing MapReduce Code Using Java

 Building a MapReduce Project Using Apache Ant

 Concept - Associative & Commutative


 Example - Combiner

 Example - Hadoop Streaming

 Example - Advanced Problem Solving - Anagrams

 Example - Advanced Problem Solving - Same DNA

 Example - Advanced Problem Solving - Similar DNA

 Example - Joins - Voting

 Limitations of MapReduce


 Analyzing Data with Pig

 Pig - Introduction

 Pig - Modes

 Getting Started

 Example - NYSE Stock Exchange

 Concept - Lazy Evaluation

 Processing Data with Hive

 Hive - Introduction

 Hive - Data Types

 Getting Started

 Loading Data in Hive (Tables)

 Example - MovieLens Data Processing

 Advanced Concepts - Views

 Connecting Tableau and HiveServer

 Connecting Microsoft Excel and HiveServer

 Project: Sentiment Analysis of Twitter Data

 Advanced - Partitioned Tables

 Understanding HCatalog & Impala


 NoSQL and HBase

 NoSQL - Scaling Out / Up

 NoSQL - ACID Properties and RDBMS Story

 CAP Theorem

 HBase Architecture - Region Servers, etc.

 HBase Data Model - Column-Family Orientation

 Getting Started - Creating Tables, Adding Data

 Advanced Example - Google Links Storage

 Concept - Bloom Filter

 Comparison of NoSQL Databases


 Importing Data with Sqoop and Flume, Workflows with Oozie

 Sqoop - Introduction

 Sqoop Import - MySQL to HDFS

 Exporting to MySQL from HDFS

 Concept - Unbounded Dataset Processing or Stream Processing

 Flume Overview: Agents - Source, Sink, Channel

 Example - Data from a Local Network Service into HDFS

 Example - Extracting Twitter Data


 Example - Creating a Workflow with Oozie


 Apache Spark ecosystem walkthrough

 Spark Introduction - Why Spark?


 Scala Basics

 Scala - Quick Introduction - Accessing Scala on CloudxLab

 Scala - Quick Introduction - Variables and Methods

 Getting Started: Interactive, Compilation, SBT

 Types, Variables & Values





 More Features

 Assessment

 Spark Basics

 Apache Spark Ecosystem Walkthrough

 Spark Introduction - Why Spark?

 Using the Spark Shell on CloudxLab

 Example - Performing Word Count

 Understanding Spark Cluster Modes on YARN

 RDDs (Resilient Distributed Datasets)

 General RDD Operations: Transformations & Actions

 RDD Lineage

 RDD Persistence Overview

 Distributed Persistence

 Writing and Deploying Spark Applications

 Creating the SparkContext

 Building a Spark Application (Scala, Java, Python)

 The Spark Application Web UI

 Configuring Spark Properties

 Running Spark on Cluster

 RDD Partitions

 Executing Parallel Operations

 Stages and Tasks

 Common Patterns in Spark Data Processing

 Common Spark Use Cases

 Example - Data Cleaning (MovieLens)

 Example - Understanding Spark Streaming

 Understanding Kafka

 Example - Spark Streaming from Kafka

 Iterative Algorithms in Spark

 Project: Real-time analytics of orders in an e-commerce company

 Data Formats & Management

 InputFormat and InputSplit




 Storing Many Small Files - SequenceFile


 Protocol Buffers

 Comparing Compressions

 Understanding Row-Oriented and Column-Oriented Formats - RCFile

 DataFrames and Spark SQL

 Spark SQL - Introduction

 Spark SQL - DataFrame Introduction

 Transforming and Querying DataFrames

 Saving DataFrames

 DataFrames and RDDs

 Comparing Spark SQL, Impala, and Hive-on-Spark

 Machine Learning with Spark

 Machine Learning Introduction

 Applications of Machine Learning

 MLlib Example: k-means

 SparkR Example


