Data Science With R Training In OMR – Sholinganallur – Chennai

Automation Minds, located in Adyar and OMR, provides Data Science with R training in Chennai. Become a certified Data Scientist by learning R, SQL and Excel. Learn analytics with R, from data manipulation to predictive modeling. Get certified in 6 weeks.

In this R certification training, you’ll become an expert in data analytics techniques using the R programming language as your data science tool. Our R training institute in Chennai offers a comprehensive learning foundation on which you can build your analytics career.

DATA SCIENCE WITH R TRAINING COURSES IN CHENNAI

  1. Introduction to Data Science with R
  2. Scientific Distributions Used in R for Data Science
  3. Machine Learning
  4. Practical Applications of Machine Learning

INTRODUCTION TO DATA SCIENCE

  • Need for Data Scientists
  • Foundation of Data Science
  • What is Business Intelligence
  • What is Data Analysis
  • What is Data Mining
  • What is Machine Learning
  • Analytics vs Data Science
  • Value Chain
  • Types of Analytics
  • Lifecycle Probability
  • Analytics Project Lifecycle

 

DATA

  • Basis of Data Categorization
  • Types of Data
  • Data Collection Types
  • Forms of Data & Sources
  • Data Quality & Changes
  • Data Quality Issues
  • Data Quality Story
  • What is Data Architecture
  • Components of Data Architecture
  • OLTP vs OLAP
  • How is Data Stored?

 

BIG DATA

  • What is Big Data?
  • 5 Vs of Big Data
  • Big Data Architecture
  • Big Data Technologies
  • Big Data Challenge
  • Big Data Requirements
  • Big Data Distributed Computing & Complexity
  • Hadoop
  • Map Reduce Framework
  • Hadoop Ecosystem

 

DATA SCIENCE DEEP DIVE

  • What Data Science is
  • Why Data Scientists are in demand
  • What is a Data Product
  • The growing need for Data Science
  • Large Scale Analysis Cost vs Storage
  • Data Science Skills
  • Data Science Use Cases
  • Data Science Project Life Cycle & Stages
  • Map Reduce Framework
  • Hadoop Ecosystem
  • Data Acquisition
  • Where to source data
  • Techniques
  • Evaluating input data
  • Data formats
  • Data Quantity
  • Data Quality
  • Resolution Techniques
  • Data Transformation
  • File format Conversions
  • Anonymization

 

INTRO TO R PROGRAMMING

  • Introduction to R
  • Business Analytics
  • Analytics concepts
  • The importance of R in analytics
  • R Language community and eco-system
  • Usage of R in industry
  • Installing R and other packages
  • Perform basic R operations using command line
  • Using the RStudio IDE and various GUIs

 

R PROGRAMMING CONCEPTS

  • Data types in R and their uses
  • Built-in functions in R
  • Subsetting methods
  • Summarize data using functions
  • Using functions such as head() and tail() to inspect data
  • Use-cases for problem solving using R

 

DATA MANIPULATION IN R

  • Various phases of Data Cleaning
  • Functions used in Inspection
  • Data Cleaning Techniques
  • Uses of functions involved
  • Use-cases for Data Cleaning using R

 

DATA IMPORT TECHNIQUES IN R

  • Import data from spreadsheets and text files into R
  • Importing data from statistical formats
  • Packages installation for database import
  • Connecting to RDBMS from R using ODBC and basic SQL queries in R
  • Web Scraping
  • Other concepts on Data Import Techniques

 

EXPLORATORY DATA ANALYSIS (EDA) USING R

  • What is EDA?
  • Why do we need EDA?
  • Goals of EDA
  • Types of EDA
  • Implementing EDA
  • Boxplots, cor() in R
  • EDA functions
  • Multiple packages in R for data analysis
  • Some fancy plots
  • Use-cases for EDA using R

DATA VISUALIZATION IN R

  • Story telling with Data
  • Principal tenets
  • Elements of Data Visualization
  • Infographics vs Data Visualization
  • Data Visualization & Graphical functions in R
  • Plotting Graphs
  • Customizing Graphical Parameters to improve the plots
  • Various GUIs
  • Spatial Analysis
  • Other Visualization concepts

BIG DATA AND HADOOP INTRODUCTION

  • What is Big Data and Hadoop?
  • Challenges of Big Data
  • Traditional approach Vs Hadoop
  • Hadoop Architecture
  • Distributed Model
  • Block-Structured File System
  • Technologies supporting Big Data
  • Replication
  • Fault Tolerance
  • Why Hadoop?
  • Hadoop Eco-System
  • Use cases of Hadoop
  • Fundamental Design Principles of Hadoop
  • Comparison of Hadoop Vs RDBMS

UNDERSTAND HADOOP CLUSTER ARCHITECTURE

  • Hadoop Cluster & Architecture
  • 5 Daemons
  • Hands-On Exercise
  • Typical Workflow
  • Hands-On Exercise
  • Writing Files to HDFS
  • Hands-On Exercise
  • Reading Files from HDFS
  • Hands-On Exercise
  • Rack Awareness
  • Before Map Reduce

MAP REDUCE CONCEPTS

  • Map Reduce Concepts
  • What is Map Reduce?
  • Why Map Reduce?
  • Map Reduce in the real world
  • Map Reduce Flow
  • What is Mapper?
  • What is Reducer?
  • What is Shuffling?
  • Word Count Problem
  • Hands-On Exercise
  • Distributed Word Count Flow & Solution
  • Log Processing and Map Reduce
  • Hands-On Exercise
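
As a rough illustration of the Map Reduce flow and the word count problem listed above, the following Python sketch simulates the map, shuffle and reduce phases in a single process. It is not Hadoop code; the function names (mapper, shuffle, reducer, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    sample = ["big data big ideas", "data science with big data"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run on different nodes and the framework would handle the shuffle; here all three steps are plain function calls so the data flow is easy to follow.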

 

ADVANCED MAP REDUCE CONCEPTS

  • What is Combiner?
  • Hands-On Exercise
  • What is Partitioner?
  • Hands-On Exercise
  • What is Counter?
  • Hands-On Exercise
  • Input Formats/Output Formats
  • Hands-On Exercise
  • Map Join using MR
  • Hands-On Exercise
  • Reduce Join using MR
  • Hands-On Exercise
  • MR Distributed Cache
  • Hands-On Exercise
  • Using sequence files & images with MR
  • Hands-On Exercise
  • Planning for Cluster & Hadoop 2.0 YARN
  • Configuration of Hadoop
  • Choosing the Right Hadoop Hardware
  • Choosing the Right Hadoop Software
  • Hadoop Log Files

 

HADOOP 2.0 & YARN

  • Hadoop 1.0 Challenges
  • NameNode (NN) Scalability
  • NameNode Single Point of Failure (SPOF) & High Availability (HA)
  • Job Tracker Challenges
  • Hadoop 2.0 New Features
  • Hadoop 2.0 Cluster Architecture & Federation
  • Hadoop 2.0 HA
  • YARN & Hadoop Ecosystem
  • YARN MR Application Flow

 

PIG

  • Introduction to Pig
  • What Is Pig?
  • Pig’s Features & Pig Use Cases
  • Interacting with Pig
  • Basic Data Analysis with Pig
  • Hands-On Exercise
  • Pig Latin Syntax
  • Loading Data
  • Hands-On Exercise
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Hands-On Exercise
  • Filtering and Sorting Data
  • Hands-On Exercise
  • Commonly-Used Functions
  • Hands-On Exercise: Pig for ETL Processing
  • Processing Complex Data with Pig
  • Hands-On Exercise
  • Storage Formats
  • Complex/Nested Data Types
  • Hands-On Exercise
  • Grouping
  • Hands-On Exercise
  • Built-in Functions for Complex Data
  • Hands-On Exercise
  • Iterating Grouped Data
  • Hands-On Exercises
  • Multi-Dataset Operations with Pig
  • Hands-On Exercise
  • Techniques for Combining Data Sets

 

PIG (CONTINUED)

  • Joining Data Sets in Pig
  • Hands-On Exercise
  • Splitting Data Sets
  • Hands-On Exercise

 

HIVE

  • Hive Fundamentals & Architecture
  • Loading and Querying Data in Hive
  • Hands-On Exercise
  • Hive Architecture and Installation
  • Comparison with Traditional Database
  • HiveQL: Data Types, Operators and Functions
  • Hands-On Exercise
  • Hive Tables: Managed Tables and External Tables
  • Hands-On Exercise
  • Partitions and Buckets
  • Hands-On Exercise
  • Storage Formats, Importing Data, Altering Tables, Dropping Tables
  • Hands-On Exercise
  • Querying Data, Sorting and Aggregating, Map Reduce Scripts
  • Hands-On Exercise

 

HIVE (CONTINUED)

  • Joins & Sub queries, Views
  • Hands-On Exercise
  • Integration, Data manipulation with Hive
  • Hands-On Exercise
  • User Defined Functions
  • Hands-On Exercise
  • Appending Data into existing Hive Table
  • Hands-On Exercise
  • Static partitioning vs dynamic partitioning
  • Hands-On Exercise

 

HBASE

  • CAP Theorem
  • HBase Architecture and concepts
  • Introduction to HBase
  • Client APIs and their features
  • HBase tables
  • The ZooKeeper Service
  • Data Model, Operations

 

HBASE (CONTINUED)

  • Programming and Hands-On Exercises

 

SQOOP

  • Introduction to Sqoop
  • MySQL Client & Server
  • Connecting to a relational database using Sqoop
  • Importing data from MySQL using Sqoop
  • Exporting data to MySQL using Sqoop
  • Incremental append
  • Importing data from MySQL to Hive using Sqoop
  • Exporting data from Hive to MySQL using Sqoop
  • Importing data from MySQL to HBase using Sqoop
  • Using queries with Sqoop

 

FLUME & OOZIE

  • What is Flume?
  • Why use Flume, Architecture, configurations
  • Master, collector, Agent
  • Twitter Data Sentiment Analysis project
  • Oozie
  • What is Oozie? Architecture, configurations
  • Oozie Job Submission
  • Oozie properties
  • Hands-On Exercises

 

PROJECTS

  • Social Media Final Project
  • Hadoop Project
  • Objective
  • Problem Definition
  • Solution
  • Discuss data sets and specifications of the project.

 

PROJECT IN HEALTHCARE DOMAIN

  • Hadoop Project in Healthcare
  • Objective
  • Problem Definition
  • Solution
  • Discuss data sets and specifications of the project.

 

PROJECT IN FINANCE/BANKING DOMAIN

  • Hadoop Project in Banking Domain
  • Objective
  • Problem Definition
  • Solution
  • Discuss data sets and specifications of the project.

 

SPARK

 

APACHE SPARK

  • Introduction to Apache Spark
  • Why Spark
  • Batch Vs. Real Time Big Data Analytics
  • Batch Analytics – Hadoop Ecosystem Overview
  • Real Time Analytics Options
  • Streaming Data – Storm
  • In Memory Data – Spark
  • What is Spark?
  • Spark benefits to Professionals
  • Limitations of MR in Hadoop
  • Components of Spark
  • Spark Execution Architecture
  • Benefits of Apache Spark
  • Hadoop vs Spark

 

INTRODUCTION TO SCALA

  • Features of Scala
  • Basic Data Types of Scala
  • val vs var
  • Type Inference
  • REPL
  • Objects & Classes in Scala
  • Functions as Objects in Scala
  • Anonymous Functions in Scala
  • Higher Order Functions
  • Lists in Scala
  • Maps
  • Pattern Matching
  • Traits in Scala
  • Collections in Scala

 

SPARK CORE ARCHITECTURE

  • Spark & Distributed Systems
  • Spark for Scalable Systems
  • Spark Execution Context
  • What is RDD
  • RDD Deep Dive
  • RDD Dependencies
  • RDD Lineage
  • Spark Application In Depth
  • Spark Deployment
  • Parallelism in Spark
  • Caching in Spark

 

SPARK INTERNALS

  • Spark Transformations
  • Spark Actions
  • Spark Cluster
  • Spark SQL Introduction
  • Spark Data Frames
  • Spark SQL with CSV
  • Spark SQL with JSON
  • Spark SQL with Database

 

SPARK STREAMING

  • Features of Spark Streaming
  • Micro Batch
  • DStreams
  • Transformations on DStreams
  • Spark Streaming Use Case

 

STATISTICS + MACHINE LEARNING

STATISTICS

WHAT IS STATISTICS

  • Descriptive Statistics
  • Central Tendency Measures
  • The Story of Average
  • Dispersion Measures
  • Data Distributions
  • Central Limit Theorem
  • What is Sampling
  • Why Sampling
  • Sampling Methods
  • Inferential Statistics
  • What is Hypothesis testing
  • Confidence Level
  • Degrees of freedom
  • What is a p-value
  • Chi-Square test
  • What is ANOVA
  • Correlation vs Regression
  • Uses of Correlation & Regression
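
To make the central tendency, dispersion, and correlation vs regression topics above more concrete, here is a minimal Python sketch using only the standard library. The helper names (describe, pearson_r, fit_line) are hypothetical and are not part of the course material; the formulas are the usual sample standard deviation, Pearson correlation coefficient and least-squares line.

```python
import math
import statistics


def describe(values):
    """Return central tendency and dispersion measures for a numeric sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for simple linear regression."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    hours = [1, 2, 3, 4]
    scores = [2, 4, 6, 8]
    print(describe(scores))
    print(pearson_r(hours, scores))
    print(fit_line(hours, scores))
```

Correlation reports only how strongly two variables move together, while regression also yields a line that can be used for prediction.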

 

MACHINE LEARNING

MACHINE LEARNING INTRODUCTION

  • ML Fundamentals
  • ML Common Use Cases
  • Understanding Supervised and Unsupervised Learning Techniques
  • Clustering
  • Similarity Metrics
  • Distance Measure Types: Euclidean, Cosine Measures
  • Creating predictive models
  • Understanding K-Means Clustering
  • Understanding TF-IDF, Cosine Similarity and their application to Vector Space Model
  • Case study
  • Implementing Association rule mining
  • Case study
  • Understanding Process flow of Supervised Learning Techniques
  • Decision Tree Classifier
  • How to build Decision trees
  • Case study
  • Random Forest Classifier
  • What is Random Forests
  • Features of Random Forest
  • Out-of-Bag Error Estimate and Variable Importance
  • Case study
  • Naive Bayes Classifier
  • Case study
  • Project Discussion
  • Problem Statement and Analysis
  • Various approaches to solve a Data Science Problem
  • Pros and Cons of different approaches and algorithms.
  • Linear Regression
  • Case study
  • Logistic Regression
  • Case study
  • Text Mining
  • Case study
  • Sentiment Analysis
  • Case study
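
For the TF-IDF and cosine similarity topic listed above, the sketch below shows one way to build a small vector space model in plain Python. It assumes one common weighting variant (term frequency divided by document length, and an unsmoothed idf of log(N / df)); other variants exist, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Build one TF-IDF weight dictionary per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        ["big", "data", "tools"],
        ["big", "data", "science"],
        ["cooking", "recipes"],
    ]
    vectors = tf_idf_vectors(docs)
    print(cosine_similarity(vectors[0], vectors[1]))
    print(cosine_similarity(vectors[0], vectors[2]))
```

Documents that share weighted terms score above zero, and documents with no terms in common score exactly zero.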

 

PYTHON

GETTING STARTED WITH PYTHON

  • Python Overview
  • About Interpreted Languages
  • Advantages/Disadvantages of Python
  • pydoc
  • Starting Python
  • Interpreter PATH
  • Using the Interpreter
  • Running a Python Script
  • Python Scripts on UNIX/Windows
  • Python Editors and IDEs
  • Using Variables
  • Keywords
  • Built-in Functions
  • Strings, Different Literals
  • Math Operators and Expressions
  • Writing to the Screen
  • String Formatting
  • Command Line Parameters and Flow Control

 

SEQUENCES AND FILE OPERATIONS

  • Lists
  • Tuples
  • Indexing and Slicing
  • Iterating through a Sequence
  • Functions for all Sequences
  • Using enumerate()
  • Operators and Keywords for Sequences
  • The xrange() function (range() in Python 3)
  • List Comprehensions
  • Generator Expressions
  • Dictionaries and Sets
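
A few of the sequence topics above (indexing and slicing, enumerate(), list comprehensions, generator expressions) can be sketched as follows. The example is written for Python 3, where range() takes the place of xrange(); the function names and sample data are made up for illustration.

```python
def label_items(items):
    """Pair each item with its 1-based position using enumerate()."""
    return [f"{position}. {item}" for position, item in enumerate(items, start=1)]


def even_squares(limit):
    """List comprehension: squares of the even numbers below limit."""
    return [n * n for n in range(limit) if n % 2 == 0]


def total_length(words):
    """Generator expression: sum lengths without building a temporary list."""
    return sum(len(word) for word in words)


courses = ("R", "Python", "Hadoop")  # a tuple is an immutable sequence
print(courses[0], courses[-1])       # indexing from either end
print(courses[1:])                   # slicing returns a new tuple
print(label_items(courses))
print(even_squares(7))
print(total_length(courses))
```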

 

DEEP DIVE – FUNCTIONS, SORTING, ERRORS AND EXCEPTION HANDLING

  • Functions
  • Function Parameters
  • Global Variables
  • Variable Scope and Returning Values
  • Sorting
  • Alternate Keys
  • Lambda Functions
  • Sorting Collections of Collections
  • Sorting Dictionaries
  • Sorting Lists in Place
  • Errors and Exception Handling
  • Handling Multiple Exceptions
  • The Standard Exception Hierarchy
  • Using Modules
  • The Import Statement
  • Module Search Path
  • Ways to Install Packages

 

REGULAR EXPRESSIONS, PACKAGES AND OBJECT-ORIENTED PROGRAMMING IN PYTHON

  • The Sys Module
  • Interpreter Information
  • STDIO
  • Launching External Programs
  • Paths, Directories and Filenames
  • Walking Directory Trees
  • Math Functions
  • Random Numbers
  • Dates and Times
  • Zipped Archives
  • Introduction to Python Classes
  • Defining Classes
  • Initializers
  • Instance Methods
  • Properties
  • Class Methods and Data, Static Methods
  • Private Methods and Inheritance
  • Module Aliases and Regular Expressions

 

DEBUGGING, DATABASES AND PROJECT SKELETONS

  • Debugging
  • Dealing with Errors
  • Using Unit Tests
  • Project Skeleton
  • Required Packages
  • Creating the Skeleton
  • Project Directory
  • Final Directory Structure
  • Testing your Setup
  • Using the Skeleton
  • Creating a Database with SQLite 3
  • CRUD Operations
  • Creating a Database Object
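
The SQLite 3 and CRUD topics above could look roughly like this with Python's built-in sqlite3 module. The students table, its columns and the sample rows are hypothetical, and the database is kept in memory so that nothing is written to disk.

```python
import sqlite3


def run_crud_demo():
    """Create, read, update and delete rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        cur = conn.cursor()
        cur.execute(
            "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, course TEXT)"
        )
        # Create: insert two rows using parameter placeholders.
        cur.executemany(
            "INSERT INTO students (name, course) VALUES (?, ?)",
            [("Asha", "R"), ("Ravi", "Python")],
        )
        # Update: change one student's course.
        cur.execute("UPDATE students SET course = ? WHERE name = ?", ("Hadoop", "Ravi"))
        # Delete: remove one student.
        cur.execute("DELETE FROM students WHERE name = ?", ("Asha",))
        conn.commit()
        # Read: fetch whatever rows remain.
        cur.execute("SELECT name, course FROM students ORDER BY id")
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_crud_demo())
```

The ? placeholders let the driver handle quoting, which is generally safer than building SQL strings by hand.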

 

MACHINE LEARNING USING PYTHON

  • Introduction to Machine Learning
  • Areas of Implementation of Machine Learning
  • Why Python
  • Major Classes of Learning Algorithms
  • Supervised vs Unsupervised Learning
  • Learning NumPy
  • Learning Scipy
  • Basic plotting using Matplotlib
  • Machine Learning application

 

SUPERVISED AND UNSUPERVISED LEARNING

  • Classification Problem
  • Classifying with the k-Nearest Neighbours (kNN) Algorithm
  • General Approach to kNN
  • Building the Classifier from Scratch
  • Testing the Classifier
  • Measuring the Performance of the Classifier
  • Clustering Problem
  • What is K-Means Clustering
  • Clustering with k-Means in Python and an Application Example
  • Introduction to Pandas
  • Creating Data Frames
  • Grouping, Sorting
  • Plotting Data
  • Creating Functions
  • Converting Different Formats
  • Combining Data from Various Formats
  • Slicing/Dicing Operations
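
The kNN topics listed earlier under Supervised and Unsupervised Learning (building the classifier from scratch, testing it, measuring its performance) can be sketched in plain Python as below. This is a minimal version assuming Euclidean distance and a simple majority vote; the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_points, train_labels, test_points, test_labels, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_points, train_labels, point, k) == label
        for point, label in zip(test_points, test_labels)
    )
    return correct / len(test_points)


if __name__ == "__main__":
    points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (2, 2)))
    print(accuracy(points, labels, [(0, 0), (10, 10)], ["a", "b"]))
```

In practice the test points would come from a held-out part of the data set rather than being written by hand.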

 

SCIKIT AND INTRODUCTION TO HADOOP

  • Introduction to Scikit-Learn
  • Inbuilt Algorithms for Use
  • What is Hadoop and why it is popular
  • Distributed Computation and Functional Programming
  • Understanding MapReduce Framework
  • Sample MapReduce Job Run

 

HADOOP AND PYTHON

  • PIG and HIVE Basics
  • Streaming Feature in Hadoop
  • Map Reduce Job Run using Python
  • Writing a PIG UDF in Python
  • Writing a HIVE UDF in Python
  • Pydoop and MRjob Basics

 

PYTHON PROJECT WORK

  • Real world project

© 2018 Automation Minds. All rights reserved.