IMISC 2019

Data Science With R

R is a powerful language widely used for data analysis and statistical computing. It was developed in the early 1990s, and a growing ecosystem of packages has steadily extended its capabilities. Packages such as dplyr, tidyr, readr, and ggplot2 have made data manipulation, visualization, and computation much faster.
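As a rough illustration of the kind of data manipulation these packages support, the sketch below filters and summarises R's built-in mtcars dataset with dplyr. The dataset, the cylinder threshold, and the name summary_by_cyl are assumptions chosen for this example, not part of the tutorial material.

```r
# Minimal dplyr sketch (illustrative only; mtcars ships with base R).
library(dplyr)

summary_by_cyl <- mtcars %>%
  filter(cyl > 4) %>%                  # keep cars with more than 4 cylinders
  group_by(cyl) %>%                    # one group per cylinder count
  summarise(
    mean_mpg = mean(mpg),              # average fuel economy in each group
    n = n()                            # number of cars in each group
  )

print(summary_by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be chained with the pipe operator.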

Target Audience

Software developers willing to work on statistics, data analysis and statistical learning
Undergraduate (senior), Master's and PhD students willing to work in the statistics and data science domains

Tutorial Content

Introduction to R Programming Language*
Data frame (tabular dataset) manipulation with the dplyr package (data wrangling)
Basic statistics with R
Data visualisation with ggplot2

Bar chart
Histogram
Scatter Plot
Pairplot

Data preprocessing

Missing value handling
Categorical value transformations
Outlier detection
Attribute normalization

Correlation Matrix
Attribute selection

Near Zero Variance
Boruta (Random Forest-based Attribute Selection Method)

Supervised Learning

Regression
Classification

Creation of machine learning models with caret and e1071 packages
k-fold cross-validation
Underfitting and Overfitting
Unsupervised Learning

Clustering
Principal Component Analysis

*Participants must be familiar with at least one programming language.
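
To give a feel for the supervised learning items in the outline (model creation with caret and k-fold cross-validation), here is a minimal sketch. The iris dataset, the k-nearest-neighbours method, the choice of 5 folds, and the seed are all assumptions made for illustration; the tutorial itself may use different data and models.

```r
# Minimal caret sketch (illustrative only; iris ships with base R).
library(caret)

set.seed(42)                           # make the fold assignment reproducible

# 5-fold cross-validation: the data is split into 5 parts and each part
# is held out once while the model is trained on the remaining 4.
ctrl <- trainControl(method = "cv", number = 5)

# Train a k-nearest-neighbours classifier that predicts Species from
# the four numeric measurements.
fit <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

print(fit)                             # cross-validated accuracy per candidate k
```

Comparing the cross-validated accuracy across candidate values of k is one simple way to see the underfitting and overfitting trade-off listed in the outline.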