Data Science With R

R is a language widely used for data analysis and statistical computing. It was developed in the early 1990s, and its growing ecosystem of packages has made it steadily more capable over time. Packages such as dplyr, tidyr, readr, and ggplot2 have made data manipulation, visualization, and computation much faster.

Target Audience

  • Software Developers
  • Anyone willing to work in statistics, data analysis, and statistical learning
  • Senior undergraduate, Master's, and PhD students willing to work in the statistics and data science domain

Tutorial Content

  • Introduction to R Programming Language*
  • Data frame (tabular dataset) manipulation with the dplyr package (data wrangling)
  • Basic statistics with R
  • Data visualization with ggplot2
    • Bar chart
    • Histogram
    • Scatter Plot
    • Pair plot
  • Data preprocessing
    • Missing value handling
    • Categorical value transformations
    • Outlier detection
    • Attribute normalization
  • Correlation Matrix
  • Attribute selection
    • Near Zero Variance
    • Boruta (Random Forest-based Attribute Selection Method)
  • Supervised Learning
    • Regression
    • Classification
  • Creation of machine learning models with the caret and e1071 packages
  • k-fold cross-validation
  • Underfitting and Overfitting
  • Unsupervised Learning
    • Clustering
    • Principal Component Analysis

*Participants must be familiar with at least one programming language.