Data Science With R
R is a powerful language widely used for data analysis and statistical computing. It was developed in the early 1990s. Its growing ecosystem of packages has extended its capabilities over time: packages such as dplyr, tidyr, readr, and ggplot2 make data manipulation, visualization, and computation faster and easier.
Target Audience
- Software Developers
- Anyone willing to work in statistics, data analysis, and statistical learning
- Senior undergraduate, Master's, and PhD students who want to work in statistics and data science
Tutorial Content
- Introduction to R Programming Language*
- Data frame (tabular dataset) manipulation with the dplyr package (data wrangling)
- Basic statistics with R
- Data visualisation with ggplot2
  - Bar chart
  - Histogram
  - Scatter plot
  - Pair plot
- Data preprocessing
  - Missing value handling
  - Categorical value transformations
  - Outlier detection
  - Attribute normalization
- Correlation Matrix
- Attribute selection
  - Near Zero Variance
  - Boruta (Random Forest-based Attribute Selection Method)
- Supervised Learning
  - Regression
  - Classification
- Creation of machine learning models with the caret and e1071 packages
- k-fold cross-validation
- Underfitting and Overfitting
- Unsupervised Learning
  - Clustering
  - Principal Component Analysis
*Participants must be familiar with at least one programming language.