List of 20 Very Useful R Packages for Data Scientists

R has been one of the fastest-growing programming languages of the last decade, and it is one of the top programming choices for data scientists. In major language-popularity surveys, R consistently ranks among the top ten languages.

R offers a large number of packages for data analysis. Apart from providing an excellent interface for statistical analysis, the next best thing about R is the strong support it gets from developers and data science experts all over the world.

Beyond popular packages such as caret, ggplot2, dplyr and lattice, there are many more libraries that go unnoticed but prove very handy at certain stages of analysis. So, we created a list of R packages worth knowing.

At the time of writing, the CRAN package repository featured 11,349 available packages. But for people who are just starting to learn data science, this raises a frustrating question:

What are the R packages I should learn?

Here is a list of packages that I have used and found to be very useful and powerful. Some of them are often used by Kagglers, and a few played a key role in getting a top 10 ranking in Kaggle competitions.

  • sqldf [We use it for selecting from data frames using SQL]
  • data.table [A very popular extension of data.frame for fast data manipulation]
  • foreach [Useful if you want a foreach looping construct in R]
  • Matrix [Provides classes and methods for working with sparse and dense matrices]
  • forecast [For easy forecasting of time series]
  • plyr [One of the best tools for splitting, applying and combining data]
  • stringr [This package is really helpful for string manipulation]
  • Database connection packages [RPostgreSQL, RMongo, RODBC, RSQLite]
  • lubridate [Data scientists use it mainly for easy date and time manipulation]
  • ggplot2 [One of the most famous and powerful packages for data visualization and exploratory data analysis]
  • qcc [It is mainly used for statistical quality control and QC charts]
  • reshape2 [You can use this package for data restructuring very easily]
  • randomForest [A very well-known package in the data science community for building random forest predictive models]
  • gbm [This package provides Gradient Boosting Machines]
  • e1071 [One of the best packages I have ever used, mainly for building Support Vector Machines]
  • caret [Short for Classification And Regression Training; mainly used to train and tune classification and regression models]
  • glmnet [This provides Lasso and Elastic-Net Regularized Generalized Linear Models]
  • tau [Text Analysis Utilities; very good for text analysis]
  • SOAR [If you want memory management in R through delayed assignments, this is the package you are looking for]
  • doMC [A foreach parallel adaptor for the multicore package]
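To give a flavour of how a few of the data-manipulation packages above look in practice, here is a minimal sketch that computes the same grouped total twice, once with an SQL query through sqldf and once with data.table's DT[i, j, by] syntax, then tidies strings with stringr and parses dates with lubridate. The `sales` data frame and its values are made up purely for illustration, and the code assumes the four packages have already been installed from CRAN.

```r
# Assumes the packages were installed beforehand, e.g.
# install.packages(c("sqldf", "data.table", "stringr", "lubridate"))
library(sqldf)
library(data.table)
library(stringr)
library(lubridate)

# Hypothetical example data, used only for illustration
sales <- data.frame(
  region = c("north", "south", "north", "east"),
  amount = c(120, 80, 200, 50),
  day    = c("2017-01-15", "2017-02-03", "2017-02-20", "2017-03-01"),
  stringsAsFactors = FALSE
)

# sqldf: run an SQL query directly against the data frame
by_region_sql <- sqldf("SELECT region, SUM(amount) AS total
                        FROM sales GROUP BY region ORDER BY region")

# data.table: the same aggregation, then sort the result by region
dt <- as.data.table(sales)
by_region_dt <- dt[, .(total = sum(amount)), by = region][order(region)]

# stringr: capitalise the region names
sales$region_label <- str_to_title(sales$region)

# lubridate: parse ISO dates and extract the month number.
# The namespace is spelled out because data.table also exports month().
sales$month <- lubridate::month(ymd(sales$day))

print(by_region_sql)
print(by_region_dt)
print(sales)
```

Both aggregations should give the same totals per region; which style to use is mostly a matter of taste, although data.table is usually the faster choice on large data.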