R Learning Path: From beginner to expert in R in 7 steps

This learning path is mainly for novice R users who are just getting started, but it also covers some of the latest changes in the language that might appeal to more advanced R users.



Step 3: The core of R -> packages

 

Every R package is simply a bundle of code that serves a specific purpose and is designed to be reusable by other developers. In addition to the primary codebase, packages often include data, documentation, and tests. As an R user, you can simply download a particular package (some are even pre-installed) and start using its functionality. Anyone can develop R packages, and anyone can share their R packages with others.

 

This is an extremely powerful concept and one of the key reasons R is so successful as a language and as a community. You don’t need to do all the hard-core programming yourself or understand every complex detail of a particular algorithm or visualization. You can simply use the out-of-the-box functions that come with the relevant package as an interface to that functionality. It is therefore useful to have an understanding of R’s package ecosystem.

 

Many R packages are available from the Comprehensive R Archive Network (CRAN), and you can install them using the `install.packages` function. What is great about CRAN is that it groups packages by task via Task Views. Alternatively, you can find R packages on Bioconductor, GitHub, and Bitbucket.
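
The install-then-load pattern can be sketched as follows. This is a minimal illustration, not the only way to do it: the package name is a placeholder, and `stats` is used here only because it ships with R, so nothing is actually downloaded when the sketch runs.

```r
# Placeholder package name: "stats" ships with every R installation, so this
# sketch runs offline. Swap in a CRAN package name such as "dplyr".
pkg <- "stats"

# Install from CRAN only when the package is not already available.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package so its functions can be called by their bare names.
library(pkg, character.only = TRUE)

# A function can also be called without attaching the package, via `::`.
middle <- stats::median(c(1, 5, 9))
```

Guarding the call with `requireNamespace` avoids re-downloading a package on every run, which is usually what you want in a script that is executed repeatedly.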

 

Looking for a particular package and its documentation? Try Rdocumentation, where you can easily search packages from CRAN, GitHub, and Bioconductor.

Step 4: Help?!

You will quickly find out that for every R question you solve, five new ones will pop up. Luckily, there are many ways to get help:

  • Within R, you can make use of its built-in help system. For example, the command `?plot` will show you the documentation for the plot function.
  • R puts a big emphasis on documentation. The previously mentioned Rdocumentation is a great website for browsing the documentation of different packages and functions.
  • Stack Overflow is a great resource for finding answers to common R questions or for asking questions yourself.
  • There are numerous blogs and posts on the web covering R, such as KDnuggets and R-bloggers.
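
As a small sketch of the built-in help system mentioned in the first bullet, the calls below use only functions that ship with base R. The search pattern is just an illustration.

```r
# Look up the documentation for a function; `?plot` is shorthand for this.
# Printing the returned object (or calling it at the console) displays the page.
plot_help <- help("plot")

# When the exact name is unknown, list attached functions matching a pattern.
# Here: everything whose name starts with "read.".
readers <- apropos("^read\\.")
print(readers)

# For a full-text search of installed documentation, `??regression` at the
# console is shorthand for help.search("regression").
```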

Step 5: The Data Analysis Workflow

Once you have an understanding of R’s syntax, the package ecosystem, and how to get help, it’s time to focus on how R can be useful for the most common tasks in the data analysis workflow.

5.1 Importing Data

 

Before you can start performing analysis, you first need to get your data into R. The good news is that R can import all sorts of data formats; the hard part is that different formats often need a different approach.

 

 

If you want to learn more about importing data into R, check an online Importing Data into R tutorial or this post on data importing.
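
For the simplest case, a flat CSV file, base R is enough. The sketch below is illustrative only: the data and column names are made up, and the file is written to a temporary location first so the example is self-contained. Other formats typically need a dedicated package and a different reader function.

```r
# Write a small made-up data set to a temporary CSV file so the sketch is
# self-contained; in practice the file would already exist on disk.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, city = c("Ghent", "Boston", "Lima")),
  csv_path,
  row.names = FALSE
)

# Import the flat file into a data frame, keeping text columns as character.
cities <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the structure to check that column types were detected as expected.
str(cities)
```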

5.2 Data Manipulation

 

Data manipulation with R is a broad topic, as you can see in, for example, this Data Wrangling with R video by RStudio or the book Data Manipulation with R. These are the packages you should master for data manipulation in R:

 

  • The tidyr package for tidying your data.
  • The stringr package for string manipulation.
  • When working with data-frame-like objects, it is best to become familiar with the dplyr package (try this course). However, for heavy data wrangling tasks, it makes more sense to check out the blazingly fast data.table package (see this syntax cheatsheet for help).
  • When working with times and dates, install the lubridate package, which makes them a bit easier to work with.
  • Packages like zoo, xts and quantmod offer great support for time series analysis in R.
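
To give a feel for how several of these packages combine, here is a minimal sketch that assumes tidyr (version 1.0.0 or later, for `pivot_longer`), stringr, dplyr, and lubridate are installed. The `sales_wide` data set and its column names are invented for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Made-up data: one row per shop, one column per month ("wide" format),
# with untidy shop names.
sales_wide <- data.frame(
  shop = c(" north ", "south"),
  `2016-01-01` = c(10, 20),
  `2016-02-01` = c(15, 25),
  check.names = FALSE
)

sales_long <- sales_wide %>%
  # tidyr: reshape so that each row is one shop-month observation.
  pivot_longer(-shop, names_to = "month", values_to = "units") %>%
  mutate(
    # stringr: strip surrounding whitespace and capitalize the shop name.
    shop = str_to_title(str_trim(shop)),
    # lubridate: parse the year-month-day text into a Date.
    month = ymd(month)
  )

# dplyr: total units sold per shop.
totals <- sales_long %>%
  group_by(shop) %>%
  summarise(total_units = sum(units))

print(totals)
```

Each step maps to one package from the list above, so the pipeline can be read top to bottom as tidy, clean, parse, then summarise.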