R Learning Path: From beginner to expert in R in 7 steps
This learning path is mainly for novice R users who are just getting started, but it also covers some of the latest changes in the language that may appeal to more advanced R users.
Step 3: The Core of R: Packages
Every R package is a bundle of code that serves a specific purpose and is designed to be reusable by other developers. In addition to the primary codebase, packages often include data, documentation, and tests. As an R user, you can download a particular package (some are even pre-installed) and start using its functionality right away. Anyone can develop an R package, and anyone can share it with others.
This is a powerful concept and one of the key reasons R is so successful, both as a language and as a community. You don't need to do all the hard-core programming yourself or understand every complex detail of a particular algorithm or visualization; you can simply use the out-of-the-box functions that come with the relevant package as an interface to that functionality. It is therefore worth getting to know R's package ecosystem.
Many R packages are available from the Comprehensive R Archive Network (CRAN), and you can install them with the install.packages() function. A great feature of CRAN is its Task Views, which group packages by the task or subject area they address. Alternatively, you can find R packages on Bioconductor, GitHub and Bitbucket.
Looking for a particular package and its documentation? Try Rdocumentation, where you can easily search packages from CRAN, GitHub and Bioconductor.
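A minimal sketch of installing and loading packages. The helper use_package() and the CRAN mirror URL are illustrative assumptions, not part of any package; the commented lines show how Bioconductor and GitHub packages are typically installed through the BiocManager and remotes packages, with placeholder package names.

```r
# Hypothetical helper (not from any package): install a CRAN package only if it is
# missing, then attach it so its functions can be called directly.
use_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)   # download from the CRAN mirror given in `repos`
  }
  library(pkg, character.only = TRUE)      # attach the package to the search path
  invisible(TRUE)
}

# stats ships with R, so this call only attaches it; nothing is downloaded
use_package("stats")

# Check which version of an installed package you have
packageVersion("utils")

# Two ways to call a package function once it is installed:
# library(readr); read_csv("data.csv")   # attach the package, then call the function
# readr::read_csv("data.csv")            # call through the namespace without attaching

# Packages from other sources (placeholder names; requires BiocManager / remotes):
# BiocManager::install("SomeBiocPackage")
# remotes::install_github("user/repo")
```

Calling functions as readr::read_csv() keeps it explicit which package a function comes from, which helps when two attached packages export functions with the same name.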
Step 4: Help?!
You will quickly find out that for every R question you solve, five new ones will pop up. Luckily, there are many ways to get help:
- Within R you can use the built-in help system. For example, the command `?plot` shows the documentation for the plot() function.
- The R community puts a strong emphasis on documentation. The previously mentioned Rdocumentation is a convenient website for browsing the documentation of packages and their functions.
- Stack Overflow is a great resource for finding answers to common R questions, or for asking your own.
- Numerous blogs and websites cover R, such as KDnuggets and R-bloggers.
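A short sketch of R's built-in help tools. Wrapping the documentation viewers in interactive() is an assumption made so the snippet can also run as a script (for example via Rscript); apropos() and args() work in either mode.

```r
# Documentation viewers are meant for interactive sessions, so guard them
# when this code runs as a script.
if (interactive()) {
  help("plot")                  # same as ?plot
  help.search("linear model")   # same as ??"linear model": searches all installed help pages
  vignette(package = "utils")   # lists the long-form guides (vignettes) shipped with utils
}

# Find functions on the search path whose names start with "read."
read_funs <- apropos("^read\\.")
print(read_funs)

# Show the arguments a function accepts without opening its help page
args(read.csv)
```

apropos() is handy when you remember only part of a function name; help.search() is the better choice when you know the topic but not the name.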
Step 5: The Data Analysis Workflow
Once you understand R's syntax, the package ecosystem, and how to get help, it's time to focus on how R can help with the most common tasks in the data analysis workflow.
5.1 Importing Data
Before you can start your analysis, you first need to get your data into R. The good news is that you can import almost any data format into R; the hard part is that different formats often need a different approach:
- Flat files: You can import flat files with functions such as read.table() and read.csv() from the pre-installed utils package. Faster, dedicated alternatives are the readr package and the fread() function from the data.table package.
- Excel: You can get your Excel files into R with the readxl, gdata or XLConnect packages. (Read more on importing your Excel files into R.)
- Statistical software: The haven package lets you import SAS, Stata and SPSS data files into R. The foreign package lets you import formats such as Systat and Weka.
- Databases: Connecting to a database happens via specific packages such as RMySQL, RPostgreSQL and ROracle. These packages implement the DBI interface, which you use to access and manipulate the database.
- Web scraping: You can use a package such as rvest. (For more on web scraping with R, check out Rolf Fredheim's blog.)
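As a minimal sketch of the flat-file route, the code below uses only read.csv() and read.table() from utils. The CSV contents are made up for the example and written to a temporary file so the snippet runs on its own; the readr and data.table calls appear only as comments because those packages may not be installed.

```r
# Create a small, made-up CSV file in a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,city",
             "Ana,34,Lisbon",
             "Ben,28,Oslo"), csv_path)

# read.csv() is a wrapper around read.table() with header = TRUE and sep = "," preset
people <- read.csv(csv_path, stringsAsFactors = FALSE)
str(people)

# The same import spelled out with read.table()
people2 <- read.table(csv_path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
identical(people, people2)

# With the packages installed, the faster equivalents would be:
# readr::read_csv(csv_path)
# data.table::fread(csv_path)

unlink(csv_path)  # remove the temporary file
```

For a small file the choice of function hardly matters; the speed advantage of readr and fread() shows up on large files.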
To learn more about importing data into R, check out an online Importing Data into R tutorial or this post on data importing.
5.2 Data Manipulation
Data manipulation with R is a broad topic, as you can see in, for example, this Data Wrangling with R video by RStudio or the book Data Manipulation with R. These are the packages you should master for data manipulation:
- The tidyr package for tidying your data.
- The stringr package for string manipulation.
- When working with data-frame-like objects, it is best to familiarize yourself with the dplyr package (try this course). However, for heavy data wrangling tasks it makes more sense to check out the blazingly fast data.table package (see this syntax cheatsheet for help).
- When working with dates and times, install the lubridate package, which makes them easier to handle.
- Packages like zoo, xts and quantmod offer great support for time series analysis in R.
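To give a feel for the operations these packages streamline, here is a minimal base R sketch on a made-up sales table. It deliberately uses only base R so it runs without extra installs; the comments name the stringr, lubridate and dplyr functions that cover each step, as rough equivalents rather than line-for-line translations.

```r
# Made-up example data with messy strings and dates stored as text
sales <- data.frame(
  region  = c("north", "south", "north", "south", "north"),
  product = c(" Widget", "gadget ", "WIDGET", "Gadget", "widget"),
  date    = c("2023-01-15", "2023-01-20", "2023-02-03", "2023-02-10", "2023-02-28"),
  units   = c(3, 0, 5, 2, 4),
  price   = c(10, 12, 10, 12, 10),
  stringsAsFactors = FALSE
)

# String cleanup: trim whitespace, lower-case (stringr: str_trim() + str_to_lower())
sales$product <- tolower(trimws(sales$product))

# Parse text into Date objects and derive a month key (lubridate: ymd())
sales$date  <- as.Date(sales$date, format = "%Y-%m-%d")
sales$month <- format(sales$date, "%Y-%m")

# Keep only rows with sales (dplyr: filter())
sales <- sales[sales$units > 0, ]

# Add a derived column (dplyr: mutate())
sales$revenue <- sales$units * sales$price

# Grouped summaries (dplyr: group_by() + summarise())
by_region <- aggregate(revenue ~ region, data = sales, FUN = sum)
by_month  <- aggregate(revenue ~ month,  data = sales, FUN = sum)
print(by_region)
print(by_month)
```

The packages listed above shine once pipelines get longer than this: dplyr chains these steps into a single readable pipeline, while data.table performs them in place on large data.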