R Learning Path: From beginner to expert in R in 7 steps
This learning path is mainly for novice R users who are just getting started, but it also covers some of the latest changes in the language that might appeal to more advanced R users.
5.3 Data Visualization
One of the main reasons R is a favorite tool of data analysts and scientists is its data visualization capabilities. Tons of beautiful plots are created with R, as shown by the many posts on FlowingData, such as this famous Facebook visualization.
If you want to get started with visualizations in R, take some time to study the ggplot2 package, one of the (if not the) most famous packages in R for creating graphs and plots. ggplot2 makes intensive use of the grammar of graphics, and as a result is very intuitive to use (you’re continuously building up parts of your graph, so it’s a bit like playing with Lego). There are tons of resources to get you started, such as this interactive coding tutorial, a cheatsheet and an upcoming book by Hadley Wickham.
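To make the "building with Lego" idea concrete, here is a minimal sketch of how a ggplot2 plot is assembled one layer at a time. It assumes ggplot2 is installed and uses R's built-in mtcars dataset; the choice of variables and labels is purely illustrative and not taken from any of the linked resources.

```r
library(ggplot2)

# Start with the data and the aesthetic mappings (the base "brick").
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add a layer of points, coloured by number of cylinders.
p <- p + geom_point(aes(colour = factor(cyl)))

# Add a linear trend line on top of the points.
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Add axis and legend labels as a final piece.
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
              colour = "Cylinders")

# Draw the finished plot.
print(p)
```

Each `+` adds one component to the plot object, so you can stop after any step, print `p`, and see the intermediate result.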
Besides ggplot2 there are multiple other packages that allow you to create highly engaging graphics and that have good learning resources to get you up to speed. Some of our favourites are:
- ggvis for interactive web graphics (see tutorial)
- googleVis to interface with Google Charts
- Plotly for R
If you want to see more packages for visualization, see the CRAN task view. In case you run into issues plotting your data, this post might help as well.
Next to the “traditional” graphs, R is able to handle and visualize spatial data as well. You can easily visualize spatial data and models on top of static maps from sources such as Google Maps and OpenStreetMap with a package such as ggmap. Other great packages are choroplethr, developed by Ari Lamstein of Trulia, and tmap. Take this tutorial on Introduction to visualising spatial data in R if you want to learn more.
5.4 The stats part
In case you are new to statistics, there are some very solid sources that explain the basic concepts while making use of R:
- Andrew Conway’s Introduction to statistics with R (online interactive coding course)
- Data Analysis and Statistical Inference by Duke University (MOOC)
- Practical Data Science With R (book)
- Data Analysis for life sciences by Harvard University (MOOC)
- Data Science Specialization by Johns Hopkins (MOOC)
- A Survival Guide to Data Science with R (book)
Note that these resources are aimed at beginners. If you want to go further, you can look at the many resources available for machine learning with R. Books such as Mastering Machine Learning with R and Machine Learning with R explain the different concepts very well, and online resources like the Kaggle Machine Learning course help you practice them. Furthermore, there are some very interesting blogs to kickstart your ML knowledge, like Machine Learning Mastery or this post.
5.5 Reporting your results
One of the best ways to share your models, visualizations, etc. is through dynamic documents. R Markdown (based on knitr and pandoc) is a great tool for reporting your data analysis in a reproducible manner through HTML, Word, PDF, ioslides, etc. This 4-hour tutorial on Reporting with R Markdown explains the basics of R Markdown. Once you are creating your own R Markdown documents, make sure this cheat sheet is on your desk.
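As a rough illustration of what such a dynamic document looks like, the sketch below writes a tiny R Markdown file from R and renders it to HTML. The file contents, the chunk name `scatter` and the report title are made up for this example; rendering is only attempted if the rmarkdown package and pandoc are available on your machine.

```r
# The three-backtick fence that delimits code chunks in an .Rmd file.
fence <- strrep("`", 3)

# Contents of a minimal R Markdown document: a YAML header,
# a sentence with an inline R expression, and one code chunk.
rmd_lines <- c(
  "---",
  "title: \"Cars report\"",
  "output: html_document",
  "---",
  "",
  "The average fuel economy is `r round(mean(mtcars$mpg), 1)` mpg.",
  "",
  paste0(fence, "{r scatter, echo=FALSE}"),
  "plot(mtcars$wt, mtcars$mpg)",
  fence
)

# Write the document to a temporary file.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Render to HTML only when the required tools are installed.
out_file <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, quiet = TRUE)
}
```

Because the text and the code live in the same file, re-rendering after the data changes updates both the number in the sentence and the plot.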
Step 6: Become an R wizard and discover exciting new stuff
R is a fast-evolving language. Its adoption in academia and business is skyrocketing, and consequently the rate at which new features and tools appear within R is rapidly increasing. These are some of the new technologies and packages that excite us the most:
- HTML widgets allow you to create interactive web visualizations such as dynamic maps (leaflet), time-series data charting (dygraphs), and interactive tables (DataTables). If you want to learn how to create your own, watch this tutorial by RStudio.
- Another technology making a lot of noise recently is Shiny. With Shiny you can make your own interactive web applications in R such as these. There is a whole learning portal dedicated to building your own Shiny applications.
- Lately, there has been a lot of focus on how to run R in the cloud. If you want to do this yourself, you can have a look at tutorials such as running R on AWS, the R programming language for Azure, and RStudio Server on DigitalOcean.
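To give a feel for the Shiny item in the list above, here is a minimal sketch of an interactive web application. It assumes the shiny package is installed; the input name `bins`, the output name `hist` and the use of the built-in mtcars dataset are choices made only for this illustration.

```r
library(shiny)

# User interface: a title, one slider and one plot.
ui <- fluidPage(
  titlePanel("Histogram of fuel economy"),
  sliderInput("bins", "Approximate number of bins:",
              min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(mtcars$mpg, breaks = input$bins,
         main = NULL, xlab = "Miles per gallon")
  })
}

# Combine both parts into an app object.
app <- shinyApp(ui = ui, server = server)

# In an interactive R session, start the app with: runApp(app)
```

The app is split into a `ui` that declares what the page contains and a `server` function that computes the outputs; Shiny re-runs the `renderPlot` expression automatically when `input$bins` changes.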
Once you have some experience with R, a great way to level up your R skillset is the free book Advanced R by Hadley Wickham. In addition, you can start practicing your R skills by competing with fellow data science enthusiasts on Kaggle, an online platform for data-mining and predictive modelling competitions. Here you have the opportunity to work on fun cases such as this Titanic data set.
Finally, you are now probably ready to start contributing to R yourself by writing your own packages. Enjoy!