R or Python? Consider learning both
The key to becoming a data science professional is understanding the underlying data science concepts and working towards expanding your programming toolbox as much as you can. Hence, you should understand when to use Python and when to pick R, rather than mastering just one language.
Learn Data Science, not Programming
With the outbreak of the Data Science revolution, a “war” between R enthusiasts and Python fanatics emerged. As a result, Python and R have been compared and contrasted countless times, with detailed listings of their respective advantages and weaknesses (e.g. see our infographic for a refresher).
All this “warfare” led to the misconception that, as a data science learner and enthusiast, you should relentlessly focus on mastering either R or Python. This is bad advice. The actual key to becoming a data science professional is understanding the underlying data science concepts and working towards expanding your programming toolbox as much as you can. In other words, you should aim to learn the fundamentals of both R (see Introduction to R) and Python (see Introduction to Python for Data Science), one after the other.
While it is certainly important to know the differences between R and Python, today it is more relevant to understand how you can leverage both, based on your understanding of fundamental data science concepts. In this post we hope to explain why you should learn both, and give you some ideas about how to begin.
R vs Python, different brushes
Why are you choosing between R and Python in the first place?
Most likely you need a tool that will allow you to perform data analysis, do statistical computations, and in general be a data science practitioner. Knowing R or Python is therefore just one component of a bigger whole, which is made up of knowledge from disciplines such as statistics, computer science, engineering, mathematics, and even graphic design. There is a reason why most data science curricula begin with a computing tool, but never end with it.
You should think of R and Python as two different brushes that will allow you to better express yourself in data science projects, each with its own unique features to take advantage of. The brushes certainly differ in grip and texture, but they are also very similar, and together they allow you to do much more.
Do not choose between R & Python, learn both
In general, you shouldn’t be choosing between R and Python, but instead should be working towards having both in your toolbox. Investing your time into acquiring working knowledge of the two languages is worthwhile and practical for multiple reasons.
It strengthens your data science communication skills
Both R and Python have strong online communities, such as R-bloggers and python.org, dedicated to the respective languages. Looking at these sites, you can get the impression that the R and Python communities are completely disjoint. Needless to say, that is not the case.
In the real world of data science, Python and R users intersect a lot. Whichever industry or discipline you are interested in, you are likely to run into projects done in both languages. To appreciate them all, you need at least a basic understanding of both R and Python. Furthermore, by mastering both, you gain the versatility to present and communicate effectively regardless of whether your audience is more comfortable with R or Python. So if you strive to become a data scientist, you will eventually need to be fairly familiar with both languages, and most likely a whole lot more.
It boosts your data science career
Knowing both R and Python will open doors to more job opportunities. Some companies, or departments within companies, might prefer Python, while others like to work with R. Imagine that you are a perfect fit for the job, except that you know R while the company requires you to know Python. Wouldn’t that suck? Generally, professionals from the industry encourage entrants to acquire as many tools and skills as they can. Most of the time you won’t be expected to be a complete master of R or Python, but displaying your commitment and passion by having learned at least some of both will only give you bonus points.
It is not that hard
You can think of Python and R as Spanish and Italian: they are very different and very similar at the same time. They have different syntax and their own (technical) advantages, but they become very similar once the appropriate Python packages are used (numpy, pandas, …). For example:
Suppose you want to load csv files. In R you have a couple of options, one of which is read_csv(…) from the readr package (base R also offers read.csv(…)). In Python you can use a function from the pandas library with the code pd.read_csv(…). Spot the difference!
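To make the comparison concrete, here is a minimal sketch of the Python side. The column names and values are made up for illustration, and the data is read from an in-memory string rather than a real file so the snippet runs on its own. With readr in R, the equivalent call would look roughly like read_csv("scores.csv"), where the file name is again hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so no file on disk is needed.
csv_text = "name,score\nAda,90\nGrace,85\n"

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Once loaded, columns are accessed by name, much like df$score in R.
mean_score = df["score"].mean()

print(df)
print(mean_score)
```

With a real file you would pass its path instead of the StringIO object; the rest of the code stays the same.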
Also, both Python and R are what are considered “scripting languages”, which means you can write and run snippets of code without a separate compilation step, unlike in Java, for example. Next, they both have libraries and packages that you load into your environment to add functionality and do the tasks you need to complete. In addition, you will find that your workflow in both languages is very similar, as are the documentation and communities surrounding them.
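As a small illustration of that workflow, the sketch below loads a library and uses it straight away, with no compile step in between. It relies only on Python's standard statistics module, and the sample values are invented for the example. In R the pattern is the same: load a package with library(…), then call its functions.

```python
# Load a library into the session; R's counterpart is library(...).
import statistics

# Hypothetical sample values, made up for illustration.
scores = [90, 85, 77, 92]

# Call functions from the loaded library directly on the data.
average = statistics.mean(scores)
spread = statistics.stdev(scores)  # sample standard deviation

print(f"mean={average}, sd={spread:.2f}")
```

You can paste these lines into an interactive session or save them as a script and run them as-is.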
Where the R and Python Worlds Cross
In the past, one could argue that although R and Python are two very useful tools to learn, you cannot paint on the same canvases with them. Today, thanks to new tools and technologies, that argument is becoming harder and harder to defend.
More and more, we see the R and Python universes starting to overlap, which reduces the need to choose between the two languages. Let’s look at some examples of technologies and tools that allow you to leverage your knowledge of both languages and so cross the borders between the R and Python worlds.
Let’s begin with the Jupyter project. The Jupyter Notebook is essentially a tool that allows you to write and share executable code in a variety of programming languages. The name “Ju-Pyt-er” is derived from Julia, Python, and R, which immediately tells you that these three languages are the focus, though today these online notebooks support something like 40 different languages.
When working on a project in Jupyter, you can document both Python and R in the same format and share these notebooks with your colleagues, clients, students, or anyone else. Jupyter is not an IDE and doesn’t attempt to replace RStudio or Rodeo for Python. What Jupyter does give you is a universal space where you can display your work in either language, and hence organize your work more efficiently when using both R and Python in a project.
If you are interested in using Jupyter with R, read these posts from Continuum Analytics and Revolution Analytics to get started, or see an example of what you can do with these notebooks. There is also a nice guide from quant-econ.net that you might find useful.