R or Python? Consider learning both
The key to becoming a data science professional lies in understanding the underlying data science concepts and working toward expanding your programming toolbox as much as you can. Hence, rather than mastering just one language, you should understand when to use Python and when to pick R.
rpy2 and rPython
The Python package rpy2 essentially allows you to call R from within Python. So when you are working in Python but there is an R package you like for certain types of analysis, you can simply use rpy2 to bridge R and Python. If you are interested in the details of how this is done, you can check out a quick guide here. R has a similar package: rPython. rPython allows you to run Python code, make function calls, assign and retrieve variables, and so on, all from R.
It should be noted, though, that such strategies can harm the readability of your code, making it more difficult to communicate your code to others. Nonetheless, if you annotate and document your work well, both rpy2 and rPython can bridge the R and Python universes in your work environment.
Microsoft Azure
Broadly speaking, Microsoft Azure is a collection of cloud-based services revolving around data management tasks. One service that is particularly relevant to our topic is Microsoft Azure Machine Learning Studio. Azure ML Studio makes it easier to manage machine learning projects through a graphical interface and prebuilt algorithms. However, ML Studio also supports custom code, which can be written in either R or Python and easily integrated into a project. ML Studio is therefore a space where both R and Python code can be used in the same project, which removes the limitation of choosing one or the other.
Dataiku
Dataiku Data Science Studio (DSS) is a product developed by Dataiku that simplifies many data-related operations for companies trying to leverage their data. DSS has a projects feature that allows you to put Python and R code into one data analysis project. This creates an environment where someone who knows how to take advantage of both Python and R can combine them in one workflow to produce output that supports business activities or serves clients. A nice example to look at is spatial data analytics, where packages from both R and Python can be integrated into the analysis to generate maps and analyze geographical data.
Where To Begin
In the previous sections we discussed why it's relevant to learn both R and Python, and how new tools and technologies make it easier than ever to integrate both languages. To finish, we want to address the question of where to begin.
Do you start with R or Python? If you have been working with R for a while, keep focusing on R, but make sure to add some Python skills to your toolbox. Similarly, if Python has been your tool of choice so far, carry on with Python but learn the basics of R as well. However, if you are a complete novice, or even if you know a bit of one of the languages, you need to begin by asking yourself a couple of important questions.
What is your (academic) background?
Generally speaking, if your background is in something other than CS or engineering, R is more appropriate for you. So if you are interested in data science and your background is in a social science like psychology or political science, or a natural science like geography or biology, R is likely to be the way to go. However, if you are a CS person, or an engineer who works a lot with computers, you might prefer Python because it is a general-purpose language. This is by no means a universal guideline, but if you are well versed in programming, you might start to think of R as a limited programming language. The truth is, R was designed more as a tool with programming capabilities than as a programming language. Regarding R as an open source alternative to STATA or SAS would be more appropriate here.
What are your needs?
If you mostly find yourself in an academic setting and need a tool for data analysis, R is the way to go. However, as a professional, Python is the more likely contender for you, simply because Python is more widely used in industry, though R is beginning to gain traction there as well. Keep in mind that Python is a general-purpose language that is also widely used in CS, engineering, and other disciplines, often as a complement or alternative to other programming languages or to commercial software like Matlab.
The best way to answer this question for yourself is to look around and see what people are using. For example, if you are taking a data science related course, ask your professor or instructor what they prefer for personal use, or what they think is used more often in their department or discipline.
What is your life plan?
Perhaps the most vital question is: where do you hope to end up in the next 5 or 10 years? We already established that data science is interdisciplinary, but data science also lives between the worlds of academia and business.
You might have already guessed that if you aspire to be an academic, beginning with R might be wiser, and if you want to work in industry, you should begin with Python. By the time you get there, you will probably know both anyway, because of the overlap between R and Python users that we mentioned earlier.
If you plan to stay in academia, you most likely won't have to develop front-end programs, manage databases, or do things of that sort. You will be doing research, in which case the main factor will be your preference for a particular language. Still, in data science it is becoming standard for statisticians, computer scientists, engineers, health researchers, and others to collaborate, so your personal preferences might not be as big a factor.
In the business world, data science teams often use R internally. However, when it comes to dealing with Big Data and developing data products, using other technologies and programming languages such as Python is inevitable.
Taking the first step with DataCamp
DataCamp.com is your best resource for learning the fundamentals of both Python and R for free, all within your browser. The courses combine insightful tutorial videos with interactive exercises, so you will be learning by doing. This approach to learning is the most effective way to take your first steps, so go ahead and begin your path towards becoming a data scientist!