Python vs R for Artificial Intelligence, Machine Learning, and Data Science

This is a summary (with links) of a three-part article series that's intended to be an in-depth overview of the considerations, tradeoffs, and recommendations associated with selecting between Python and R for programmatic data science tasks.

With the explosion in popularity of artificial intelligence, machine learning, data science, and advanced analytics, people interested in learning more about these fields have many important questions to consider. These include choosing which programming language is the better fit, and deciding which programming languages, packages, frameworks, platforms, and APIs to use for a given task or scenario.

In addition, there is a significant difference between performing data science and advanced analytics-related tasks on a local development machine versus developing and deploying real-world production solutions that are able to meet the necessary production demands and loads.

This is a summary (with links) of a three-part article series that's intended to be an in-depth overview of the considerations, tradeoffs, and recommendations associated with everything noted above.

The first article of the series focuses on the characteristics and paradigms associated with Python, R, and other data science-related programming languages. This includes a discussion of the many key concepts associated with programming languages and other aspects of computer science, with an emphasis on those applicable to data science and advanced analytics. There is also a discussion of commonly used integrated development environments (IDEs) for both Python and R. The article ends with a brief overview of which language to use, when, and why.

The second article covers the many key concepts, considerations, tradeoffs, and challenges associated with data science and advanced analytics in production, and how production work differs from local development. It includes topics such as local vs remote development and execution, single-server vs distributed computing, batch vs real-time processing, and finally offline vs online vs automated learning.

The third and final article covers which programming languages, software packages (aka libraries), frameworks, and/or platforms to use in the context of different use cases (aka scenarios) or tasks, and also includes recommendations where applicable.

After reading the three posts in the series, you will have been thoroughly exposed to most key concepts and characteristics of data science and advanced analytics languages and technologies, including the differences, considerations, tradeoffs, and challenges of putting solutions into production.

One final, interesting thing to note: KDnuggets recently published an article with results from its annual survey on the leading languages and tools used for analytics, data science, and machine learning. The results show that the Python ecosystem overtook R as the leading platform in 2017. You can read more about the results and breakdown of the data here.

Cheers, and I hope you enjoy your data science and advanced analytics journey!

Alex Castrounis is the founder and CEO of Why of AI and the author of AI for People and Business. He is also an adjunct for Northwestern University’s Kellogg / McCormick MBAi program.