KDnuggets Home » News » 2015 » Oct » Opinions, Interviews, Reports » Data Science Programming: Python vs R ( 15:n35 )

Data Science Programming: Python vs R


With every industry generating massive amounts of data, the need to crunch that data calls for more powerful and sophisticated programming tools such as Python and R.




Data Science in Python and R Language

Python is popular as a general-purpose programming language that is widely used for web development, whereas R, which was developed specifically for statistical computing, is popular for its strong data visualization features. At DeZyre, our career counsellors often get questions from prospective students about which they should learn first: Python or R. If you are unsure which programming language to learn first, you are on the right page.

Python and R top the list of basic tools for statistical computing among data scientist skills. Data scientists often debate which of the two is more valuable; however, each language has its own specialized key features, and the two complement each other.

Data Science with Python Language

Data science consists of several interrelated but different activities, such as computing statistics, building predictive models, accessing and manipulating data, building explanatory models, visualizing data, integrating models into production systems and much more. Python provides data scientists with a set of libraries that help them perform all of these operations on data.

Python is a general-purpose, multi-paradigm programming language that has gained wide popularity in data science because of its simple syntax and its operability across different ecosystems. Python lets programmers do almost anything they need with data: data munging, data wrangling, website scraping, web application building, data engineering and more. It also makes it easier for programmers to write maintainable, robust, large-scale code.

“Python programming has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python language, and we’re looking for more people with skills in this language.” – said Peter Norvig, Director at Google.

Unlike R, Python does not ship with a rich set of statistical packages built in, but it is supported by libraries such as scikit-learn, NumPy, pandas, SciPy and Seaborn that data scientists can use to perform useful statistical and machine learning tasks. Python code is similar to pseudocode and reads almost like English. The expressions and characters used in the code can be mathematical; however, the logic can easily be followed from the code.
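To give a feel for how closely Python reads to pseudocode, here is a minimal sketch of a few basic statistical steps. It is only an illustration: the order counts are made-up sample data, and it uses the small `statistics` module from the standard library rather than the third-party libraries named above, so that it runs without installing anything.

```python
import statistics

# Hypothetical sample data: daily order counts for one week.
daily_orders = [12, 15, 11, 18, 20, 9, 14]

# Basic descriptive statistics, one readable line each.
mean_orders = statistics.mean(daily_orders)
median_orders = statistics.median(daily_orders)
spread = statistics.stdev(daily_orders)

# Keep only the days whose order count is above the weekly mean.
busy_days = [count for count in daily_orders if count > mean_orders]

print(mean_orders, median_orders, spread, busy_days)
```

For real workloads the same steps would more likely be written with pandas or NumPy, which offer far more functionality than this small module.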

What makes Python language the King of Data Science Programming Languages?

“In Python programming, everything is an object. It’s possible to write applications in Python language using several programming paradigms, but it does make for writing very clear and understandable object-oriented code.”- said Brian Curtin, member of Python Software Foundation
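The idea that everything is an object can be seen in a few lines. The sketch below is an illustration only; the `square` function and the `operations` dictionary are hypothetical names chosen for the example.

```python
def square(x):
    """Return x multiplied by itself."""
    return x * x

# Integers, strings, functions and even classes are all objects.
values = [42, "text", square, int]
all_objects = all(isinstance(value, object) for value in values)

# Because functions are objects, they can be stored in a dictionary
# and called later like any other value.
operations = {"square": square}
result = operations["square"](7)

# Even a plain integer literal carries methods.
bits = (255).bit_length()

print(all_objects, result, bits)
```

Treating functions and classes as ordinary values is part of what makes it easy to mix object-oriented and functional styles in the same program.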

1) Broadness

The public package index for Python, popularly known as PyPI, has approximately 40K add-ons listed under 300 different categories. So if a developer or data scientist needs to do something with Python, there is a high probability that someone has already done it and they need not begin from scratch. Python is used extensively for tasks ranging from CGI and web development, system testing and automation, and ETL to gaming.

2) Efficient

Developers these days spend a lot of time defining and processing big data. With the increasing amount of data that needs to be processed, it becomes extremely important for programmers to manage memory usage efficiently. Python has generators, written either as functions or as expressions, which support iterative processing, i.e. one item at a time. When a large number of processing steps has to be applied to a set of data, generators are a great advantage: they grab the source data one item at a time and pass each item through the entire processing chain.
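The item-at-a-time processing described above can be sketched as a small chain of generators. This is a minimal illustration, not a production pipeline: the function names and the sample input are hypothetical, and in practice `raw_lines` could be an open file object so that the whole file never has to be held in memory.

```python
def read_records(lines):
    """Yield one stripped line at a time instead of building a list."""
    for line in lines:
        yield line.strip()

def parse_amounts(records):
    """Lazily convert each non-empty record to a float."""
    for record in records:
        if record:
            yield float(record)

# Hypothetical source data standing in for a large file.
raw_lines = ["10.5\n", "\n", "4.5\n", "20\n"]

# A generator expression adds a third lazy stage to the chain.
large_amounts = (
    amount
    for amount in parse_amounts(read_records(raw_lines))
    if amount > 5
)

# Nothing has been processed yet; sum() pulls items through one by one.
total = sum(large_amounts)
print(total)
```

Each stage holds only the current item, so memory use stays roughly constant no matter how many records flow through the chain.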

The generator-based migration tool collective.transmogrifier helps make complex and interdependent updates to the data as it is being processed from the old site, and then allows programmers to create and store objects at the new site in constant memory. The transmogrifier plays a vital role in Python programming when dealing with larger data sets.

3) Can Be Easily Mastered Under Expert Guidance: Read It, Use It with Ease

Python has gained wide popularity because its syntax is clear and readable, which makes it easy to learn under expert guidance. Data scientists can gain expertise and master scientific computing with Python by taking Python programming courses led by industry experts. The readability of the syntax makes it easier for peer programmers to update existing Python programs quickly, and it also helps in writing new programs quickly.

Applications of Python

  • Python programming is used by Mozilla for exploring their broad code base. Mozilla releases several open source packages built using Python.
  • Dropbox, a popular file hosting service, was founded by Drew Houston because he kept forgetting his USB drive. The project was started to fulfill his personal needs, but it turned out to be so good that others started using it too. Dropbox is written completely in Python and now has close to 150 million registered users.
  • Walt Disney uses Python to help power its creative processes.
  • Other notable products written in Python include:
    1. Cocos2d: A popular open source 2D gaming framework
    2. Mercurial: A popular cross-platform, distributed code revision control tool used by developers.
    3. BitTorrent: File sharing software
    4. Reddit: Entertainment and Social News website.

Limitations of Python Programming

  • Python is an interpreted language and is therefore often slower than compiled languages.
  • “A possible disadvantage of Python is its slow speed of execution. But many Python packages have been optimized over the years and execute at C speed.” said Pierre Carbonnelle, a Python programmer who runs the PYPL language index.
  • Python is a dynamically typed language, which imposes certain design restrictions. It requires rigorous testing because many errors show up only at runtime.
  • Python has gained popularity on desktop and server platforms but is still weak on mobile computing platforms, as very few mobile apps are developed using Python. Python is also rarely found on the client side of web applications.
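The point about dynamic typing and runtime errors can be illustrated with a short sketch. The function below is hypothetical and contains a deliberate bug; nothing flags it until the faulty branch actually runs, which is why tests need to exercise every path.

```python
def describe_total(items, label):
    """Sum the items; the labelled branch contains a deliberate type bug."""
    total = sum(items)
    if label:
        # Bug: concatenating str and int raises TypeError,
        # but only when this branch is actually executed.
        return label + total
    return total

# The unlabelled path works, so the bug goes unnoticed here.
ok_result = describe_total([1, 2, 3], "")

# A test that exercises the other branch exposes the error at runtime.
try:
    describe_total([1, 2, 3], "Total: ")
    error_seen = False
except TypeError:
    error_seen = True

print(ok_result, error_seen)
```

A compiler for a statically typed language would typically reject the equivalent code before it ran; in Python, the safeguard is test coverage of both branches.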
