KDnuggets Home » Polls » Analytics work done in R (Mar 2011)

Analytics work done in R


What part of your analytics / data mining work in the past 12 months was done in R? [360 votes total]
Share of work in R   All      US/Canada   Europe
None                 36.4%    40.7%       30.2%
1-10%                12.5%    12.2%       14.3%
11-25%                7.8%     8.1%       10.3%
26-50%                9.7%    11.0%        8.7%
51-75%               13.1%    13.4%       13.5%
76-100%              20.6%    14.5%       23.0%


Gregory PS: bi-modal distribution
The poll results suggest a bi-modal distribution: 36% of respondents do not use R at all, while 34% use R for more than half of their analytics work. Most people use R in conjunction with other tools, several of which are discussed in the comments below.
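As a quick check of the bi-modal reading, here is a minimal Python sketch that reproduces the 36%/34% split from the "all" column and finds which buckets are local peaks. The percentages are copied from the poll table; the helper names `share` and `local_peaks` are illustrative only.

```python
# Percent of all 360 respondents per bucket ("all" column of the poll),
# listed in order from least to most R usage.
ALL = {
    "None": 36.4,
    "1-10%": 12.5,
    "11-25%": 7.8,
    "26-50%": 9.7,
    "51-75%": 13.1,
    "76-100%": 20.6,
}


def share(buckets, keys):
    """Sum the percentages of the named buckets, rounded to one decimal."""
    return round(sum(buckets[k] for k in keys), 1)


def local_peaks(buckets):
    """Return the labels of buckets whose value exceeds both ordered neighbours."""
    labels = list(buckets)
    values = list(buckets.values())
    peaks = []
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < len(values) - 1 else float("-inf")
        if v > left and v > right:
            peaks.append(labels[i])
    return peaks


non_users = share(ALL, ["None"])
majority_users = share(ALL, ["51-75%", "76-100%"])
peaks = local_peaks(ALL)

print(f"No R: {non_users}%")                                  # No R: 36.4%
print(f"R for more than half of the work: {majority_users}%")  # ... 33.7%
print(f"Peaks: {peaks}")                                       # Peaks: ['None', '76-100%']
```

Note that the buckets have unequal widths (1-10% covers ten points, 11-25% fifteen, the upper three twenty-five each), so these peaks describe votes per bucket, not votes per percentage point.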

Regional analysis shows that European data miners are more likely to use R for the majority of their work: 36.5% of European respondents use R for more than half of it, compared with 27.9% in US/Canada. Here is the distribution of respondents by region:

  • US/Canada: 48%
  • Europe: 35%
  • Asia: 7%
  • Latin America: 4.5%
  • Africa, Middle East, AU/New Zealand: 5.5%
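The regional comparison above can be checked with a little arithmetic. This Python sketch (names are illustrative) adds the 51-75% and 76-100% rows for each region and converts the regional shares into approximate voter counts out of the 360 total; the counts are estimates only, since the published shares are rounded.

```python
# Rows of the poll table for the two largest regions: percent of each region's
# respondents using R for 51-75% and for 76-100% of their analytics work.
MAJORITY_ROWS = {
    "US/Canada": {"51-75%": 13.4, "76-100%": 14.5},
    "Europe": {"51-75%": 13.5, "76-100%": 23.0},
}

# Regional share of all respondents, as listed above (already rounded).
REGION_SHARE = {"US/Canada": 0.48, "Europe": 0.35}
TOTAL_VOTES = 360

# Percent of each region's respondents using R for more than half their work.
majority_pct = {
    region: round(sum(rows.values()), 1) for region, rows in MAJORITY_ROWS.items()
}

# Approximate number of voters per region (an estimate: shares are rounded).
approx_voters = {
    region: round(TOTAL_VOTES * frac) for region, frac in REGION_SHARE.items()
}

for region in MAJORITY_ROWS:
    print(f"{region}: {majority_pct[region]}% of ~{approx_voters[region]} voters")
# US/Canada: 27.9% of ~173 voters
# Europe: 36.5% of ~126 voters
```

The voter counts give a rough sense of the sample sizes behind each regional percentage.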

Tim Goh, R user interfaces
It would also be interesting to see which user interfaces people use for their work with R:

More (?) widely used
core R interface, scripts & command line, Rattle, R Commander, Red-R (based on Orange and R), RevolutionAnalytics, RKWard, RStudio, RapidMiner (with R extension), Eclipse with Stat-ET, IBM SPSS Modeler (with R interface), JGR (Java GUI for R, pronounced "jaguar"), KNIME (with R extension), PMG (Poor man's GUI), Quick-R, SAS (with R interface), SciViews-R

Less (?) widely used
R AnalyticFlow, R Excel (R interface for Excel), RGNumeric, R-GUI or RGUI, RNetWeb, RServe, RSessionDA, RSoap, RStats, RStatServer, R Web, R Zope, Bio-R, Brodgar, Deducer, ESS, GrapheR, JEdit, Oracle Data Miner (ODM) (with R-ODM interface), Tinn-R, others (please specify)

Such a poll would be a valuable guide through the mass of available solutions, and it would give a nice market overview... ;^)

Beate Wipper, R and R development environments
Rather than using R directly, we use it through development environments such as RapidMiner, RKWard and RStudio, which ease the design and deployment of data mining processes.

Does anyone have experience with RevolutionAnalytics?
How well designed is their GUI?
How responsive is their support?
Are there any legal issues with their bundling of R, which is open source software under the GPL 2 license, with their proprietary extensions?

Their "SAS to R Challenge" sounds compelling. :^

Peter Fjodor, R within RapidMiner and RStudio
We mainly use R for the modeling part of the data mining process. For the overall process we prefer RapidMiner, which nicely integrates R, because its easy-to-use GUI speeds up the overall data mining process. Some of my colleagues are currently looking at RStudio and RevolutionAnalytics to evaluate how they integrate with R and RapidMiner. We may consider using them as additional solutions in the future. We are currently helping some of our customers migrate from proprietary vendors to open source solutions like R and RapidMiner.

Trevor Kemmer, R within RapidMiner data mining processes
We use R occasionally for specific tasks where powerful R libraries are readily available or can easily be adapted to our needs, and where the required functionality is not already available in RapidMiner or RapidAnalytics, our primary data mining solutions.

We typically use RapidMiner or RapidAnalytics for the overall data mining process: from ETL tasks such as data import and transformation, through interactive data analysis and visualisation, automated modelling, parameter optimization, optimization of attribute construction and selection, automated selection of the modelling technique, and optimization and evaluation of pre-processing and of the data mining process structure, all the way to reporting and deployment. R is then used for those parts of the modelling process where specific R libraries seem to fit the task at hand better. This happens in about 10-20% of our applications in the finance sector (banking, investment, insurance), where quite advanced R libraries are available, and in about 5-10% of our applications in the pharma sector. We found that the intuitive RapidMiner GUI and its flexible, arbitrarily nestable process design features best support the overall design of data mining processes. RapidMiner and RapidAnalytics also ease automated process optimization and later deployment: we integrate RapidMiner as a data mining and text mining engine into our own solutions, and use RapidAnalytics to integrate RapidMiner processes via web services.

We try to combine the best of both worlds: the power and flexibility of R with the power, flexibility, and ease of use of RapidMiner. This allows us to achieve better results in less time than if we used only one of the three solutions. RapidMiner and the RapidMiner R extension are available for free download at http://www.RapidMiner.com/

There are also free video tutorials demonstrating how to use these solutions, how to integrate R into RapidMiner, and how the two interact:

For R we use the standard distribution from http://www.r-project.org/ together with specialized libraries, e.g. for quantitative finance.

R, RapidMiner, and RapidAnalytics are quite powerful by themselves, but even more powerful when used in combination. I recommend taking a look and giving them a try. All three are available as open source, so they are free of license fees and can be customized to your needs.

Some more tutorials for the combination of these tools:

Michael Berry
Gregory, I know there must be a clever joke to be made about bi-polar versus bi-modal, but I can't think of it. Is someone else feeling more creative?
Tuesday, March 29, 2011

Gregory Piatetsky
Michael - thanks for the correction, although I can see data mineRs being moRe bi-polaR than bi-modal
