CRISP-DM, still the top methodology for analytics, data mining, or data science projects
CRISP-DM remains the most popular methodology for analytics, data mining, and data science projects, with a 43% share in the latest KDnuggets Poll, but a replacement for the unmaintained CRISP-DM is long overdue.
What main methodology are you using for your analytics, data mining, or data science projects?
Compared to the 2007 KDnuggets Poll on Methodology, the results are surprisingly stable.
CRISP-DM remains the top methodology for data mining projects, with essentially the same percentage as in 2007 (43% vs 42%). However, fewer than 50% of respondents report using it.
CRISP-DM was conceived around 1996. I remember attending a CRISP-DM meeting in Brussels in 1998 (don't repeat my mistake: never eat bloedworst).
The 6 high-level phases of CRISP-DM are still a good description for the analytics process, but the details and specifics need to be updated. CRISP-DM does not seem to be maintained and adapted to the challenges of Big Data and modern data science. The original crisp-dm.org site is no longer active, and IBM SPSS Modeler is probably the only tool that still includes it.
One response to this lack of a modern methodology is the significant increase in people using their own methodology or other methodologies (together 35.5%, up from 23% in 2007).
There are other methodologies being developed - see James Taylor's comment below and additional links at the bottom of this post.
We also note a big decline in the SAS SEMMA methodology (from 13% to 8.5%).
Perhaps most encouragingly, in the era of Big Data when lack of methodology is likely to produce random and false discoveries, zero people reported using no methodology.
What main methodology are you using for your analytics, data mining, or data science projects? [200 votes total]

| Methodology | Votes | 2014 poll |
|---|---|---|
| CRISP-DM | 86 | 43% |
| My own | 55 | 27.5% |
| SEMMA | 17 | 8.5% |
| Other, not domain-specific | 16 | 8% |
| KDD Process | 15 | 7.5% |
| My organization's | 7 | 3.5% |
| A domain-specific methodology | 4 | 2% |
| None | 0 | 0% |
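As a quick sanity check on the figures in the poll table, the percentages can be recomputed from the raw vote counts. The sketch below is only an illustration: the counts are copied from the table, while the dictionary layout and the `share` helper are hypothetical names introduced here, not part of the original poll tooling.

```python
# Vote counts copied from the 2014 poll table (200 votes total).
votes = {
    "CRISP-DM": 86,
    "My own": 55,
    "SEMMA": 17,
    "Other, not domain-specific": 16,
    "KDD Process": 15,
    "My organization's": 7,
    "A domain-specific methodology": 4,
    "None": 0,
}

total = sum(votes.values())


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Percentage share for each methodology.
shares = {name: share(count, total) for name, count in votes.items()}

# Combined share of "My own" and "Other, not domain-specific",
# the figure the post compares against 23% in 2007.
own_plus_other = share(
    votes["My own"] + votes["Other, not domain-specific"], total
)

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"My own + Other: {own_plus_other}%")
```

The recomputed shares match the table, and the combined "My own" plus "Other" share comes out to the 35.5% quoted in the text.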
The regional distribution of voters was:
- US/Canada, 45.5%
- Europe, 28.5%
- Asia, 14%
- Latin America, 9.5%
- Other, 2.5%
Gregory Piatetsky, Editor, Business Understanding
Ralph, business (domain) understanding is not binary - you can always have more! Part of the knowledge discovery and CRISP-DM process is to increase your business understanding.
Ralph Winters, Business Understanding
I always thought of the Business Understanding part as a chicken or egg problem. You either have it and you can mine it, or you need to mine it to define it, if you don't.
James Taylor, Decision Modeling
I like CRISP-DM because it puts business understanding front and center at the beginning of the project. We have had some success with using decision modeling - based on the new Decision Model and Notation standard - as a way to express understanding of the business problem by modeling the decision that the analytic is designed to improve. More focused than simply saying "improve this metric", decision modeling helps focus analytic projects on improving the way the business acts today while providing great assets for planning deployment and adoption.
See decisionmanagementsolutions.com/decision-modeling-for-predictive-analytic-projects for more.
Martin Jetton, coriosgroup.com/resources/
Robin Way, in his model deployment red paper, outlines a very nice ongoing methodology that covers not just model deployment but model maintenance as well. Model maintenance is a very important aspect for financial institutions.
Breno C. Costa, crisp-dm update?
In the past, I looked for a data mining methodology and found CRISP-DM, but it had not been updated for a long time. Is there any initiative to update that methodology, and where can I find documentation about it (specification, book, or paper)?
Additional relevant links:
- KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW
- Big Data Consulting Methodology
- Why are so many customers failing in their Big Data initiatives?
- Big Data Implementation Methodology
- The Great Methodology Debate - Why Text Analytics is Most Important!, Tom HC Anderson Google+ post
- Data Analytics Methodology
- SAS Project Methodology for Analytics and Data Mining
- 6 Steps for End-to-End Analytics, Deloitte