New Standard Methodology for Analytical Models
Traditional methodologies for analytical modeling, such as CRISP-DM, have several shortcomings. Here we describe these friction points in CRISP-DM and introduce a new approach, the Standard Methodology for Analytical Models, which overcomes them.
Summary of phases in tables
Below follows an overview, in table format, of the artifacts, parties involved, activities, and discussion topics for each phase.
Table 2: artifacts per phase

| Phase | Artifacts |
| --- | --- |
| Use-case identification | One selected use-case; a one-slide description of the use-case, including deployment considerations; a presentation with the artifacts created from the discussions; potentially a roadmap with other use-cases on a timeline |
| Model requirements gathering | A model requirements document; a presentation that documents the impact of the requirements on the remainder of the process |
| Data preparation | Data available and ready for modeling; an understanding of the availability of scoring data; the data scientist's confidence that modeling is possible |
| Modeling experiments | A functional analytical model; an evaluation of the analytical model against hold-out data (a minimal sketch follows this table); a presentation of the model and the evaluation |
| Insight creation | A set of visualizations and dashboards |
| Proof of Value: ROI | An experimental design, including metrics and success criteria; the results of the experiment; a presentation reporting on the experimental setup and outcome |
| Operationalization | A hand-over document; an operational data requirements document; model archiving requirements; an architecture design (technical, data, business and functional); an analytical model execution plan; an audit approach; the operational analytical model |
| Model lifecycle | A governance document |
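To make the hold-out evaluation artifact concrete, here is a minimal sketch in Python. The synthetic dataset and the choice of a random forest are illustrative assumptions, not part of the methodology; in practice the modeling data comes out of the data preparation phase.

```python
# Minimal sketch: evaluating an analytical model against hold-out data.
# The synthetic dataset and the random forest are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for the modeling data created in the data preparation phase
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

# Reserve a hold-out set that plays no role in fitting the model
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# The hold-out score is the evidence reported in the evaluation presentation
auc = roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1])
print(f"Hold-out AUC: {auc:.3f}")
```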
Table 3: parties involved per phase

| Phase | Parties involved (data scientists are included by default) |
| --- | --- |
| Use-case identification | (Higher) management, Business department, IT/Data owners |
| Model requirements gathering | Business department, End-users, IT |
| Data preparation | IT/Data owners |
| Modeling experiments | (Data scientists only) |
| Insight creation | Business department, End-users |
| Proof of Value: ROI | Business department, End-users |
| Operationalization | Business department, End-users, IT |
| Model lifecycle | End-users, IT |
Table 4: activities per phase

| Phase | Sub-phases/activities |
| --- | --- |
| Use-case identification | Analytic aspiration acknowledgment; analytical model education on business level; brainstorm session(s); final candidate selection |
| Model requirements gathering | Business requirements; IT requirements; end-user requirements; scoring requirements; data requirements; analytical model requirements |
| Data preparation | Data access contract; data access; data exploration; analytical modeling data creation; operational scoring data format creation; feature creation |
| Modeling experiments | Think, Model, Evaluate, Repeat |
| Insight creation | Insight requirement gathering; dashboard and visualization design; design execution; design evaluation |
| Proof of Value: ROI | Metrics and success criteria gathering; experimental design; experimental data contract; experiment execution; experiment reporting |
| Operationalization | Operationalization requirements gathering; integration planning; integration execution (may contain its own project phases); hand-over; analytical model execution go-live (a scoring sketch follows this table) |
| Model lifecycle | Governance requirements gathering; analytical model result archiving; analytical model refresh; analytical model upgrade |
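The hand-over and go-live activities revolve around moving the trained model into a scheduled scoring process. Below is a minimal sketch of what that hand-over can look like, assuming a scikit-learn model persisted with joblib; the file names and the batch-CSV delivery are hypothetical stand-ins for whatever the data delivery contract specifies.

```python
# Minimal sketch of the modeling-to-operations hand-over: the data scientist
# archives the trained model, and a scheduled scoring job loads it and scores
# a fresh batch. File names and the CSV delivery format are hypothetical.
import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# --- Modeling side: train and archive the model ---
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model_v1.joblib")  # archived per the hand-over document

# --- Operational side: load the archived model and score a new batch ---
scoring_model = joblib.load("model_v1.joblib")
X_new, _ = make_classification(n_samples=100, n_features=10, random_state=1)
scores = pd.DataFrame({"score": scoring_model.predict_proba(X_new)[:, 1]})
scores.to_csv("scores_batch.csv", index=False)  # delivered per the data contract
```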
Table 5: discussion topics per phase

| Phase | Topics of discussion |
| --- | --- |
| Use-case identification | Business understanding; data availability; integration complexity; model complexity and required accuracy; model impact assessment |
| Model requirements gathering | Business requirements; IT requirements; end-user requirements; scoring requirements; data requirements; analytical model requirements |
| Data preparation | Data access; data understanding; data integration; data completeness/cleanliness; data formatting; scoring data availability; data delivery contracts; feature creation |
| Modeling experiments | Data setup for modeling and validation; ways to (re)frame the business question; analytical model validity, reliability and stability; model evaluation criteria; model result end-user validation; model rationalization procedure |
| Insight creation | Reporting KPIs, metrics and dimensions; data delivery model, including refresh rate; report access |
| Proof of Value: ROI | Experimental setup; ROI computation (a worked example follows this table); success criteria |
| Operationalization | Integration requirements; error procedures; data delivery contract; hand-over procedure; execution plan |
| Model lifecycle | Model governance requirements; analytical model (result) archiving requirements; analytical model refresh cycle/procedure; analytical model upgrade cycle/procedure |
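One common way to carry out the ROI computation is to compare a model-targeted (treatment) group against a business-as-usual (control) group and set the incremental revenue against the cost of the campaign. All numbers below are hypothetical.

```python
# Worked ROI example with entirely hypothetical numbers: a campaign targeted
# by the model (treatment) versus business-as-usual (control).
treated_customers    = 10_000
control_customers    = 10_000
treated_conversions  = 450      # hypothetical experiment outcome
control_conversions  = 300
value_per_conversion = 120.0    # hypothetical revenue per conversion
campaign_cost        = 8_000.0  # hypothetical cost of the model-driven campaign

# Uplift attributable to the model, scaled to the treated group's size
uplift = treated_conversions - control_conversions * (treated_customers / control_customers)
incremental_revenue = uplift * value_per_conversion  # 150 * 120 = 18,000

roi = (incremental_revenue - campaign_cost) / campaign_cost
print(f"ROI: {roi:.0%}")  # (18,000 - 8,000) / 8,000 = 125%
```

Success criteria of this kind are best fixed during the experimental design, before the experiment runs.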
References
- Shearer, C. (2000). The CRISP-DM Model: The New Blueprint for Data Mining. Journal of Data Warehousing, 5:13-22.
- Schutt, R. and O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. O'Reilly Media, p. 359.
- http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
- http://khabaza.codimension.net/index_files/9laws.htm
- http://en.wikipedia.org/wiki/Data_science
- http://en.wikipedia.org/wiki/Agile_software_development
Bio: Olav Laudy is Chief Data Scientist, IBM Analytics, Asia-Pacific.