New Standard Methodology for Analytical Models


Traditional methodologies for analytical modeling, such as CRISP-DM, have several shortcomings. Here we describe these friction points in CRISP-DM and introduce a new approach, the Standard Methodology for Analytical Models, which overcomes them.



Summary of phases in tables

Below follows an overview of the different phases in table format.

Table 2: Artifacts per phase

Use-case identification
- One selected use-case
- A one-slide description of the use-case, including deployment considerations
- A presentation with the artifacts created from the discussions
- Potentially a roadmap with other use-cases on a timeline

Model requirements gathering
- A model requirements document
- A presentation that documents the impact of the requirements on the remainder of the process

Data preparation
- Data available and ready for modeling
- An understanding of the availability of scoring data
- The data scientist's confidence that modeling is possible

Modeling experiments
- A functional analytical model
- An evaluation of the analytical model against hold-out data (see the sketch below this table)
- A presentation of the model and the evaluation

Insight creation
- A set of visualizations and dashboards

Proof of Value: ROI
- An experimental design, including metrics and success criteria
- The results of the experiment
- A presentation reporting on the experimental setup and outcome

Operationalization
- A hand-over document
- An operational data requirements document
- Model archiving requirements
- An architecture design: technical, data, business and functional
- An analytical model execution plan
- An audit approach
- The operational analytical model

Model lifecycle
- A governance document
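
To make the hold-out evaluation artifact of the Modeling experiments phase concrete, here is a minimal sketch in Python. It assumes a tabular modeling dataset with a binary target column; the file name, column names and choice of model are illustrative assumptions, not something prescribed by the methodology.

```python
# Minimal sketch of a hold-out evaluation (illustrative only; the dataset,
# column names and model choice are assumptions, not part of the methodology).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, classification_report

# Hypothetical modeling table produced during the Data preparation phase.
df = pd.read_csv("modeling_data.csv")
X, y = df.drop(columns=["target"]), df["target"]

# Reserve a hold-out set that the model never sees during training.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Evaluate against the hold-out data.
scores = model.predict_proba(X_hold)[:, 1]
print("Hold-out AUC:", roc_auc_score(y_hold, scores))
print(classification_report(y_hold, model.predict(X_hold)))
```

The printed metrics would typically feed into the presentation of the model and the evaluation.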

Table 3: Parties involved per phase (data scientists are included by default)

Use-case identification
- (Higher) management, Business department, IT/Data owners

Model requirements gathering
- Business department, End-users, IT

Data preparation
- IT/Data owners

Modeling experiments
- No additional parties (data scientists only)

Insight creation
- Business department, End-users

Proof of Value: ROI
- Business department, End-users

Operationalization
- Business department, End-users, IT

Model lifecycle
- End-users, IT

Table 4: Sub-phases and activities per phase

Use-case identification
- Analytic aspiration acknowledgment
- Analytical model education on business level
- Brainstorm session(s)
- Final candidate selection

Model requirements gathering
- Business requirements
- IT requirements
- End-user requirements
- Scoring requirements
- Data requirements
- Analytical model requirements

Data preparation
- Data access contract
- Data access
- Data exploration
- Analytical modeling data creation
- Operational scoring data format creation
- Feature creation

Modeling experiments
- Think, Model, Evaluate, Repeat

Insight creation
- Insight requirement gathering
- Dashboard and visualization design
- Design execution
- Design evaluation

Proof of Value: ROI
- Metrics and success criteria gathering
- Experimental design
- Experimental data contract
- Experiment execution
- Experiment reporting

Operationalization
- Operationalization requirements gathering
- Integration planning
- Integration execution (may contain its own project phases)
- Hand-over
- Analytical model execution go-live

Model lifecycle
- Governance requirements gathering
- Analytical model result archiving
- Analytical model refresh
- Analytical model upgrade

Table 5: Discussion topics per phase

Use-case identification
- Business understanding
- Data availability
- Integration complexity
- Model complexity, required accuracy
- Model impact assessment

Model requirements gathering
- Business requirements
- IT requirements
- End-user requirements
- Scoring requirements
- Data requirements
- Analytical model requirements

Data preparation
- Data access
- Data understanding
- Data integration
- Data completeness/cleanliness
- Data formatting
- Scoring data availability
- Data delivery contracts
- Feature creation

Modeling experiments
- Data setup for modeling and validation
- Ways to (re)frame the business question
- Analytical model validity
- Analytical model reliability
- Analytical model stability
- Model evaluation criteria
- Model result end-user validation
- Model rationalization procedure

Insight creation
- Reporting KPIs, metrics, dimensions
- Data delivery model, including refresh rate
- Report access

Proof of Value: ROI
- Experimental setup
- ROI computation (see the sketch below this table)
- Success criteria

Operationalization
- Integration requirements
- Error procedures
- Data delivery contract
- Hand-over procedure
- Execution plan

Model lifecycle
- Model governance requirements
- Analytical model (result) archiving requirements
- Analytical model refresh cycle/procedure
- Analytical model upgrade cycle/procedure
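
As an illustration of the ROI computation topic in the Proof of Value phase, the sketch below turns hypothetical experiment results into a return-on-investment figure. All numbers, and the uplift-based definition of benefit, are made-up assumptions; the actual metrics and success criteria come out of the metrics and success criteria gathering activity.

```python
# Hypothetical ROI computation for the Proof of Value phase.
# All figures are made-up assumptions for illustration.

def roi(benefit, cost):
    """Return on investment: net gain relative to the cost of the project."""
    return (benefit - cost) / cost

# Example: an A/B-style experiment where the model-driven treatment group
# converts better than the control group.
customers_treated   = 10_000
uplift_per_customer = 2.50        # extra margin per treated customer (assumed)
project_cost        = 15_000.00   # data science + integration effort (assumed)

benefit = customers_treated * uplift_per_customer
print(f"Benefit: {benefit:,.2f}")
print(f"ROI: {roi(benefit, project_cost):.1%}")   # 66.7% in this made-up case
```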

References

  1. Shearer, C., "The CRISP-DM Model: The New Blueprint for Data Mining," Journal of Data Warehousing (2000); 5:13-22.
  2. Schutt, R. and O'Neil, C., Doing Data Science: Straight Talk from the Frontline, O'Reilly Media, Inc. (2013), p. 359.
  3. http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
  4. http://khabaza.codimension.net/index_files/9laws.htm
  5. http://en.wikipedia.org/wiki/Data_science
  6. http://en.wikipedia.org/wiki/Agile_software_development

Bio: Olav Laudy is Chief Data Scientist, IBM Analytics, Asia-Pacific.