What is the most important step in a machine learning project?
In any machine learning project, business understanding is critical, yet in practice it rarely gets enough attention. Here we explain which questions should be asked.
By Shahar Cohen, YellowRoad.
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a common standard process for machine learning projects. It defines six steps: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. All six steps are crucial, because quality issues in any one of them directly affect the quality of the entire outcome.
However, having advised many organizations on machine learning, and having run even more such projects ourselves, we at YellowRoad have concluded that the most under-invested step in the process is Business Understanding. We see many companies discussing algorithms and technology before they understand the business aspects of the task they are solving. This is clearly not a good starting point.
We have composed a series of questions that we use in every machine learning project we get involved in, and we do not invest serious effort in the subsequent steps until we have good answers to them. We find this practice extremely helpful.
Here are the questions:
- What are we trying to achieve, business-wise? Why is it important?
- What are the inputs and outputs of the task we are trying to solve?
- Given a hypothetical solution to that task, how would it affect our operations? (Another way to ask this question: assuming I had a perfect solution to your machine learning task, how would you use it?)
- Do we already have the ability to act on such a solution, or do we also need to develop that ability? (If the ability exists, learn it carefully. If not, keep in close contact with the team responsible for developing it.)
- How are we going to measure a suggested solution? (KPIs)
- What would make it a success?
- Is the input data available? How hard is it to extract? Are we allowed to use it?
- Do we have experience building similar solutions? Do we understand what it takes?
- Do we have hard budget and timeline constraints?
- Who will develop the solution? Do we have the required skills in house?
Original. Reposted with permission.
Bio: Shahar Cohen is a co-founder of YellowRoad and a data scientist and researcher with over 10 years of experience.