The Importance of Experiment Design in Data Science

Do you feel overwhelmed by the sheer number of ideas that you could try while building a machine learning pipeline? You cannot take the liberty of trying every possible way to arrive at a solution - hence we discuss the importance of experiment design in data science projects.




We are all participants in experiments, one way or another. An ad-targeting agency might be running an experiment to check which types of ads to show a user to drive sales, aka conversions. Or a popular machine learning course provider might change a feature on its website to assess which change users are most receptive to and whether that change nudges the business KPI the experiment organizer wants to observe. This randomized experimentation is called A/B testing, which falls under the broader realm of hypothesis testing.
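To make the link to hypothesis testing concrete, here is a minimal sketch of how the outcome of such a test could be checked with a two-proportion z-test. The function name and the conversion counts are hypothetical and chosen only for illustration; a real experiment would also need a pre-planned sample size and significance level.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal CDF, expressed with math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 200 of 5,000 users converted on A, 260 of 5,000 on B.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be due to chance alone, under the assumptions of the test.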


Image: The Importance of Experiment Design in Data Science (Source: Group vector created by freepik)


If you are with me so far, then welcome to the world of experiments. Let's start by understanding what an experiment is.

Generally speaking, an experiment is defined as:


“a procedure carried out to support or refute a hypothesis, or determine the efficacy or likelihood of something previously untried.”


Building upon the general definition of an experiment, its scientific meaning involves hypothesis testing to check whether a proposed solution works for a given problem statement. One key thing to note is that the experiments are performed in a controlled manner.

In this post, we will learn the significance of experiment design in the context of data science projects. So, let's go through one more definition, this time of experiment design:


“Experimental design is a concept used to organize, conduct, and interpret the results of experiments in an efficient way, making sure that as much useful information as possible is obtained by performing a small number of trials”


There are multiple ways a data scientist can design and conduct experiments in a machine learning project. But which ones should be tried first? How should the team plan and conduct multiple experiments concurrently, and eventually tie their analysis into meaningful insights and outcomes? It takes a skilled data scientist to not get overwhelmed by the swarm of potentially bright ideas. They rule out specific ideas and experiments outright because they know which algorithms and methods work with which kinds of datasets, and what the shortcomings of a chosen algorithm are. Such skill is not developed overnight; it takes years of experience to rank-order experiments so that they yield a greater return on time and resources.


An Example


Quite often, data scientists jump to assumptions about which machine learning framework would be the best fit for the problem at hand. Understanding the business context is at the core of machine learning projects. How a business problem is mapped to a statistical machine learning problem is crucial to the success of the business outcome and impact. Let's understand with an example how a typical machine learning experiment takes shape:

  • Based on such inputs, data scientists need to narrow down and decide which algorithm to use. For example, if it is a classification problem, whether to use logistic regression or random forest classifier constitutes one of the experiments.
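One way to keep an "algorithm A vs. algorithm B" experiment fair is to score every candidate on exactly the same cross-validation folds. The sketch below uses only the Python standard library and two deliberately simple stand-in models (a majority-class baseline and a one-feature threshold rule) on synthetic data. All names and data here are hypothetical; in a real project the candidates would be models such as logistic regression and a random forest classifier.

```python
import random
from statistics import mean

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices once and split them into k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def compare_on_same_folds(candidates, X, y, k=5, seed=0):
    """Return mean held-out accuracy per candidate, using identical folds for all."""
    folds = k_fold_indices(len(X), k, seed)
    scores = {name: [] for name in candidates}
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        for name, fit in candidates.items():
            predict = fit(X_train, y_train)  # each fit returns a prediction function
            hits = sum(predict(X[i]) == y[i] for i in held_out)
            scores[name].append(hits / len(held_out))
    return {name: mean(vals) for name, vals in scores.items()}

def fit_majority(X_train, y_train):
    """Stand-in baseline: always predict the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda row: majority

def fit_threshold(X_train, y_train):
    """Stand-in model: predict 1 when the first feature exceeds the midpoint
    of the two class means (assumes class 1 tends to have larger values)."""
    mean_0 = mean(row[0] for row, label in zip(X_train, y_train) if label == 0)
    mean_1 = mean(row[0] for row, label in zip(X_train, y_train) if label == 1)
    cut = (mean_0 + mean_1) / 2
    return lambda row: int(row[0] > cut)

# Synthetic, balanced binary data: class 1 is shifted upward on the only feature.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(2.0 * label, 1.0)] for label in y]

results = compare_on_same_folds(
    {"majority_baseline": fit_majority, "threshold_rule": fit_threshold}, X, y
)
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")
```

Because both candidates see the same splits, a difference in score reflects the models rather than a lucky partition of the data.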


Factors to consider while designing an experiment:


Ideas are free; they cost nothing. But deciding which ideas to take forward and design an experiment around requires various considerations.

  • Hypothesis - The intuitive understanding of how this experiment will solve the given problem
  • Data Available - Do you have the data to start with? 
  • Data Required - Having a lot of data does not ensure the success of the project; it takes a careful evaluation of which attributes are required to solve the business problem. An initial exploratory and feasibility workshop with the business leaders helps bring this requirement into perspective.
  • Level of Effort (LOE) - What is the effort estimate to conduct it?
  • Do it yourself (DIY) or Open Source - Is there an already existing tool, package, library, or code base that can be quickly leveraged to conclude the hypothesis?
  • Independent or not - Does the experiment depend on some precursor result, or is it decoupled? The speed of executing an experiment suffers when there are dependencies on multiple teams or a lack of infrastructure.
  • Success criteria - How will you conclude that the experiment yielded the expected returns?
  • Integration Testing - Does your successful experiment work only under certain constraints and become unreliable once the environment changes (which is inevitable)? Is the result statistically significant? How confident are you that the results are reproducible? Does the final outcome integrate well with the rest of the machine learning ecosystem?

Experiment design is succinctly explained as the identification of a set of factors, which can potentially drive process performance, the selection of reasonable levels for each of these factors, the definition of a set of combinations of factor levels, and the execution of experiments according to the defined experimental design.
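The definition above (factors, levels, and combinations of factor levels) can be sketched as a full factorial design. The factor names and levels below are hypothetical examples, not a recommended set; in practice the team would prune this list rather than run every combination.

```python
from itertools import product

# Hypothetical factors and the levels chosen for each of them.
factors = {
    "algorithm": ["logistic_regression", "random_forest"],
    "sampling": ["none", "oversample_minority"],
    "feature_set": ["baseline", "baseline_plus_engineered"],
}

def full_factorial(factors):
    """Enumerate every combination of factor levels as one planned run."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

design = full_factorial(factors)
for run_id, run in enumerate(design, start=1):
    print(run_id, run)
```

Three factors with two levels each already produce eight runs, which illustrates why the number of trials grows quickly and why rank-ordering experiments matters.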


Pro Tip 1 


An experienced data scientist is able to leverage the knowledge bank built up from previous projects and can prudently choose which experiments to run to generate business value, instead of going in all directions. Having said that, it is always good practice to engage in healthy technical discussions with the team and pick their brains, weigh the pros and cons of each experiment, note under what assumptions each experiment would work or fail, and log all of this in a tracker. Such a discussion will help you sort, aka rank-order, your experiments with respect to their potential impact and outcome. The premise is borrowed from ensembling methods in machine learning: a single data scientist might not be able to think through all the corner cases unless questioned by a second pair of eyes (well, as many qualified pairs of eyes as possible :))
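A tracker like the one described here can be as simple as a list of records that the team sorts after the discussion. The sketch below is one possible shape, not a prescribed one: the fields, the example ideas, and the impact-per-day-of-effort scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentIdea:
    name: str
    hypothesis: str
    impact: int            # team estimate from 1 (low) to 5 (high); hypothetical scale
    effort_days: float     # level-of-effort estimate
    assumptions: list = field(default_factory=list)

    @property
    def priority(self):
        # Assumed scoring rule: expected impact per day of effort.
        return self.impact / self.effort_days

tracker = [
    ExperimentIdea("rf_engineered_features", "Non-linear interactions matter", 4, 5,
                   ["engineered features are available at serving time"]),
    ExperimentIdea("research_ensemble", "Establishes the best-case score", 5, 15,
                   ["not intended for production"]),
    ExperimentIdea("logreg_baseline", "A linear model is good enough", 3, 2,
                   ["classes are roughly linearly separable"]),
]

# Rank-order the ideas so the highest return on time comes first.
ranked = sorted(tracker, key=lambda idea: idea.priority, reverse=True)
for idea in ranked:
    print(f"{idea.name}: priority={idea.priority:.2f}, assumptions={idea.assumptions}")
```

Keeping the assumptions next to each idea makes it easier to revisit the ranking when one of them turns out not to hold.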


Pro Tip 2


Quite often, an experiment is known at the outset to be more research-oriented, and the data scientist is aware that even if it gives the best performance, it cannot be taken to production. You must be wondering why we would try such an experiment in the first place. Well, it is important to establish the best-case scenario, aka the north star, even if it is just theoretical. It gives an estimate of how far the current production-ready model versions are from it and what type of trade-off is needed to get to the best-known performance.


Pro Tip 3 


Conducting an experiment is one thing; analyzing it accurately is another. You may just need to run multiple loops over different algorithms or through different sample sets to decide on the final one. But how you analyze the output is the key. The final choice is not driven by a single evaluation metric alone. It is also a function of how scalable the solution is with respect to the infrastructure requirements and how interpretable the results are.
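As a rough illustration of choosing on more than one criterion, the sketch below first filters candidates by operational constraints and only then picks the best metric. The candidate names, scores, latencies, and the selection rule are made-up assumptions, not measured results.

```python
# Hypothetical experiment outcomes; the numbers are illustrative only.
results = [
    {"name": "logistic_regression", "f1": 0.81, "latency_ms": 2, "interpretable": True},
    {"name": "random_forest", "f1": 0.86, "latency_ms": 15, "interpretable": False},
    {"name": "research_ensemble", "f1": 0.88, "latency_ms": 40, "interpretable": False},
]

def select_candidate(results, max_latency_ms, require_interpretable):
    """Pick the best F1 among candidates that satisfy the operational constraints."""
    eligible = [
        r for r in results
        if r["latency_ms"] <= max_latency_ms
        and (r["interpretable"] or not require_interpretable)
    ]
    # Return None when no candidate satisfies the constraints.
    return max(eligible, key=lambda r: r["f1"]) if eligible else None

choice = select_candidate(results, max_latency_ms=20, require_interpretable=False)
print(choice["name"])
```

Tightening either constraint can change the winner even though the evaluation metric of each candidate stays the same.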


Experiment Management


So far, we have discussed what an experiment design looks like. If you are interested in learning how to manage multiple experiments and artifacts, refer to this excellent post. It captures the bundle of variables in an AI/ML project, including but not limited to the following:

  • Pre-processing, model training, and post-processing modules 
  • Data and Model versioning: Which data was used to train the previous model or the production model? 
  • Sampling method: How was the training data created and sampled - was it imbalanced? How was it handled?
  • Model Evaluation: How was the model validated, and which data was used for it? Is it representative of the data that the model will see in the production system?
  • Algorithm: How do you know which algorithm was used in which model version? Note also that even though the algorithm might be the same in a new model version, the architecture may have changed.
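One lightweight way to capture the variables listed above is to store a small record per model version. The sketch below is an assumption about what such a record might contain; the field names, the fingerprinting helper, and the example values are hypothetical, and dedicated experiment-tracking tools cover far more.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

def fingerprint(rows):
    """Short, deterministic hash of a dataset so a model can be tied to its data."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

@dataclass(frozen=True)
class ExperimentRecord:
    model_version: str
    algorithm: str
    architecture: dict      # hyperparameters or structure, which may change between versions
    sampling_method: str
    train_data_hash: str
    eval_data_hash: str
    metrics: dict

# Tiny made-up datasets standing in for real training and evaluation data.
train_rows = [{"x": 1.0, "label": 0}, {"x": 2.5, "label": 1}]
eval_rows = [{"x": 1.7, "label": 1}]

record = ExperimentRecord(
    model_version="v2",
    algorithm="random_forest",
    architecture={"n_trees": 200, "max_depth": 8},
    sampling_method="oversample_minority",
    train_data_hash=fingerprint(train_rows),
    eval_data_hash=fingerprint(eval_rows),
    metrics={"f1": 0.86},  # illustrative value, not a real result
)
print(json.dumps(asdict(record), indent=2))
```

Two versions that share the same algorithm but differ in the architecture field or in the data hashes are then easy to tell apart.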




In this post, we have discussed the importance of experiments, specifically in data science projects. Further, we talked about the various factors to consider before designing and conducting a machine learning experiment. The post concludes with an emphasis on the multiple entities and artifacts that need to be managed in an experimental design.

Vidhi Chugh is an award-winning AI/ML innovation leader and an AI Ethicist. She works at the intersection of data science, product, and research to deliver business value and insights. She is an advocate for data-centric science and a leading expert in data governance with a vision to build trustworthy AI solutions.