Can Data Governance Address AI Fatigue?

This post explains how data governance can help data scientists handle AI fatigue and build robust models.

By Vidhi Chugh, KDnuggets AI Strategy Content Specialist on January 9, 2024 in Artificial Intelligence

Image by Author

Data Governance and AI fatigue sound like two different concepts, but there is an intrinsic connection between the two. To understand it better, let’s start with their definition.

Data Governance

It has been the core focus of the data industry for a long time.

Google puts it well – “Data governance is everything you do to ensure data is secure, private, accurate, available, and usable. It involves setting internal standards—data policies—that apply to how data is gathered, stored, processed, and disposed of.”

As this definition highlights, data governance is about managing data – precisely the engine driving AI models.

Now that the first signs of the link between data governance and AI have started to emerge, let’s relate it to AI fatigue. Though the name gives it away, highlighting the reasons leading to such fatigue ensures consistent use of this term throughout the post.

AI Fatigue

AI fatigue sets in due to the setbacks and challenges organizations, developers, or teams face, often leading to unsuccessful value realization or implementation of AI systems.

It mostly starts with unrealistic expectations of what AI is capable of. For sophisticated technologies such as AI, key stakeholders need to align with not just the capabilities and possibilities of AI but also its limitations and risks.

Talking about risks, ethics is often considered an afterthought that leads to scrapping non-compliant AI initiatives.

You must be wondering about the role of data governance in causing AI fatigue – the premise of this post.

That’s where we are heading next.

AI fatigue can broadly be categorized as pre-deployment and post-deployment. Let us first focus on pre-deployment first.

Pre-Deployment

Various factors contribute to graduating a Proof of Concept (PoC) to deployment, such as:

What are we trying to solve?
Why does it make a compelling problem to prioritize now?
What data is available?
Is it ML-solvable in the first place?
Does data have a pattern?
Is the phenomenon repeatable?
What additional data would lift the model performance?

Image from Freepik

Once we have evaluated that the problem can be best solved using ML algorithms, the data science team performs an exploratory data analysis. Many underlying data patterns are uncovered at this stage, highlighting whether the given data is rich in the signal. It also helps create engineered features to speed up the learning process of the algorithm.

Next, the team builds the first baseline model, often, finding that it is not performing up to the acceptable level. A model whose output is as good as a coin flip adds no value. This is one of the first setbacks, aka lessons, while building ML models.

Organizations may move from one business problem to another, causing fatigue. Still, if the underlying data does not carry a rich signal, no AI algorithm can build upon it. The model must learn the statistical associations from the training data to generalize on unseen data.

Post-Deployment

Despite the trained model showing promising results on the validation set, in line with the qualifying business criteria, such as 70% precision, fatigue can still arise if the model fails to perform adequately in the production environment.

This type of AI fatigue is called the post-deployment phase.

Myriad reasons could lead to deteriorated performance, where poor data quality is the most common issue plaguing the model. It limits the model’s ability to accurately predict the target response in the absence of crucial attributes.

Consider when one of the essential features, which was only 10% missing in training data, now becomes null 50% of the time in the production data, leading to erroneous predictions. Such iterations and efforts to ensure consistently performing models build fatigue in the data scientists and business teams, thereby eroding confidence in the data pipelines and risking the investments made into the project.

Data Governance is Key!

Robust data governance measures are critical in tackling both types of AI fatigue. Given that the data is at the core of ML models, signal-rich, error-free, and high-quality data are a must for the success of an ML project. Addressing AI fatigue requires a strong focus on data governance. So, we must work rigorously to ensure the right data quality, laying the groundwork to build state-of-the-art models and deliver trustworthy business insights.

Data Quality

Data quality, the key to thriving data governance, is a critical success factor for machine learning algorithms. Organizations must invest in data quality, such as publishing reports to the data consumers. In data science projects, think of what happens when the bad quality data makes its way to the models, which can lead to poor performance.

Only during the error analysis would the teams get to identify the data quality concerns, which, when sent to be fixed upstream, end up causing fatigue among the teams.

Clearly, it is not just the effort expended, but a lot of time is lost until the right data starts to pipe in.

Hence, it is always advised to fix data issues at source to prevent such time-consuming iterations. Eventually, the published data quality reports allude to the data science team (or, for that matter, any other downstream users and data consumers) with an understanding of the acceptable quality of the incoming data.

Without data quality and governance measures, data scientists would get overburdened with data issues, contributing to unsuccessful models driving AI fatigue.

Closing Remarks

The post highlighted the two stages at which AI fatigue sets in and presented how data governance measures such as data quality reports can be an enabler to building trustworthy and robust models.

By establishing a solid foundation through data governance, organizations can build a roadmap to successful and seamless AI development and adoption, instilling enthusiasm.

To ensure the post gives a holistic overview of varied ways of addressing AI fatigue, I also emphasize the role of organizational culture, which, combined with other best practices like data governance, will enable and empower data science teams to build meaningful AI contributions sooner and faster.

Vidhi Chugh is an AI strategist and a digital transformation leader working at the intersection of product, sciences, and engineering to build scalable machine learning systems. She is an award-winning innovation leader, an author, and an international speaker. She is on a mission to democratize machine learning and break the jargon for everyone to be a part of this transformation.