7 Cognitive Biases That Affect Your Data Analysis (and How to Overcome Them)
What are the most important cognitive biases, and how do you overcome them to make your data analysis as objective as possible?

Humans can never be completely objective. This means the insights from data analysis can easily fall victim to a standard human trait: cognitive biases.
I'll focus on the seven I find most impactful in data analysis. It's important to be aware of them and know how to work around them, which is exactly what you'll learn in the next several minutes.

1. Confirmation Bias
Confirmation bias is the tendency to search for, interpret, and remember the information that confirms your already existing beliefs or conclusions.
How it shows up:
- Interpreting ambiguous or noisy data as a confirmation of your hypothesis.
- Cherry-picking data by filtering it to highlight favourable patterns.
- Not testing alternative explanations.
- Framing reports to make others believe what you want them to, instead of what the data actually shows.
How to overcome it:
- Write neutral hypotheses: Ask “How do conversion rates differ across devices and why?” instead of “Do mobile users convert less?”
- Test competing hypotheses: Always ask what else could explain the pattern, other than your initial conclusion.
- Share your early findings: Let your colleagues critique the interim analysis results and the reasoning behind them.
Example:
| Campaign | Channel | Conversions |
|---|---|---|
| A | Email | 200 |
| B | Social | 60 |
| C | Email | 150 |
| D | Social | 40 |
| E | Email | 180 |
This dataset seems to show that email campaigns perform better than social ones. To overcome this bias, don’t approach the analysis with “Let’s prove email performs better than social”.

Keep your hypotheses neutral. Also, test whether the difference is statistically significant and control for confounders such as differences in audience, campaign type, or duration, as in the sketch below.
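Here's a minimal Python sketch of that idea, using the (hypothetical) per-campaign counts from the table above and scipy. It only tests the raw difference; with real data you'd also account for the confounders mentioned above:

```python
from scipy import stats

# Hypothetical per-campaign conversion counts, taken from the table above
email = [200, 150, 180]
social = [60, 40]

# Welch's t-test: does the data actually support a channel difference,
# or could the gap be noise given only five campaigns?
t_stat, p_value = stats.ttest_ind(email, social, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Even a "significant" result here rests on tiny samples and ignores
# confounders (audience, campaign type, duration), so treat it as a
# starting point for further checks, not a conclusion.
```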
2. Anchoring Bias
This bias is reflected in relying too heavily on the first piece of information you receive. In data analysis, this is typically some early metric, despite the metric being completely arbitrary or outdated.
How it shows up:
- An initial result defines your expectations, even if it’s a fluke based on a small sample.
- Benchmarking against historical data without context and accounting for the changes in the meantime.
- Overvaluing the first week/month/quarter performance and assuming success despite drops in later periods.
- Fixating on legacy KPIs, even though the context has changed.
How to overcome it:
- Delay your judgment: Avoid setting benchmarks too early in the analysis. Explore the full dataset first and understand the context of what you’re analyzing.
- Look at distributions: Don't fixate on a single point or compare only averages. Use distributions to understand the range of past performance and typical variation.
- Use dynamic benchmarks: Don't stick with historical benchmarks. Adjust them to reflect the current context.
- Baseline flexibility: Don’t compare your results to a single number, but to multiple reference points.
Example:
| Month | Conversion Rate |
|---|---|
| January | 10.0% |
| February | 9.8% |
| March | 9.6% |
| April | 9.4% |
| May | 9.2% |
| June | 9.2% |
Any dip below the first-ever benchmark of 10% might be interpreted as poor performance.

Overcome the bias by plotting the last 12 months and adding median conversion rate, year-over-year seasonality, and confidence intervals or standard deviation. Update benchmarks and segment data for deeper insights.
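One way to put this into practice is a quick sketch like the one below (hypothetical 12-month figures extending the table above; assumes pandas and matplotlib are available):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical last-12-months conversion rates (the table shows only six)
rates = pd.Series(
    [0.101, 0.099, 0.104, 0.097, 0.102, 0.098,
     0.100, 0.098, 0.096, 0.094, 0.092, 0.092],
    index=pd.RangeIndex(1, 13, name="month"),
)

median, std = rates.median(), rates.std()

# Plot the trend against the 12-month median and a ±1 std dev band,
# instead of judging every month against the first-ever 10% benchmark
ax = rates.plot(marker="o", label="Monthly conversion rate")
ax.axhline(median, linestyle="--", label=f"12-month median ({median:.1%})")
ax.axhspan(median - std, median + std, alpha=0.2, label="±1 std dev")
ax.legend()
plt.show()
```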

3. Availability Bias
Availability bias is the tendency to give more weight to recent or easily accessible data, regardless of whether it’s representative or relevant for your analysis.
How it shows up:
- Overreacting to dramatic events (e.g., a sudden outage) and assuming they reflect a broader pattern.
- Basing analysis on the most easily accessible data, without digging deeper into archives or raw logs.
How to overcome it:
- Use historical data: Compare unusual patterns with historical data to see if this pattern is actually new or if it happens often.
- Include context in your reports: Use your reports and dashboards to show current trends within a context by showing, for example, rolling averages, historical ranges, and confidence intervals.
Example:
| Week | Reported Bug Volume |
|---|---|
| Week 1 | 4 |
| Week 2 | 3 |
| Week 3 | 3 |
| Week 4 | 25 |
| Week 5 | 2 |
A major outage in Week 4 could lead to over-fixating on system reliability. The event is recent, so it’s easy to recall it and overweight it. Overcome the bias by showing this outlier within longer-term patterns and seasonalities.
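A rough sketch of that idea, using hypothetical bug counts over a longer horizon and a robust deviation score, might look like this:

```python
import pandas as pd

# Hypothetical weekly bug counts over a longer horizon than the five weeks shown
bugs = pd.Series([4, 3, 5, 2, 4, 3, 6, 4, 3, 3, 25, 2],
                 index=pd.RangeIndex(1, 13, name="week"))

# Robust "how unusual is this week?" score: compare each point to the
# long-run median, scaled by the median absolute deviation (MAD)
median = bugs.median()
mad = (bugs - median).abs().median()
robust_z = (bugs - median) / (1.4826 * mad)

print(pd.DataFrame({"bugs": bugs, "robust_z": robust_z.round(1)}))
# The outage week stands out as a one-off outlier against the baseline,
# not as evidence of a new reliability trend.
```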

4. Selection Bias
This is a distortion that happens when your data sample doesn't accurately represent the full population you're trying to analyze. With such a skewed sample, you can easily draw conclusions that are true for the sample but not for the whole group.
How it shows up:
- Analyzing only users who completed a form or survey.
- Ignoring users who bounced, churned, or didn’t engage.
- Not questioning how your data sample was generated.
How to overcome it:
- Think about what is missing: Instead of only focusing on who or what you included in your sample, think about who was excluded and if this absence might skew your results. Check your filters.
- Include dropout and non-response data: These are "silent signals" that can be very informative. They sometimes tell a more complete story than the active data.
- Break results down by subgroups: For example, compare NPS scores by user activity levels or funnel completion stages to check for bias.
- Flag limitations and limit your generalizations: If your results only apply to a subset, label them as such, and don’t use them to generalize to your entire population.
Example:
| Customer ID | Submitted Survey | Satisfaction Score |
|---|---|---|
| 1 | Yes | 10 |
| 2 | Yes | 9 |
| 3 | Yes | 9 |
| 4 | No | - |
| 5 | No | - |
If you include only users who submitted the survey, the average satisfaction score might be inflated. Other users might be so unsatisfied that they didn’t even bother to submit the survey. Overcome this bias by analyzing the response rate and non-respondents. Use churn and usage patterns to get a full picture.
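A minimal sketch of those checks, using the survey data above plus an assumed `weekly_sessions` column as the behavioural signal available for every customer:

```python
import pandas as pd

# Hypothetical data: satisfaction exists only for customers who responded,
# but a usage signal (weekly sessions) exists for everyone
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "submitted_survey": [True, True, True, False, False],
    "satisfaction": [10, 9, 9, None, None],
    "weekly_sessions": [14, 11, 12, 2, 1],
})

response_rate = df["submitted_survey"].mean()
print(f"Response rate: {response_rate:.0%}")  # 60% -- caveat any averages accordingly

# Compare respondents vs. non-respondents on a signal everyone has
print(df.groupby("submitted_survey")["weekly_sessions"].mean())
# If non-respondents are far less engaged, the ~9.3 average satisfaction
# of respondents almost certainly overstates overall satisfaction.
```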

5. Sunk Cost Fallacy
This is a tendency to continue with an analysis or a decision simply because you’ve already invested significant time and effort into it, even though it makes no sense to continue.
How it shows up:
- Sticking with an inadequate dataset because you’ve already cleaned it.
- Running an A/B test longer than needed, hoping for statistical significance that will never come.
- Defending a misleading insight simply because you’ve already shared it with stakeholders and don’t want to backtrack.
- Sticking with tools or methods because you’re already in an advanced stage of an analysis, even though using other tools or methods might be better in the long term.
How to overcome it:
- Focus on quality, not past effort: Always ask yourself, would you choose the same approach if you started the analysis again?
- Use checkpoints: Build checkpoints into your analysis where you stop and evaluate whether the work you've done so far, and what you plan to do next, still takes you in the right direction.
- Get comfortable with starting over: Starting over is not admitting failure. If it's more pragmatic to start from scratch, doing so is a sign of critical thinking.
- Communicate honestly: It’s better to be honest, start all over again, ask for more time, and deliver a good quality analysis, than save time by providing flawed insights. Quality wins over speed.
Example:
| Week | Data Source | Rows Imported | % NULLs in Columns | Analysis Time Spent (hours) |
|---|---|---|---|---|
| 1 | CRM_export_v1 | 20,000 | 40% | 10 |
| 2 | CRM_export_v1 | 20,000 | 40% | 8 |
| 3 | CRM_export_v2 | 80,000 | 2% | 0 |
The data shows that an analyst spent 18 hours analyzing low-quality and incomplete data, but zero hours when cleaner and more complete data arrived in Week 3. Overcome the fallacy by defining acceptable NULL thresholds and building in 1-2 checkpoints to reassess your initial analysis plan.
Here’s a chart showing a checkpoint that should’ve triggered reassessment.
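A minimal sketch of such a checkpoint in code, with an assumed 10% NULL threshold and a hypothetical `crm_export_v1` DataFrame:

```python
import pandas as pd

NULL_THRESHOLD = 0.10  # assumed acceptable share of NULLs per column

def checkpoint(df: pd.DataFrame, stage: str) -> bool:
    """Return True if the dataset still justifies further analysis time."""
    null_share = df.isna().mean()                      # share of NULLs per column
    bad_columns = null_share[null_share > NULL_THRESHOLD]
    if not bad_columns.empty:
        print(f"[{stage}] columns over the NULL threshold:")
        print(bad_columns.round(2))
        return False
    return True

# Usage (crm_export_v1 is a hypothetical DataFrame loaded from the Week 1 export):
# if not checkpoint(crm_export_v1, stage="after cleaning"):
#     # stop, request better data, or re-plan instead of sinking more hours
#     ...
```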

6. Outlier Bias
Outlier bias means giving too much importance to extreme or unusual data points. You treat them as if they demonstrate trends or typical behavior, when they are nothing but exceptions.
How it shows up:
- A single big-spending customer inflates the average revenue per user.
- A one-time traffic increase from a viral post is mistaken as a sign of a future trend.
- Performance targets are raised based on last month’s exceptional campaign.
How to overcome it:
- Avoid averages: Avoid averages when dealing with skewed data. Use medians, percentiles, or trimmed means instead; they are less sensitive to extremes.
- Use distribution: Show distributions on histograms, boxplots, and scatter plots to see where the outliers are.
- Segment your analysis: Treat outliers as a distinct segment. If they are important, analyze them separately from the general population.
- Set thresholds: Decide on what is an acceptable range for key metrics and exclude outliers outside those bounds.
Example:
| Customer ID | Purchase Value |
|---|---|
| 1 | $50 |
| 2 | $80 |
| 3 | $12,000 |
| 4 | $75 |
| 5 | $60 |
Customer 3 inflates the average purchase value. This could mislead the company into increasing prices. Instead of the average ($2,453), use the median ($75) and the IQR.

Analyze the outlier separately and see if it can belong to a separate segment.
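Here's a quick pandas sketch of that workflow on the purchase values above, using the common 1.5×IQR rule (a convention, not something the data prescribes):

```python
import pandas as pd

purchases = pd.Series([50, 80, 12_000, 75, 60], name="purchase_value")

mean, median = purchases.mean(), purchases.median()
q1, q3 = purchases.quantile([0.25, 0.75])
iqr = q3 - q1

# Standard IQR rule: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is an outlier
outliers = purchases[(purchases < q1 - 1.5 * iqr) | (purchases > q3 + 1.5 * iqr)]

print(f"mean = ${mean:,.0f}, median = ${median:,.0f}, IQR = ${iqr:,.0f}")
print("Outliers to analyze as their own segment:", outliers.tolist())  # [12000]
```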
7. Framing Effect
This cognitive bias leads to interpreting the same data differently, depending on how it’s presented.
How it shows up:
- Intentionally choosing a positive or negative point of view.
- Using chart scales that exaggerate or understate change.
- Using percentages without absolute numbers to exaggerate or understate change.
- Choosing benchmarks that favour your narrative.
How to overcome it:
- Show relative and absolute metrics.
- Use consistent scales in charts.
- Label clearly and neutrally.
Example:
| Experiment Group | Users Retained After 30 Days | Total Users | Retention Rate |
|---|---|---|---|
| Control Group | 4,800 | 6,000 | 80% |
| Test Group | 4,350 | 5,000 | 87% |
You can frame this data as either "The new onboarding flow improved retention by 7 percentage points" or "450 fewer users were retained." Overcome the bias by presenting both framings and showing absolute and relative values, as in the sketch below.
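A tiny sketch that computes both framings side by side, using the numbers from the table:

```python
# Retention example from the table above: compute both framings explicitly
control = {"retained": 4_800, "total": 6_000}
test = {"retained": 4_350, "total": 5_000}

control_rate = control["retained"] / control["total"]    # 80%
test_rate = test["retained"] / test["total"]             # 87%

print(f"Relative framing: retention {control_rate:.0%} -> {test_rate:.0%} "
      f"(+{(test_rate - control_rate) * 100:.0f} pp)")

retained_diff = control["retained"] - test["retained"]   # 450
size_diff = control["total"] - test["total"]             # 1,000
print(f"Absolute framing: {retained_diff:,} fewer users retained, "
      f"largely because the test group had {size_diff:,} fewer users in total")
```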

Conclusion
In data analysis, cognitive biases are a bug, not a feature.
The first step to lessening them is being aware of what they are. Then you can apply certain strategies to mitigate those cognitive biases and keep your data analysis as objective as possible.
Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.