5 Excel AI Lessons I Learned the Hard Way


 

Introduction

 
For many organizations, particularly those in regulated industries or with limited technical infrastructure, Excel and its XLMiner add-in serve as the primary platform for predictive modeling and machine learning workflows.

Yet Excel's accessibility masks a critical gap: the difference between running models and building trustworthy analytics systems. Working on a loan approval prediction project, I discovered that Excel-based machine learning fails not because of algorithmic limitations, but because of foundational practices that are frequently overlooked.

This article distills those hard-won experiences into five practical frameworks that will elevate your Excel-based machine learning work.

 

Lesson 1: Multiple Methods for Outlier Detection

 
Outlier handling is more art than science, and premature removal can eliminate legitimate extreme values that carry important information. In one case, all residential asset values flagged by a simple IQR rule, roughly everything above the 95th percentile, were removed under the assumption that they were errors. Later analysis revealed the removal of legitimate ultra-high-value properties, a relevant segment for large loan approvals.

The lesson: Use multiple detection methods and manual review before removal. Create a comprehensive outlier detection framework.

In a new sheet adjacent to the main data, create detection columns:

  • Column A: Original Value (residential_assets_value)
  • Column B: IQR Method
    =IF(A2 > QUARTILE.INC($A$2:$A$4270,3) + 1.5*(QUARTILE.INC($A$2:$A$4270,3)-QUARTILE.INC($A$2:$A$4270,1)), "Outlier_IQR", "Normal")
  • Column C: 3-Sigma Method
    =IF(ABS(A2-AVERAGE($A$2:$A$4270)) > 3*STDEV($A$2:$A$4270), "Outlier_3SD", "Normal")
  • Column D: Percentile Method
    =IF(A2 > PERCENTILE.INC($A$2:$A$4270,0.99), "Outlier_P99", "Normal")
  • Column E: Combined Flag
    =IF(COUNTIF(B2:D2,"Outlier*")>=2, "INVESTIGATE", "OK")
  • Column F: Manual Review [Notes after investigating]
  • Column G: Final Decision [Keep/Remove/Transform]

This multi-method approach revealed patterns in my loan data:

  • Values flagged by all three methods (IQR, 3-sigma, and percentile): Likely errors
  • Values flagged by IQR but not 3-sigma: Legitimate high values in skewed distributions
  • Values flagged only by percentile: The extreme but valid cases I almost lost

The "Manual Review" column is crucial. For each flagged observation, document findings such as: "Legitimate luxury property, verified against public records" or "Likely data entry error, value exceeds market maximum by 10x."

 

Lesson 2: Always Set Random Seeds

 
Few experiences are more frustrating than presenting excellent model results, then being unable to reproduce those exact numbers when preparing the final report. This scenario occurred with a classification tree model: one day's validation accuracy was 97.3%, but the next day's was 96.8%. The difference seems small, but it undermines credibility, leaving the audience wondering which number is real and how much the analysis can be trusted.

The lesson: the culprit is random partitioning without a fixed seed. Most machine learning workflows involve randomness at some stage:

  • Data partitioning: Which observations go into training vs. validation vs. test sets
  • Neural networks: Initial weight randomization
  • Some ensemble methods: Random feature selection

XLMiner uses random processes for partitioning the data. Running the same model twice with identical parameters yields slightly different results because the training/validation split differs each time.

The solution is simple but non-obvious. When using XLMiner's partition functionality (found in most model dialogs):

  1. Check the box labeled "Set seed" (it's unchecked by default)
  2. Enter a specific integer: 12345, 42, 2024, or any memorable number
  3. Document this seed value in the Model Log

Now, every time the model is run with this seed:

  • Identical training/validation/test splits
  • Identical model performance metrics
  • Identical predictions for the same observations
  • Perfect reproducibility

Here is an example from the loan approval dataset without a seed (three runs of the same logistic regression):

  • Run 1: Validation Accuracy = 92.4%, F1 = 0.917
  • Run 2: Validation Accuracy = 91.8%, F1 = 0.923
  • Run 3: Validation Accuracy = 92.1%, F1 = 0.919

And here it is with seed=12345 (three runs of the same logistic regression):

  • Run 1: Validation Accuracy = 92.1%, F1 = 0.928
  • Run 2: Validation Accuracy = 92.1%, F1 = 0.928
  • Run 3: Validation Accuracy = 92.1%, F1 = 0.928

The difference matters enormously for credibility. When asked to recreate the analysis, you can do so with confidence, knowing the numbers will match.

Important caveat: The seed controls randomness in partitioning and initialization, but it doesn't make the analysis immune to other changes. If data is modified (adding observations, changing transformations) or model parameters are adjusted, results will still differ, as they should.

 

Lesson 3: Proper Data Partitioning: The Three-Way Split

 
Related to reproducibility is partition strategy. XLMiner's default settings create a 60/40 training/validation split. This seems reasonable until the question arises: where's the test set?

A common mistake involves building a neural network, tuning it based on validation performance, then reporting those validation metrics as final results.

The lesson: Without a separate test set, the optimization occurs directly on the data being reported, inflating performance estimates. The correct partitioning strategy uses three sets.

1. Training Set (50% of Data)

  • Where the model learns patterns
  • Used to fit parameters, coefficients, or weights
  • For the loan dataset: ~2,135 observations

2. Validation Set (30% of Data)

  • For model selection and hyperparameter tuning
  • Used to compare different models or configurations
  • Helps select the best pruned tree, optimal cutoff value, or ideal neural network architecture
  • For the loan dataset: ~1,280 observations

3. Test Set (20% of Data)

  • "Final exam" - only score once
  • Used ONLY after all modeling decisions are complete
  • Provides unbiased estimate of real-world performance
  • For the loan dataset: ~854 observations

Critical rule: Never iterate on test set performance. The moment a model is selected because "it performs better on the test set," that test set becomes a second validation set, and performance estimates become biased.

This is my workflow now:

  1. Set seed to 12345
  2. Partition 50/30/20 (training/validation/test)
  3. Build multiple model variants, evaluating each on validation set only
  4. Select the best model based on validation performance and business requirements
  5. Score the test set exactly once with the chosen model
  6. Report test set performance as the expected real-world outcome
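If the split ever needs to be reproduced with formulas instead of the XLMiner partition dialog, a frozen random column does the job. A minimal sketch, assuming column H is filled with =RAND() and then pasted as values (so the assignments never change), with this label formula in column I:

=IF(RANK.EQ($H2,$H$2:$H$4270)<=0.5*COUNT($H$2:$H$4270),"Training",
    IF(RANK.EQ($H2,$H$2:$H$4270)<=0.8*COUNT($H$2:$H$4270),"Validation","Test"))

Rows whose rank falls in the first 50% go to training, the next 30% to validation, and the remaining 20% to test. Because the random numbers are frozen, the assignment is as reproducible as a seeded partition.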

Here is an example from the loan approval project:

| Model Version                  | Training Acc | Validation Acc | Test Acc   | Selected? |
|--------------------------------|--------------|----------------|------------|-----------|
| Logistic Regression (all vars) | 90.6%        | 89.2%          | Not scored | No        |
| Logistic Regression (stepwise) | 91.2%        | 92.1%          | Not scored | No        |
| Classification Tree (depth=7)  | 98.5%        | 97.3%          | Not scored | Yes       |
| Classification Tree (depth=5)  | 96.8%        | 96.9%          | Not scored | No        |
| Neural Network (7 nodes)       | 92.3%        | 90.1%          | Not scored | No        |

 

After selecting the Classification Tree (depth=7) based on validation performance, the test set was scored exactly once: 97.4% accuracy. This test accuracy represents the expected production performance.

 

Lesson 4: The Training/Validation Gap: Spotting Overfitting Before It Hurts

 
The problem: at first glance, the classification tree results from the project report seem promising.

Training Data Performance:

  • Accuracy: 98.45%
  • Precision: 99%
  • Recall: 96%
  • F1 Score: 98.7%

The model appeared successful until the focus shifted to the validation results.

Validation Data Performance:

  • Accuracy: 97.27%
  • Precision: 98%
  • Recall: 94%
  • F1 Score: 97.3%

The gap seemed small, just 1.18 percentage points of accuracy. But determining whether it constituted a problem required a systematic framework.

The lesson: it's critical to recognize when a model is memorizing the training data instead of learning generalizable patterns.

The practical solution: Create an overfitting monitor. Build a simple but systematic comparison sheet that makes overfitting obvious.

Step 1: Create the Comparison Framework

Here is a model performance comparison in the "Overfitting_Monitor" sheet:

| Metric      | Training | Validation | Gap   | Gap % | Status  |
|-------------|----------|------------|-------|-------|---------|
| Accuracy    | 98.45%   | 97.27%     | 1.18% | 1.20% | ✓ Good  |
| Precision   | 99.00%   | 98.00%     | 1.00% | 1.01% | ✓ Good  |
| Recall      | 96.27%   | 94.40%     | 1.87% | 1.94% | ✓ Good  |
| F1 Score    | 98.76%   | 97.27%     | 1.49% | 1.51% | ✓ Good  |
| Specificity | 96.56%   | 92.74%     | 3.82% | 4.06% | ? Watch |

 

And here are the interpretation rules:

  • Gap < 3%: ✅ Good - Model generalizing well
  • Gap 3-5%: ❓ Watch - Acceptable but monitor closely
  • Gap 5-10%: ⚠️ Concerning - Possible overfit, consider simplification
  • Gap > 10%: ❌ Problem - Definite overfit, must address

And this is the detailed analysis:

  • Overall Assessment: GOOD
  • Reasoning: All major metrics within 2% gap. Specificity gap slightly higher but still acceptable. Model appears to generalize well.
  • Recommendation: Proceed with test set evaluation.

Step 2: Add the Calculation Formulas

Cell: Gap (for Accuracy)
=[@Training] - [@Validation]

Cell: Gap % (for Accuracy)
=([@Training] - [@Validation]) / [@Training]

Cell: Status (for Accuracy)

=IF([@[Gap %]]<0.03, "✓ Good",
    IF([@[Gap %]]<0.05, "? Watch",
       IF([@[Gap %]]<0.10, "⚠ Concerning", "✗ Problem")))

 

Step 3: Create a Visual Overfitting Chart

Build a side-by-side bar chart comparing training vs. validation for each metric. This makes patterns instantly visible:

 
 

When bars are close, the model generalizes well. When training bars are much longer than validation bars, there is overfitting.

Comparing Across Different Models

The real value comes from comparing overfitting patterns across the candidate models. Here is the comparison from the "Model_Overfitting_Comparison" sheet:

| Model                     | Train Acc | Val Acc | Gap    | Overfitting Risk        |
|---------------------------|-----------|---------|--------|-------------------------|
| Logistic Regression       | 91.2%     | 92.1%   | -0.9%  | Low (negative gap)      |
| Classification Tree       | 98.5%     | 97.3%   | 1.2%   | Low                     |
| Neural Network (5 nodes)  | 90.7%     | 89.8%   | 0.9%   | Low                     |
| Neural Network (10 nodes) | 95.1%     | 88.2%   | 6.9%   | High – Reject this      |
| Neural Network (14 nodes) | 99.3%     | 85.4%   | 13.9%  | Very High – Reject this |
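The Gap and Overfitting Risk columns don't need to be filled in by hand. A minimal sketch, assuming the comparison is set up as an Excel Table with columns named Train Acc and Val Acc, and reusing the cutoffs from the interpretation rules above, applied here to the raw training/validation gap:

Cell: Gap
=[@[Train Acc]] - [@[Val Acc]]

Cell: Overfitting Risk
=IF([@Gap]<0.03, "Low",
    IF([@Gap]<0.05, "Moderate",
       IF([@Gap]<0.10, "High", "Very High")))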

 

Interpretation: The neural network with 10+ nodes is clearly overfitting. Despite high training accuracy (99.3%), validation accuracy drops to 85.4%. The model memorized training data patterns that don't generalize.

Best choice: Classification tree

  • High performance (97.3% validation)
  • Minimal overfitting (1.2% gap)
  • Good balance of complexity and generalization

Here are some simple ways to reduce overfitting when spotted:

  • For Classification Trees: Reduce maximum depth or increase minimum samples per node
  • For Neural Networks: Reduce number of nodes or layers
  • For Logistic Regression: Remove variables or use stepwise selection
  • For All Models: Add more training data if possible

 

Lesson 5: Implement Data Validation for Categorical Variables

 
Data entry errors are silent killers of machine learning projects. A single typo, such as "gradute" instead of "graduate", creates a third category in what should be a binary variable. The model now has an unexpected feature value it's never seen during training, potentially causing errors during deployment or, worse, silently producing incorrect predictions.

The prevention: Excel's data validation feature. Here's the implementation protocol for categorical variables:

In a hidden sheet (name it "Validation_Lists"), create lists of valid values:

  • For education: Create a column with entries "Graduate" and "Not Graduate"
  • For self-employment: Create a column with entries "Yes" and "No"
  • For loan status: Create a column with entries "Approved" and "Rejected"

In the data entry sheet:

  • Select the entire column for a categorical variable (e.g., column containing education data)
  • Data → Data Validation → Settings tab
  • Allow: List
  • Source: Navigate to the hidden validation sheet and select the appropriate list
  • Error Alert tab: Style = Stop, with a clear message: "Only 'Graduate' or 'Not Graduate' are valid entries"

Now it's impossible to enter invalid values. Users see a dropdown with valid choices, eliminating typographical errors entirely.
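Data validation only guards new entries; anything typed before the rule was applied slips through. An audit formula catches those existing problems. A minimal sketch, assuming the education values sit in column C of the data sheet and the valid list occupies Validation_Lists!A2:A3:

=SUMPRODUCT(--(COUNTIF(Validation_Lists!$A$2:$A$3, $C$2:$C$4270)=0))

The formula counts how many existing entries match none of the valid values; any result above zero points to rows that need cleanup before modeling.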

For numeric variables with known ranges, apply similar validation to prevent impossible values:

  • Credit scores: Must be between 300 and 900
  • Loan term: Must be between 1 and 30 years
  • Annual income: Must be greater than 0

Select the column, apply data validation, set:

  • Allow: Whole Number (or Decimal)
  • Data: between
  • Minimum: 300 (for credit scores)
  • Maximum: 900
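The same audit idea works for the numeric rules. A sketch, again assuming credit scores live in column D:

=COUNTIF($D$2:$D$4270,"<300") + COUNTIF($D$2:$D$4270,">900")

Any nonzero result flags scores outside the allowed 300 to 900 range that were entered before validation was switched on.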

 

Final Thoughts

 
Here is a summary of the lessons outlined in the article.

 


 

The five practices in this article — multi-method outlier detection, setting random seeds, three-way data partitioning, monitoring the training-validation gap, and implementing data validation — share a common thread: they're all simple to implement but devastating to omit.

None of these practices require advanced statistical knowledge or complex programming. They don't demand additional software or expensive tools. With these safeguards in place, Excel with XLMiner is a genuinely powerful platform for accessible machine learning.
 
 

Rachel Kuznetsov has a Master's in Business Analytics and thrives on tackling complex data puzzles and searching for fresh challenges to take on. She's committed to making intricate data science concepts easier to understand and is exploring the various ways AI makes an impact on our lives. On her continuous quest to learn and grow, she documents her journey so others can learn alongside her. You can find her on LinkedIn.

