- Popular Machine Learning Interview Questions - Jan 20, 2021.
Get ready for your next job interview requiring domain knowledge in machine learning with answers to these eleven common questions.
- Working With Sparse Features In Machine Learning Models - Jan 12, 2021.
Sparse features can cause problems like overfitting and suboptimal results in learning models, and understanding why this happens is crucial when developing models. Multiple methods, including dimensionality reduction, are available to overcome issues due to sparse features.
- Can you trust AutoML? - Dec 23, 2020.
Automated Machine Learning, or AutoML, tries hundreds or even thousands of different ML pipelines to deliver models that often beat the experts and win competitions. But, is this the ultimate goal? Can a model developed with this approach be trusted without guarantees of predictive performance? The issue of overfitting must be closely considered because these methods can lead to overestimation -- and the Winner's Curse.
- Dark Data: Why What You Don’t Know Matters - Dec 7, 2020.
In his latest book, leading statistician Dr. David Hand explores how we can be blind to missing or unseen data and how, in our rush to be a data-driven society, we might be missing things that matter, leading to dangerous decisions that can sometimes have disastrous consequences. Download this free chapter now.
- What an Argentine Writer and a Hungarian Mathematician Can Teach Us About Machine Learning Overfitting - Sep 21, 2020.
This article presents some beautiful ideas about intelligence and how they relate to modern machine learning.
- 6 Common Mistakes in Data Science and How To Avoid Them - Sep 10, 2020.
As a novice or seasoned Data Scientist, your work depends on the data, which is rarely perfect. Properly handling the typical issues with data quality and completeness is crucial, and we review how to avoid six of these common scenarios.
- 4 ways to improve your TensorFlow model – key regularization techniques you need to know - Aug 27, 2020.
Regularization techniques are crucial for preventing your models from overfitting and enable them to perform better on your validation and test sets. This guide provides a thorough overview, with code, of four key approaches you can use for regularization in TensorFlow.
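One member of the regularization family the guide covers can be shown framework-agnostically: early stopping. The helper below is a simplified sketch for illustration (the function name, patience value, and loss sequence are invented, not the article's TensorFlow code): halt training once validation loss stops improving for `patience` epochs.

```python
def early_stop(val_losses, patience=3):
    """Return the epoch at which to stop training: the first epoch where
    validation loss has not improved for `patience` consecutive epochs."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch       # stop here; in practice, restore best weights
    return len(val_losses) - 1     # never triggered: train to the end

# Validation loss improves, then drifts upward -- training stops at epoch 5
stop_at = early_stop([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74])
```

In a real training loop you would also keep a snapshot of the weights at the best epoch and restore them when stopping triggers.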
- Fighting Overfitting in Deep Learning - Dec 27, 2019.
This post outlines an attack plan for fighting overfitting in neural networks.
- 5 Techniques to Prevent Overfitting in Neural Networks - Dec 6, 2019.
In this article, I will present five techniques to prevent overfitting while training neural networks.
- Reproducibility, Replicability, and Data Science - Nov 19, 2019.
As cornerstones of scientific processes, reproducibility and replicability ensure results can be verified and trusted. These two concepts are also crucial in data science, and as a data scientist, you must follow the same rigor and standards in your projects.
- Generalization in Neural Networks - Nov 18, 2019.
When training a neural network in deep learning, its performance on processing new data is key. Improving the model's ability to generalize relies on preventing overfitting using these important methods.
- 6 bits of advice for Data Scientists - Sep 25, 2019.
As a data scientist, you can get lost in your daily dives into the data. Follow this advice to stay diligent and be more impactful for your organization.
- The Hidden Risk of AI and Big Data - Sep 20, 2019.
With recent advances in AI being enabled through access to so much “Big Data” and cheap computing power, there is incredible momentum in the field. Can big data really deliver on all this hype, and what can go wrong?
- Common Machine Learning Obstacles - Sep 9, 2019.
In this blog, Seth DeLand of MathWorks discusses two of the most common obstacles: choosing the right classification model and eliminating data overfitting.
- Top KDnuggets tweets, Aug 14-20: Researcher reproduced 130 research papers on “predicting the stock market”, coded them from scratch. - Aug 21, 2019.
Also: For data pros only - An SQL Query walks into a bar and sees two tables; Deep Learning for NLP: Creating a Chatbot with Keras!; 12 NLP Researchers, Practitioners & Innovators You Should Be Following; Wanting to be even more marketable as a data scientist? Check out these trends in the skills employers are looking for today
- Can we trust AutoML to go on full autopilot? - Jul 31, 2019.
We put an AutoML tool to the test on a real-world problem, and the results are surprising. Even with automatic machine learning, you still need expert data scientists.
- Careful! Looking at your model results too much can cause information leakage - May 24, 2019.
We are all aware of the issue of overfitting: the model you build fits the training data so perfectly that it fails to generalize to the population the data comes from, with catastrophic results when you feed in new data and get very odd predictions.
- Preparing for the Unexpected - Feb 28, 2019.
In some domains, new values appear all the time, so it's crucial to handle them in a good way. Using deep learning, one can learn a special Out-of-Vocabulary embedding for these new values. But how can you train this embedding to generalize well to any unseen value? We explain one of the methods employed at Taboola.
- How To Fine Tune Your Machine Learning Models To Improve Forecasting Accuracy - Jan 23, 2019.
We explain how to retrieve estimates of a model's performance using scoring metrics, before taking a look at finding and diagnosing the potential problems of a machine learning algorithm.
- Why Ice Cream Is Linked to Shark Attacks – Correlation/Causation Smackdown - Jan 19, 2019.
Why are soda and ice cream each linked to violence? This article delivers the final word on what people mean by "correlation does not imply causation."
- Why Vegetarians Miss Fewer Flights – Five Bizarre Insights from Data - Jan 12, 2019.
A frenzy of number-crunching is churning out a heap of insights that are colorful, sometimes surprising, and often valuable. We explain how this works, and investigate five bizarre discoveries found in data.
- The brain as a neural network: this is why we can’t get along - Dec 19, 2018.
This article sets out to answer the question: what insights can we gain about ourselves by thinking of the brain as a machine learning model?
- Labeling Unstructured Text for Meaning to Achieve Predictive Lift - Oct 31, 2018.
In this post, we examine several advanced NLP techniques, including: labeling nouns and noun phrases for meaning, labeling (most often) adverbs and adjectives for sentiment, and labeling verbs for intent.
- Improving the Performance of a Neural Network - May 30, 2018.
There are many techniques available that can help improve a neural network's performance. Follow along to get to know them and to build your own accurate neural network.
- 12 Useful Things to Know About Machine Learning - Apr 12, 2018.
This is a summary of 12 key lessons that machine learning researchers and practitioners have learned, including pitfalls to avoid, important issues to focus on, and answers to common questions.
- 8 Common Pitfalls That Can Ruin Your Prediction - Mar 21, 2018.
A good prediction can help your work and make it easier. But how can you be sure that your prediction is good? Here are some common pitfalls that you should avoid.
- KDnuggets™ News 18:n06, Feb 7: 5 Fantastic Practical Machine Learning Resources; 8 Must-Know Neural Network Architectures - Feb 7, 2018.
5 Fantastic Practical Machine Learning Resources; The 8 Neural Network Architectures Machine Learning Researchers Need to Learn; Generalists Dominate Data Science; Avoid Overfitting with Regularization; Understanding Learning Rates and How It Improves Performance in Deep Learning
- Avoid Overfitting with Regularization - Feb 2, 2018.
This article explains overfitting, one of the causes of poor predictions on unseen samples, and presents a regression-based regularization technique in simple steps to make clear how to avoid overfitting.
- Regularization in Machine Learning - Jan 10, 2018.
Regularization is a technique that helps avoid overfitting and also makes a predictive model more understandable.
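As a minimal illustration of how a regularization penalty tames overfitting, here is a ridge (L2) regression sketch in NumPy; the data, true weights, and penalty strength are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))          # few samples, many features: easy to overfit
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]               # only 2 features actually matter
y = X @ w_true + rng.normal(scale=0.1, size=20)

lam = 1.0  # regularization strength (an illustrative value; tune it in practice)

# Ridge (L2) solution: w = (X^T X + lam*I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

# Ordinary least squares for comparison (no penalty)
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# The L2 penalty shrinks the coefficient vector toward zero,
# trading a little training-set fit for better generalization.
```

The shrinkage is the whole point: the penalized coefficients always have a smaller norm than the unpenalized ones, which is what makes the fitted model less sensitive to noise in the training data.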
- How To Debug Your Approach To Data Analysis - Dec 29, 2017.
Seven common biases that influence how we understand, use, and interpret the world around us.
- 4 Common Data Fallacies That You Need To Know - Dec 5, 2017.
In this post you will find a list of common data fallacies that lead to incorrect conclusions and poor decision-making with data. You will also find resources and information to remind you of these fallacies when you're working with data.
- Are Scientists Doing Too Much Research? - Nov 24, 2017.
At the heart of this reproducibility problem is the statistical inference methods used to validate research findings—specifically the concept of “statistical significance.”
- Stop Doing Fragile Research - Nov 17, 2017.
If you develop methods for data analysis, you might only be conducting gentle tests of your method on idealized data. This leads to “fragile research,” which breaks when released into the wild. Here, I share 3 ways to make your methods robust.
- Understanding overfitting: an inaccurate meme in Machine Learning - Aug 23, 2017.
That applying cross-validation prevents overfitting is a popular meme, but it is not actually true – it is more of an urban legend. We examine what is true and how overfitting differs from overtraining.
- Making Predictive Models Robust: Holdout vs Cross-Validation - Aug 11, 2017.
The validation step helps you find the best parameters for your predictive model and prevent overfitting. We examine pros and cons of two popular validation strategies: the hold-out strategy and k-fold.
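To make the two strategies concrete, here is a small NumPy sketch (an illustrative helper, not the article's code) that generates k-fold splits; the hold-out strategy corresponds to using just one such split:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)   # shuffle once
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Hold-out uses one fixed train/validation split; k-fold rotates the
# validation role so every sample is validated exactly once.
splits = list(kfold_indices(n=10, k=5))
```

Each of the 5 splits trains on 8 samples and validates on the other 2, so the final score averages over validation sets that together cover the whole dataset.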
- The Truth About Bayesian Priors and Overfitting - Jul 25, 2017.
Many of the considerations we will run through will be directly applicable to your everyday life of applying Bayesian methods to your specific domain.
- How to Lie with Data - Apr 20, 2017.
We expect data scientists to be objective, but intentionally or not, they can produce results that mislead. We examine three common types of “lies” that Data Scientists should be aware of.
- Proxy Indicators: beware of spurious claims - Mar 16, 2017.
Beware of online and market research studies which can lead to false or spurious claims. We examine several notable examples including Google Street View and Argentina inflation.
- 17 More Must-Know Data Science Interview Questions and Answers, Part 2 - Feb 22, 2017.
The second part of 17 new must-know Data Science Interview questions and answers covers overfitting, ensemble methods, feature selection, ground truth in unsupervised learning, the curse of dimensionality, and parallel algorithms.
- 17 More Must-Know Data Science Interview Questions and Answers - Feb 15, 2017.
17 new must-know Data Science Interview questions and answers include lessons from failure to predict 2016 US Presidential election and Super Bowl LI comeback, understanding bias and variance, why fewer predictors might be better, and how to make a model more robust to outliers.
- Sound Data Science: Avoiding the Most Pernicious Prediction Pitfall - Jan 5, 2017.
Data science and predictive analytics can provide huge value, but they can mislead and backfire if not used with fail-safe measures. The author gives examples of such problems and provides guidelines to avoid them.
- 4 Reasons Your Machine Learning Model is Wrong (and How to Fix It) - Dec 21, 2016.
This post presents some common scenarios where a seemingly good machine learning model may still be wrong, along with a discussion of how to evaluate these issues by assessing metrics of bias vs. variance and precision vs. recall.
- Why We Need Data Science - Nov 26, 2016.
A gentle reminder as to why we need Data Science, with reasons even you may have been guilty of ignoring at some point. A basic topic, to be sure, making it all the more important.
- Data Science Basics: 3 Insights for Beginners - Sep 22, 2016.
For data science beginners, 3 elementary issues are given overview treatment: supervised vs. unsupervised learning, decision tree pruning, and training vs. testing datasets.
- A Neat Trick to Increase Robustness of Regression Models - Aug 22, 2016.
Read this take on the validity of choosing a different approach to regression modeling. Why isn't L1 norm used more often?
- The Fallacy of Seeing Patterns - Jul 26, 2016.
Analysts are often on the lookout for patterns, but frequently end up relying on spurious ones. This post looks at some spurious patterns in univariate, bivariate, and multivariate analysis.
- Data Mining Most Vexing Problem Solved, or is this drug REALLY working? - Jul 15, 2016.
This is a summary of the basic principle behind a new paper on multiple test correction for streams and cascades of statistical hypothesis tests, showing how to strictly control the risk of making a mistake over a series of tests and draw appropriate conclusions.
- Troubleshooting Neural Networks: What is Wrong When My Error Increases? - May 13, 2016.
An overview of some of the things that could lead to an increased error rate in neural network implementations.
- The “Thinking” Part of “Thinking Like A Data Scientist” - Apr 26, 2016.
People have a tendency to blindly trust claims from any source that they deem credible, whether or not it conflicts with their own experiences or common sense. Basic stats - common sense = dangerous conclusions viewed as fact.
- When Good Advice Goes Bad - Mar 14, 2016.
Consider these 4 examples of good statistical advice which, when misused, can go bad.
- The Mirage of a Citizen Data Scientist - Mar 1, 2016.
The term "citizen data scientist" has been irritating me recently. I explain why I think it is both a bad term and a bad idea, and what we need instead.
- 21 Must-Know Data Science Interview Questions and Answers, part 2 - Feb 20, 2016.
Second part of the answers to 20 Questions to Detect Fake Data Scientists, including controlling overfitting, experimental design, tall and wide data, understanding the validity of statistics in the media, and more.
- Data scientists keep forgetting the one rule - Feb 2, 2016.
“Correlation does not imply causation”. Yet data scientists often confuse the two, succumbing to the temptation to over-interpret. And that can lead us to make some really bad decisions from data.
- On Political Economy and Data Science: When A Discipline Is Not Enough - Nov 18, 2015.
Most non-trivial Data Science applications are interdisciplinary requiring collaboration across disciplines. We are just beginning to understand the nature of interdisciplinarity in Data Science and the risks of misunderstanding.
- H2O World 2015 – Day 1 Highlights - Nov 16, 2015.
Highlights from talks and tutorials delivered by machine learning experts at H2O World 2015 held in Mountain View.
- Are you trying to acquire Machine Learning Skills? - Sep 16, 2015.
Embarking on a journey through the lands of machine learning? Here are a few important lessons, like feature engineering, model tuning, overfitting, and ensembling, which you should keep in mind along the way.
- KDnuggets™ News 15:n27, Aug 19: Data Science MS/Certificates Online; Big idea to avoid overfitting; Largest Dataset Mined: trends - Aug 19, 2015.
Data Science, Analytics Online Degrees and Certificates; Where is Big Data? For most, Largest Dataset is in laptop-size Gigabytes; 11 things to know about Sentiment Analysis; Recycling Deep Learning Models with Transfer Learning.
- Top KDnuggets tweets, Aug 11-17: Data Science Breakthrough in avoiding overfitting; Top Big Data, Data Science influencers - Aug 18, 2015.
Understanding #Convolution in #DeepLearning; Top #BigData #DataScience influencers @hmason @hackingdata @kirkdborne @flowingdata; Data Science Breakthrough in avoiding #overfitting: The reusable holdout method; R Programming: Where are 50,000 R programmers?
- Big Idea To Avoid Overfitting: Reusable Holdout to Preserve Validity in Adaptive Data Analysis - Aug 17, 2015.
Big Data makes it all too easy to find spurious "patterns" in data. A new approach helps avoid overfitting by using 2 key ideas: validation should not reveal any information about the holdout data, and a small amount of noise should be added to any validation result.
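The noise idea behind the reusable holdout can be sketched in a few lines of NumPy. This is a toy illustration only — the function name and noise level are invented here, and the actual paper specifies a more careful mechanism with a query budget:

```python
import numpy as np

def noisy_holdout_accuracy(y_true, y_pred, scale=0.01, rng=None):
    """Return holdout accuracy perturbed with a little Laplace noise, so that
    repeated adaptive queries reveal less about the holdout set."""
    rng = rng if rng is not None else np.random.default_rng(0)
    acc = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
    return acc + rng.laplace(scale=scale)   # the analyst only ever sees the noisy score

# 3 of 4 predictions correct: the reported score is ~0.75, plus small noise
score = noisy_holdout_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Because every query returns a slightly randomized answer, an analyst cannot fit their modeling choices to the exact holdout labels, which is what preserves the validity of the holdout across many adaptive uses.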
- Overcoming Overfitting with the reusable holdout: Preserving validity in adaptive data analysis - Aug 12, 2015.
Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis.
- Top KDnuggets tweets, Jun 30 – Jul 06: Click Testing Proved that Beards Are Still A Thing; 16 Free #DataScience Books - Jul 12, 2015.
How Screenshot Click Testing Proved that Beards Are Still A Thing; 16 Free #DataScience Books; How to avoid #Overfitting using #Regularization; #DataScience must read: quick puzzle tests your problem solving.
- Surprising Random Correlations - May 14, 2015.
An interesting demo showing how easy it is to find surprising correlations in real data. Is German unemployment rate related to Apple Stock? Is 10-year Treasury rate related to price of Red Winter Wheat? You will be surprised.
- 3 Things About Data Science You Won’t Find In Books - May 11, 2015.
There are many courses on Data Science that teach the latest logistic regression or deep learning methods, but what happens in practice? A data scientist shares his main practical insights that are not taught in universities.
- KDnuggets™ News 15:n12, Apr 22: Predictive Analytics Future? Top LinkedIn Groups; Preventing Overfitting - Apr 22, 2015.
New Poll: Future of Predictive Analytics? Top LinkedIn Groups for Analytics, Big Data, Data Mining - "Big Bang" to Now; Preventing Overfitting in Neural Networks; Cloud Machine Learning Wars: Amazon vs IBM Watson vs Microsoft Azure.
- Data Science 101: Preventing Overfitting in Neural Networks - Apr 17, 2015.
Overfitting is a major problem for Predictive Analytics and especially for Neural Networks. Here is an overview of key methods to avoid overfitting, including regularization (L2 and L1), Max norm constraints and Dropout.
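Of the methods that overview lists, dropout is the easiest to show in a few lines. Here is a NumPy sketch of inverted dropout (an illustrative implementation, not the article's code):

```python
import numpy as np

def dropout(a, p, rng, train=True):
    """Inverted dropout: during training, zero each unit with probability p
    and scale survivors by 1/(1-p) so the expected activation is unchanged.
    At test time the layer is the identity."""
    if not train or p == 0.0:
        return a
    mask = rng.random(a.shape) >= p     # keep each unit with probability 1-p
    return a * mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones(1000)
out = dropout(a, p=0.5, rng=rng)
# Roughly half the units are zeroed; survivors are scaled to 2.0,
# so the mean activation stays close to 1.0.
```

The 1/(1-p) scaling is what makes the layer a no-op at inference time: because the expected activation is preserved during training, no rescaling is needed when dropout is switched off.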
- 10 Steps to Success in Kaggle Data Science Competitions - Mar 11, 2015.
The author, ranked in the top 10 in five Kaggle competitions, shares his 10 steps for success. These also apply to any well-defined predictive analytics or modeling problem with a closed dataset.
- 7 common mistakes when doing Machine Learning - Mar 7, 2015.
In statistical modeling, there are various algorithms to build a classifier, and each algorithm makes a different set of assumptions about the data. For Big Data, it pays off to analyze the data upfront and then design the modeling pipeline accordingly.
- Top /r/MachineLearning Posts, Feb 15-21: The Elephant in the Room of ML Research - Feb 24, 2015.
Problems with deep learning papers, Coursera linear algebra courses, Reddit comment visualizations, deep learning lectures, and genetic algorithm introductions make up the top posts this week on /r/MachineLearning.
- 10 things statistics taught us about big data analysis - Feb 10, 2015.
Here are 10 ideas from applied statistics that are relevant for big data analysis, focusing on prediction accuracy, interactive analysis, and more.
- Top stories in January: (Deep Learning Deep Flaws) Deep Flaws; Research Leaders on key trends, papers - Feb 6, 2015.
Research Leaders on Data Science and Big Data key trends, papers; (Deep Learning Deep Flaws) Deep Flaws; Analytics: Five Rules to Cut Through the Hype; 11 Clever Methods of Overfitting and how to avoid them.
- Top stories for Jan 4-10: 11 Clever Methods of Overfitting; Research Leaders on Data Science and Big Data - Jan 11, 2015.
11 Clever Methods of Overfitting and how to avoid them; Causation vs Correlation: Visualization, Statistics, and Intuition; Research Leaders on Data Science and Big Data key trends, top papers; Differential Privacy: How to make Privacy and Data Mining Compatible.
- KDnuggets™ News 15:n01, Jan 7: Clever methods of overfitting; 5 Analytics Rules to cut thru the Hype - Jan 7, 2015.
11 Clever Methods of Overfitting and how to avoid them, Data Mining and Text Analytics of World Cup 2014, iMath Cloud Data Science Platform beta, Platfora CEO on Insightful Analytics for Big Data, and more analytics, big data, data science, and data mining stories.
- Top stories for Dec 28 – Jan 3: What will happen to big data and data science? Analytics: Five Rules to Cut Through the Hype - Jan 4, 2015.
2015 Predictions: What will happen to big data and data science?; Data Mining is LinkedIn Hottest Skill in 2014; Analytics: Five Rules to Cut Through the Hype; 11 Clever Methods of Overfitting and how to avoid them.
- 11 Clever Methods of Overfitting and how to avoid them - Jan 2, 2015.
Overfitting is the bane of Data Science in the age of Big Data. John Langford reviews "clever" methods of overfitting, including traditional, parameter tweak, brittle measures, bad statistics, human-loop overfitting, and gives suggestions and directions for avoiding overfitting.
- LION Intelligent Learning and Optimization News - Nov 26, 2014.
LION intelligent learning and optimization adds full support for Java packages, new visualization neatly explains overfitting, and get "The LION way" book on Kindle (free if you qualify).
- Top KDnuggets tweets, Nov 19-20: 20 Insane Things That Correlate with Each Other - Nov 21, 2014.
Spurious #Correlations - 20 Insane Things That Correlate W/ Each Other; 10 Most Profitable Industries According to #BigData; MIT researchers show 5 clusters are enough for collab filtering; Every publisher now a start-up, says NYT Top Data Scientist.
- Big Data Winter ahead – unless we change course, warns Michael Jordan - Oct 30, 2014.
We have to have error bars around all our predictions, says machine learning expert Michael Jordan. Otherwise it's gambling, and too many failed predictions can lead to big disappointment with Big Data - a Big Data Winter.
- Big Data accelerates medical research? Or not? - Oct 26, 2014.
Take a look at how big data in healthcare brings big opportunities, but along with those opportunities comes great risk if statistics aren't carefully applied to those large datasets.
- Top KDnuggets tweets, Oct 17-19: Air traffic analyzed to predict Ebola spread; Cool public data for data science - Oct 20, 2014.
Air traffic data analyzed to predict Ebola spread; Some cool public data sources you can use for your next data science project; Data science can't be point and click ! Finding random correlation is too easy; Bayes Rule in an animated gif.
- Top stories in June: Does Deep Learning Have Deep Flaws? Cartoon: Big Data and World Cup - Jul 3, 2014.
Does Deep Learning Have Deep Flaws? Cartoon: Big Data and World Cup Football; KDnuggets 15th Annual Data Mining Software Poll: RapidMiner Continues To Lead; The Cardinal Sin of Data Mining and Data Science: Overfitting.
- Top stories for Jun 15-21 - Jun 22, 2014.
Does Deep Learning Have Deep Flaws?; Cartoon: Big Data and World Cup Football; Optimizing the Netflix Experience with Data Science; The Cardinal Sin of Data Mining: Overfitting.
- KDnuggets 14:n15, Analytics Software Poll – Analyzed; Cartoon: Big Data and World Cup - Jun 18, 2014.
Also Data Mining Cardinal Sin, KDnuggets Profile, CAP, and more analytics/data mining features, software, opinions, news, webcasts, courses, jobs, academic positions, publications, tweets, and CFP.
- Top KDnuggets tweets, Jun 13-15: Book: Data Classification: Algorithms and Applications - Jun 16, 2014.
Book: Data Classification: Algorithms and Applications; Top 10 Data Analysis Tools for Business; #BigData companies to watch selected by top analytics experts; The Cardinal Sin of Data Mining and Data Science: Overfitting.
- The Cardinal Sin of Data Mining and Data Science: Overfitting - Jun 14, 2014.
Overfitting leads to the public losing trust in research findings, many of which turn out to be false. We examine some famous examples, including "the decline effect" and Miss America's age, and suggest approaches for avoiding overfitting.