- 10 Must-Know Statistical Concepts for Data Scientists - Apr 21, 2021.
Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.
- The 8 Basic Statistics Concepts for Data Science - Jun 24, 2020.
Understanding the fundamentals of statistics is a core capability for becoming a Data Scientist. Review these essential ideas that will be pervasive in your work and raise your expertise in the field.
- The Hidden Risk of AI and Big Data - Sep 20, 2019.
With recent advances in AI being enabled through access to so much “Big Data” and cheap computing power, there is incredible momentum in the field. Can big data really deliver on all this hype, and what can go wrong?
- How Concerned Should You be About Predictor Collinearity? It Depends… - Aug 15, 2019.
Predictor collinearity (also known as multicollinearity) can be problematic for your regression models. Check out these rules of thumb about when, and when not, to be concerned.
- From Good to Great Data Science, Part 1: Correlations and Confidence - Feb 5, 2019.
With the aid of some hospital data, part one describes how just a little inexperience in statistics could result in two common mistakes.
- Why Ice Cream Is Linked to Shark Attacks – Correlation/Causation Smackdown - Jan 19, 2019.
Why are soda and ice cream each linked to violence? This article delivers the final word on what people mean by "correlation does not imply causation."
- Every time someone runs a correlation coefficient on two time series, an angel loses their wings - Jun 18, 2018.
We all know correlation doesn’t equal causality at this point, but when working with time series data, correlation can lead you to come to the wrong conclusion.
- 12 Useful Things to Know About Machine Learning - Apr 12, 2018.
This is a summary of 12 key lessons that machine learning researchers and practitioners have learned, including pitfalls to avoid, important issues to focus on, and answers to common questions.
- Why understanding of truth is important in Data Science? - Jan 1, 2018.
Data Science can be used to discover correlations (What phenomena occurred) but cannot be used to establish causality (Why the phenomena occurred).
- Pros and Pitfalls of Observational Research - May 3, 2017.
Why the connection between beer brand and region? Climate? Tradition? Or simply distribution? Some combination of the three, plus other factors?
- Introduction to Correlation - Feb 22, 2017.
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
- Causation or Correlation: Explaining Hill Criteria using xkcd - Feb 20, 2017.
This is an attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun, and also to motivate causal inference instructors to have some variety in which xkcd comic they include in lectures.
- 9 Bizarre and Surprising Insights from Data Science - Oct 5, 2016.
The petabytes of information currently available to analysts amount to a boundless playing field of possible truths.
- The Fallacy of Seeing Patterns - Jul 26, 2016.
Analysts are often on the lookout for patterns, often relying on spurious patterns. This post looks at some spurious patterns in univariate, bivariate & multivariate analysis.
- How to Compare Apples and Oranges – Part 1 - Jun 17, 2016.
We are always told that apples and oranges can’t be compared because they are completely different things. Learn how, as an analyst, you deal with such differences and make sense of them on a daily basis.
- Eugenics – journey to the dark side at the dawn of statistics - Apr 27, 2016.
Today is the 80th anniversary of the death of Karl Pearson, one of the founding fathers of statistics (correlation coefficient, principal components, the p-value, and much more). He was also deeply involved with eugenics, a jarring reminder that truth often comes bundled with a measure of darkness.
- Regression & Correlation for Military Promotion: A Tutorial - Apr 13, 2016.
A clear and well-written tutorial covering the concepts of regression and correlation, focusing on military commander promotion as a use case.
- Data scientists keep forgetting the one rule - Feb 2, 2016.
“Correlation does not imply causation”. Yet data scientists often confuse the two, succumbing to the temptation to over-interpret. And that can lead us to make some really bad decisions from data.
- Random vs Pseudo-random – How to Tell the Difference - Oct 26, 2015.
Statistical know-how is an integral part of Data Science. Explore randomness vs. pseudo-randomness in this explanatory post with examples.
- Using Ensembles in Kaggle Data Science Competitions – Part 1 - Jun 25, 2015.
How to win Machine Learning Competitions? Gain an edge over the competition by learning Model Ensembling. Take a look at Henk van Veen's insights about how to get improved results!
- Surprising Random Correlations - May 14, 2015.
An interesting demo showing how easy it is to find surprising correlations in real data. Is the German unemployment rate related to Apple stock? Is the 10-year Treasury rate related to the price of Red Winter Wheat? You will be surprised.
- Interview: Bill Moreau, USOC on Empowering World’s Best Athletes through Analytics - Mar 26, 2015.
We discuss how the United States Olympic Committee uses Big Data, how athletes respond to analytical insights, and the integration of sports medicine into sports performance and sports injury.
- Top KDnuggets tweets, Feb 11-12: Automating romance with Eigenfaces; My Brief Guide to Big Data, Predictive Analytics for non-experts - Feb 13, 2015.
Romantic #DataScientist @crockpotveggies automates #Tinder with Eigenfaces; My Brief Guide to Big Data and Predictive Analytics for non-experts; #DataMining finds corruption is correlated with low income, low development MIT; Hitachi buys Pentaho to extend Its #BigData footprint.
- Can noise help separate causation from correlation? - Jan 21, 2015.
How to tell correlation from causation is one of the key problems in data science and Big Data. New Additive Noise Models methods can do it with over 65% accuracy, opening new breakthrough possibilities.
- Top stories for Jan 4-10: 11 Clever Methods of Overfitting; Research Leaders on Data Science and Big Data - Jan 11, 2015.
11 Clever Methods of Overfitting and how to avoid them; Causation vs Correlation: Visualization, Statistics, and Intuition; Research Leaders on Data Science and Big Data key trends, top papers; Differential Privacy: How to make Privacy and Data Mining Compatible.
- Top KDnuggets tweets, Dec 29 – Jan 04: A brilliant way to tell causation from correlation; Machine Learning Experts You Need to Know. - Jan 5, 2015.
SAS is No. 1 among major BI vendors whose users plan to discontinue use; How #MachineLearning, #BigData, and image recognition could revolutionize search; A brilliant way to tell causation from correlation; Machine Learning Experts You Need to Know: Geoff Hinton, Michael Jordan, Andrew Ng.
- Causation vs Correlation: Visualization, Statistics, and Intuition - Jan 4, 2015.
Visualizations of correlation vs. causation and some common pitfalls and insights involving the statistics are explored in this case study involving stock price time series.
- Top KDnuggets tweets, Dec 22-28: Top 10 Data Science Skills, and How to Learn Them; How to tell correlation from causation - Dec 29, 2014.
Top 10 Data Science Skills, and How to Learn Them; Mathematicians claim to figure out how to tell correlation from causation; Review of #MOOC Learning from Data - the class that changed everything; Free Big Data sources every Data Science enthusiast should know.
- Top KDnuggets tweets, Nov 19-20: 20 Insane Things That Correlate with Each Other - Nov 21, 2014.
Spurious #Correlations - 20 Insane Things That Correlate W/ Each Other; 10 Most Profitable Industries According to #BigData; MIT researchers show 5 clusters are enough for collab filtering; Every publisher now a start-up, says NYT Top Data Scientist.
- Top KDnuggets tweets, Jun 9-10: Numeric Matrix Manipulation: Cheat Sheet; The First Law of Data Science - Jun 11, 2014.
Also - The First Law of Data Science: Do Umbrellas Cause Rain?; Tell Your Kids to be Data Scientists - Not Doctors; DLib Library for Machine Learning.
- The First Law of Data Science: Do Umbrellas Cause Rain? - Jun 9, 2014.
Michael Brodie on the first law of data science, the role of data curation in Big Data analysis, and Thomas Piketty's economic theories.
- Interview: Kirk Borne, Data Scientist, GMU on Big Data in Astrophysics and Correlation vs. Causality - May 30, 2014.
We discuss how to build the best data models, the significance of correlation and causality in Predictive Analytics, and the impact of Big Data on Astrophysics.
- KDnuggets Interview: Juan Miguel Lavista, Microsoft Data Science Team - Apr 30, 2014.
We discuss Randomized Controlled Experiments, common errors during A/B testing, Correlation vs. Causality, Big Data Myths and setting up realistic expectations from Big Data and more...
- Dancing Statistics – who says statistics cannot be fun? - Mar 10, 2014.
Four little dance routines explain statistical concepts of frequency distributions, sampling, standard error, variance, correlation, and correlation != causation. Enjoy!