- Causation or Correlation: Explaining Hill Criteria using xkcd - Feb 20, 2017.
This is an attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun and to motivate causal inference instructors to vary which xkcd comics they include in lectures.
Cartoon, Causation, Correlation, Statistics, xkcd
- The Top Predictive Analytics Pitfalls to Avoid - Jan 23, 2017.
Predictive modelling and machine learning are significantly contributing to business, but they can be very sensitive to data and changes in it, which makes it very important to use proper techniques and avoid pitfalls in building data science models.
Bias, Machine Learning, Model Performance, Predictive Analytics, Regularization, Statistics
- 3 methods to deal with outliers - Jan 3, 2017.
In both statistics and machine learning, outlier detection is important for building an accurate model to get good results. Here three methods are discussed to detect outliers or anomalous data instances.
Machine Learning, Outliers, Statistics
- Machine Learning vs Statistics - Nov 29, 2016.
Machine learning is all about predictions, supervised learning, and unsupervised learning, while statistics is about samples, populations, and hypotheses. But are they actually that different?
Machine Learning, Statistics
- How Bayesian Inference Works - Nov 15, 2016.
Bayesian inference isn’t magic or mystical; the concepts behind it are completely accessible. In brief, Bayesian inference lets you draw stronger conclusions from your data by folding in what you already know about the answer. Read an in-depth overview here.
Bayes Rule, Bayes Theorem, Bayesian, Inference, Statistics
- How Can Lean Six Sigma Help Machine Learning? - Nov 1, 2016.
The data cleansing phase alone is not sufficient to ensure the accuracy of machine learning when noise or bias exists in the input data. Lean Six Sigma variance reduction can improve the accuracy of machine learning results.
Data Cleaning, Machine Learning, Predictive Analytics, Statistics
- Data Science Basics: Data Mining vs. Statistics - Sep 28, 2016.
As a beginner, I was confused about the relationship between data mining and statistics. This is my attempt to help straighten out this connection for others who may now be in my old shoes.
Beginners, Data Mining, Statistics
- The Great Algorithm Tutorial Roundup - Sep 20, 2016.
This is a collection of tutorials relating to the results of the recent KDnuggets algorithms poll. If you are interested in learning or brushing up on the most used algorithms, as per our readers, look here for suggestions on doing so!
Algorithms, Clustering, Decision Trees, K-nearest neighbors, Machine Learning, PCA, Poll, random forests algorithm, Regression, Statistics, Text Mining, Time Series, Visualization
- A Tutorial on the Expectation Maximization (EM) Algorithm - Aug 25, 2016.
This is a short tutorial on the Expectation Maximization algorithm and how it can be used to estimate parameters for multivariate data.
Clustering, Data Science, Data Science Education, Predictive Analytics, Statistics
- Central Limit Theorem for Data Science – Part 2 - Aug 16, 2016.
This post continues an explanation of Central Limit Theorem started in a previous post, with additional details... and beer.
Beer, Centrality, Distribution, Statistics
- Central Limit Theorem for Data Science - Aug 12, 2016.
This post is an introductory explanation of the Central Limit Theorem, and why it is (or should be) of importance to data scientists.
Centrality, Distribution, Statistics
- Understanding the Empirical Law of Large Numbers and the Gambler’s Fallacy - Aug 12, 2016.
The law of large numbers is an important concept for practising data scientists. In this post, the empirical law of large numbers is demonstrated via a simple simulation approach using the Bernoulli process.
Algorithms, R, Statistics
- What Statistics Topics are Needed for Excelling at Data Science? - Aug 2, 2016.
Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.
Bayesian, Distribution, Machine Learning, Markov Chains, Probability, Regression, Statistics
- Doing Statistics with SQL - Aug 2, 2016.
This post covers how to perform some basic in-database statistical analysis using SQL.
SQL, Statistics
- Data Science Statistics 101 - Jul 28, 2016.
Statistics can often be the most intimidating aspect of data science for aspiring data scientists to learn. Gain some personal perspective from someone who has traveled the path.
Beginners, Data Science, Statistics
- Why Big Data is in Trouble: They Forgot About Applied Statistics - Jul 18, 2016.
This "classic" (but very topical and certainly relevant) post discusses issues that Big Data can face when it forgets, or ignores, applied statistics. As great a discussion today as it was 2 years ago.
Applied Statistics, Big Data, Google, Statistics
- Big Data, Bible Codes, and Bonferroni - Jul 8, 2016.
This discussion will focus on 2 particular statistical issues to be on the lookout for in your own work and in the work of others mining and learning from Big Data, with real-world examples emphasizing the importance of statistical processes in practice.
Bible, Big Data, Bonferroni, Probability, Statistics, Terrorism
- Machine Learning Classic: Parsimonious Binary Classification Trees - Jun 14, 2016.
Get your hands on a classic technical report outlining a three-step method of constructing binary decision trees for multiple classification problems.
Decision Trees, Leo Breiman, Machine Learning, Statistics
- Data Science of Variable Selection: A Review - Jun 7, 2016.
There are as many approaches to selecting features as there are statisticians since every statistician and their sibling has a POV or a paper on the subject. This is an overview of some of these approaches.
Algorithms, Big Data, Feature Selection, Statistics
- Eugenics – journey to the dark side at the dawn of statistics - Apr 27, 2016.
Today is the 80th anniversary of the death of Karl Pearson, one of the founding fathers of statistics (correlation coefficient, principal components, the p-value, and much more). He was also deeply involved with eugenics, a jarring reminder that truth often comes bundled with a measure of darkness.
Correlation, Eugenics, Karl Pearson, Statistics
- The Evolution of the Data Scientist - Mar 16, 2016.
We trace the evolution of Data Science from ancient mathematics to statistics and early neural networks, to present successes like AlphaGo and self-driving cars, and look into the future.
Automated, Data Scientist, Demis Hassabis, Evolution, Mathematics, Statistics
- Top 10 TED Talks for the Data Scientists - Feb 9, 2016.
TED Talks have been a great platform for sharing ideas and inspiration. Here, we have selected ten interesting talks for data scientists from the statistics, social media, and economics domains.
Data Science, Hans Rosling, Social Networks, Statistics, TED
- Data-Planet Statistical Datasets - Nov 4, 2015.
Data-Planet Statistical Datasets provides easy access to an extensive repository of standardized and structured statistical data, with more than 25 billion data points from more than 70 source organizations.
Data Platform, Statistics, Time Series, Time series data
- We need a statistically rigorous and scientifically meaningful definition of replication - Oct 29, 2015.
Replication and confirmation are indispensable concepts that help define scientific facts. It seems that before continuing the debate over replication, we need a statistically meaningful definition of replication.
Replication, Reproducibility, Statistics
- How to become a Data Scientist for Free - Aug 28, 2015.
Here are the most required skills for a data scientist position based on ReSkill’s analyses of thousands of job posts and free resources to learn each skill.
Data Science Education, Data Scientist, Java, Online Education, Python, R, SQL, Statistics
- Understanding Basic Concepts and Dispersion - Aug 10, 2015.
In analytics, it is common practice to examine the basic statistical properties of variables, viz. range, mean, and deviation. Centrality measures are the most important among them; explore how to use these measures.
Dispersion, RideOnData, Statistics
- Deep Learning and the Triumph of Empiricism - Jul 7, 2015.
Theoretical guarantees are clearly desirable. And yet many of today's best-performing supervised learning algorithms offer none. What explains the gap between theoretical soundness and empirical success?
Big Data, Data Science, Deep Learning, Mathematics, Statistics, Zachary Lipton
- Applied Statistics Is A Way Of Thinking, Not Just A Toolbox - May 29, 2015.
The choice of tools in applied statistics is driven by the objective, the structure of the data, and the nature of the uncertainty in the numbers, whereas in academic statistics it's driven by publishing or teaching. Here we provide some common statistical tools and their overlapping genealogy.
Applied Statistics, Randy Bartlett, Statistics, Toolbox
- 10 things statistics taught us about big data analysis - Feb 10, 2015.
Here are 10 ideas from applied statistics that are relevant for big data analysis, focusing on prediction accuracy, interactive analysis, and more.
Best Practices, Big Data, Overfitting, Statistics
- Causation vs Correlation: Visualization, Statistics, and Intuition - Jan 4, 2015.
Visualizations of correlation vs. causation and some common pitfalls and insights involving the statistics are explored in this case study involving stock price time series.
Alex Jones, Causation, Correlation, Data Visualization, Statistics
- Hiring Data Scientists: What to look for? - Sep 9, 2014.
Learn the key characteristics of a good data scientist, based on the three authors’ consulting and research experience collaborating with many companies worldwide on big data and analytics.
Analytics, Big Data, Business, Data Mining, Data Scientist, Hiring, Programming, Skills, Statistics
- How Xbox, Big Data & Statistical Analysis Can Measure Public Opinion - Jul 11, 2014.
Could the Xbox gaming platform and Big Data hold the key to generating accurate measures of public opinion, such as election polling? A team of statistical scientists think so.
Big Data, Statistics, Survey, Xbox