- Top Resources for Learning Statistics for Data Science - Dec 16, 2021.
Let’s take a look at the current state of statistics in data science, and what you can do to accelerate your learning.
Courses, Data Science, Springboard, Statistics
- Feature Selection: Where Science Meets Art - Dec 14, 2021.
From heuristic to algorithmic feature selection techniques for data science projects.
Data Preprocessing, Feature Selection, Machine Learning, Statistics
- How to Use Permutation Tests - Dec 2, 2021.
A walkthrough of permutation tests and how they can be applied to time series data.
Statistics
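A permutation test asks how often randomly relabeled data produces a statistic at least as extreme as the one actually observed. The sketch below assumes two independent groups and uses the difference in means as the statistic. The function name `permutation_test` is hypothetical. Time series usually call for permuting blocks rather than single points, because shuffling individual observations destroys their ordering.

```python
import random
from statistics import fmean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = fmean(a) - fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0
    return observed, (extreme + 1) / (n_permutations + 1)


# Illustrative data only
group_a = [12.1, 11.8, 13.0, 12.6, 12.9]
group_b = [10.9, 11.2, 10.7, 11.5, 11.0]
diff, p = permutation_test(group_a, group_b)
print(f"difference = {diff:.2f}, p = {p:.4f}")
```

Because the p-value is estimated from random relabelings, more permutations give a more stable estimate. With only 10 observations, the exact test could also enumerate all 252 possible splits.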
- Find the Best-Matching Distribution for Your Data Effortlessly - Oct 22, 2021.
How to find the best-matching statistical distributions for your data points in an automated and easy way, and then how to extend the utility further.
Distribution, Python, Statistics, Synthetic Data
- How to calculate confidence intervals for performance metrics in Machine Learning using an automatic bootstrap method - Oct 15, 2021.
Are your model performance measurements very precise due to a “large” test set, or very uncertain due to a “small” or imbalanced test set?
Machine Learning, Metrics, Statistics
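A common way to attach an uncertainty estimate to a test-set metric is the percentile bootstrap. You resample the test rows with replacement, recompute the metric on each resample, and read off the empirical quantiles. The sketch below assumes plain Python lists and a hypothetical `bootstrap_ci` helper. It is not necessarily the "automatic" variant the article describes, and refinements such as BCa intervals may behave better on small or imbalanced test sets.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric computed on a test set."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int(n_resamples * alpha / 2)]
    hi = scores[min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))]
    return lo, hi


# Illustrative labels and predictions (80% accuracy on 200 rows)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 20
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1] * 20
print(accuracy(y_true, y_pred), bootstrap_ci(y_true, y_pred, accuracy))
```

Running the same code on a smaller test set gives a wider interval, which is exactly the precision question raised above.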
- How to do “Limitless” Math in Python - Oct 7, 2021.
How to perform arbitrary-precision computation, and much more math (fast, too) than is possible with the built-in math library in Python.
Linear Algebra, Mathematics, Probability, Python, Statistics
- Advanced Statistical Concepts in Data Science - Sep 30, 2021.
This article covers some of the most commonly used advanced statistical concepts, along with their Python implementations.
Career Advice, Data Science, Distribution, Probability, Statistics
- Important Statistics Data Scientists Need to Know - Sep 29, 2021.
Every data scientist, from the enthusiast to the professional, should appreciate several fundamental statistical concepts well. Here, we provide Python code snippets to build that understanding and bring you key tools for gaining early insight into your data.
Bayes Theorem, Data Science, Probability, Statistics
- Real-Time Histogram Plots on Unbounded Data - Sep 24, 2021.
Using histograms on real-time data is not possible in most of the popular data science libraries. In this article, you will learn how to dynamically compute and display a histogram within a Python notebook.
Data Visualization, Histogram, Real-time, Statistics
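With unbounded data, the main difficulty is that bin edges cannot be fixed in advance from a known minimum and maximum. One possible approach creates bins lazily as values arrive. This sketch assumes a fixed bin width chosen up front. The `StreamingHistogram` class is hypothetical and only maintains the counts; plotting and notebook refresh are left out.

```python
import math
from collections import Counter


class StreamingHistogram:
    """Fixed-width histogram that absorbs an unbounded stream of values.

    Bins are created lazily, so the range of the data need not be known in advance.
    """

    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.counts = Counter()
        self.total = 0

    def update(self, value):
        # Bin k covers the half-open interval [k * bin_width, (k + 1) * bin_width)
        k = math.floor(value / self.bin_width)
        self.counts[k] += 1
        self.total += 1

    def bins(self):
        """Return (left_edge, right_edge, count) tuples ordered from left to right."""
        return [
            (k * self.bin_width, (k + 1) * self.bin_width, count)
            for k, count in sorted(self.counts.items())
        ]


h = StreamingHistogram(bin_width=0.5)
for x in [0.1, 0.4, 0.7, -0.2, 1.9]:  # values arriving one at a time
    h.update(x)
print(h.bins())
```

Memory grows with the number of occupied bins rather than with the number of values seen. For data whose scale drifts substantially, an adaptive binning scheme may be a better fit.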
- How to Find Weaknesses in your Machine Learning Models - Sep 20, 2021.
FreaAI: a new method from researchers at IBM.
Interpretability, Machine Learning, Modeling, Statistics
- Paradoxes in Data Science - Sep 17, 2021.
Have a look at some of the main paradoxes associated with Data Science and its statistical foundations.
Data Science, Statistics
- KDnuggets™ News 21:n34, Sep 8: Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained - Sep 8, 2021.
Do You Read Excel Files with Python? There is a 1000x Faster Way; Hypothesis Testing Explained; Data Science Cheat Sheet 2.0; 6 Cool Python Libraries That I Came Across Recently; Best Resources to Learn Natural Language Processing in 2021
AI, Cheat Sheet, Data Science, Excel, Hypothesis Testing, Machine Learning, Python, Statistics
- Antifragility and Machine Learning - Sep 6, 2021.
Our intuition for most products, processes, and even some models might be that they will either get worse over time or, if they fail, experience a cascade of further failures. But what if we could intentionally design systems and models to only get better, even as the world around them gets worse?
Machine Learning, Mathematics, Statistics
- What is Noise? - Aug 25, 2021.
We might have a reasonable sense of what "noise" is: some statistically random phenomenon that occurs in Nature. But how can this same characteristic be defined, and understood, within the context of making judgments, such as in human behavior, corporate decision-making, medicine, the law, and AI systems?
Bias, Book, Daniel Kahneman, Statistics, Variance, Vasant Dhar
- Learning Data Science and Machine Learning: First Steps After The Roadmap - Aug 24, 2021.
Just getting into learning data science may seem as daunting as (if not more daunting than) trying to land your first job in the field. With so many options and resources to consider, both online and in traditional academia, the following prerequisites and preparatory work are recommended before diving deep into data science and AI/ML.
Data Science, Machine Learning, Mathematics, Python, Roadmap, Statistics
- Introduction to Statistical Learning Second Edition - Aug 13, 2021.
The second edition of the classic "An Introduction to Statistical Learning, with Applications in R" was published very recently, and is now freely-available via PDF on the book's website.
Books, Data Science, Machine Learning, R, Statistical Learning, Statistics
- Be Wary of Automated Feature Selection — Chi Square Test of Independence Example - Aug 5, 2021.
When Data Scientists use the chi-square test for feature selection, they merely follow the ritualistic “If your p-value is low, the null hypothesis must go.” The automated functions they use behave no differently.
Automated Data Science, Automated Machine Learning, Feature Selection, Statistics
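Computing the statistic by hand makes it easier to see what an automated function hides, such as small expected counts that undermine the test. Below is a minimal sketch of Pearson's chi-square test of independence on a contingency table, without continuity correction. The helper name and the counts are hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and expected counts.

    `table` is a list of rows of observed counts (no continuity correction).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


# Hypothetical counts: feature value (rows) vs. target class (columns)
stat, dof, expected = chi_square_independence([[30, 10], [20, 40]])
print(stat, dof, expected)
# Check the usual rule of thumb that every expected count is at least 5
print(all(e >= 5 for row in expected for e in row))
```

With one degree of freedom, the 5% critical value is about 3.841. A statistic of roughly 16.7 would therefore lead to rejecting independence, but whether the feature is actually useful for a model is a separate question that the p-value alone does not answer.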
- A Brief Introduction to the Concept of Data - Jul 29, 2021.
Every aspiring data scientist must know the concept of data and the kind of analysis they can run. This article introduces the concept of data (quantitative and qualitative) and the types of analysis.
Beginners, Data Analytics, Data Science, Qualitative Analytics, Quantitative Analytics, Statistics
- The Lost Art of Decile Analysis - Jul 22, 2021.
Classification is a primary and widely used application of machine learning algorithms. However, if you do not carefully consider, through additional analysis, the subtleties in the results of even an apparently straightforward binary classifier, then the deeper meaning of your predictions may be obscured.
Lift charts, Predictive Models, Statistics
- WHT: A Simpler Version of the fast Fourier Transform (FFT) you should know - Jul 21, 2021.
The fast Walsh-Hadamard transform is a simple and useful algorithm for machine learning that was popular in the 1960s and early 1970s. It deserves to be more widely appreciated and applied for its efficiency.
Algorithms, Statistics, Time Series
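Much of the transform's appeal is that it needs only additions and subtractions, arranged in FFT-style butterflies. Below is a minimal sketch of the unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order. It assumes the input length is a power of two, and `fwht` is a hypothetical name.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.

    The input length must be a power of two. Uses O(n log n) additions and
    subtractions and no multiplications.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y  # butterfly: sum and difference
        h *= 2
    return a


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))
```

Applying the transform twice returns the input scaled by n, so dividing the second result by n inverts it.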
- 11 Important Probability Distributions Explained - Jul 20, 2021.
There are many distribution functions considered in statistics and machine learning, which can seem daunting to understand at first. Many are actually closely related, and with these intuitive explanations of the most important probability distributions, you can begin to appreciate what these distributions communicate about observed data.
Explained, Probability, Statistics
- Why Saying “We Accept the Null Hypothesis” is Wrong: An Intuitive Explanation - Jul 19, 2021.
“The opposite of ‘Rejecting the Null’ is ‘Accepting,’ isn’t it?” Well, it is not as simple as it is often construed. We need to rise above antonyms and understand one crucial concept.
Data Science, Statistics
- This Data Visualization is the First Step for Effective Feature Selection - Jun 8, 2021.
Understanding the most important features to use is crucial for developing a model that performs well. Knowing which features to consider requires experimentation, and proper visualization of your data can help clarify your initial selections. The scatter pairplot is a great place to start.
Data Visualization, Feature Selection, Statistics, Stocks
- Confidence Intervals for XGBoost - May 11, 2021.
Read this article about building a regularized Quantile Regression objective.
Prediction, Statistics, XGBoost
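Quantile regression replaces squared error with the pinball (quantile) loss. Fitting, for example, the 5% and 95% quantiles yields an interval around each prediction. The function below is a plain-Python sketch of that loss only. It is not the regularized objective the article builds, and using a loss as an XGBoost custom objective also requires its gradient and Hessian. The pinball loss has a second derivative of zero almost everywhere, which is one reason it can be awkward to use directly in Newton-style gradient boosting.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Mean quantile (pinball) loss for a target quantile strictly between 0 and 1."""
    total = 0.0
    for y, q_hat in zip(y_true, y_pred):
        residual = y - q_hat
        # Under-predictions cost `quantile` per unit, over-predictions `1 - quantile`
        total += max(quantile * residual, (quantile - 1) * residual)
    return total / len(y_true)


# For the 90% quantile, under-predicting by 2 costs far more than over-predicting by 2
print(pinball_loss([10.0], [8.0], 0.9))   # under-prediction
print(pinball_loss([10.0], [12.0], 0.9))  # over-prediction
```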
- KDnuggets™ News 21:n16, Apr 28: Data Science Books You Should Start Reading in 2021; Top 10 Must-Know Machine Learning Algorithms for Data Scientists - Apr 28, 2021.
Data science is not about data – applying Dijkstra principle to data science; Data Science Books You Should Start Reading in 2021; How to ace A/B Testing Data Science Interviews; Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1; Production-Ready Machine Learning NLP API with FastAPI and spaCy
A/B Testing, Algorithms, API, Books, Data Science, Interview, NLP, Statistics
- 10 Must-Know Statistical Concepts for Data Scientists - Apr 21, 2021.
Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.
Bayes Theorem, Correlation, Normal Distribution, P-value, Sampling, Statistics, Variance
- Data Science 101: Normalization, Standardization, and Regularization - Apr 20, 2021.
Normalization, standardization, and regularization all sound similar. However, each plays a unique role in your data preparation and model building process, so you must know when and how to use these important procedures.
Data Preprocessing, Feature Engineering, Normalization, Regression, Regularization, Statistics
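Normalization and standardization transform the data itself, so each fits in a few lines. The sketch below assumes plain lists of numbers and uses the population standard deviation for the z-scores. Regularization is left out because it changes the model's loss function rather than the data.

```python
from statistics import fmean, pstdev


def min_max_normalize(xs):
    """Rescale values linearly onto [0, 1] (assumes at least two distinct values)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standardize(xs):
    """Convert values to z-scores with zero mean and unit population standard deviation."""
    mu, sigma = fmean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


print(min_max_normalize([2, 4, 6, 10]))
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

In practice, the minimum, maximum, mean, and standard deviation should be computed on the training data only and then reused for validation and test data.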
- Top 3 Statistical Paradoxes in Data Science - Apr 15, 2021.
Observation bias and sub-group differences generate statistical paradoxes.
Bias, Data Science, Simpson's Paradox, Statistics
- Data Science Curriculum for Professionals - Mar 25, 2021.
If you are looking to expand or transition your current professional career, which is buried in spreadsheet analysis, into one powered by data science, then you are in for an exciting but complex journey with much to explore and master. To begin your adventure, follow this complete roadmap to guide you from a gnome in the forest of spreadsheets to an AI wizard known far and wide throughout the kingdom.
Cloud Computing, Data Science Education, Data Visualization, Machine Learning, Python, R, Roadmap, Statistics
- Rejection Sampling with Python - Mar 24, 2021.
Read this article on rejection sampling with examples using the Normal and Cauchy Distributions.
Distribution, Probability, Python, Sampling, Statistics
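Rejection sampling draws x from an easy proposal density g and accepts it with probability f(x) / (M g(x)), where M bounds f/g from above. The sketch below assumes a standard normal target and a standard Cauchy proposal, a common textbook pairing that is not necessarily the article's exact setup. For this pair, the ratio f/g peaks at x = ±1, which gives M = sqrt(2π/e) ≈ 1.52, so roughly two out of three proposals are accepted.

```python
import math
import random
from statistics import fmean, pvariance

# Smallest M with normal_pdf(x) <= M * cauchy_pdf(x) for all x (attained at x = +/-1)
M = math.sqrt(2 * math.pi / math.e)


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))


def sample_normal_by_rejection(n, seed=0):
    """Draw n standard-normal samples using a standard Cauchy proposal.

    Returns the samples and the number of proposals that were needed.
    """
    rng = random.Random(seed)
    samples, proposals = [], 0
    while len(samples) < n:
        # Inverse-CDF draw from the standard Cauchy distribution
        x = math.tan(math.pi * (rng.random() - 0.5))
        proposals += 1
        # Accept with probability f(x) / (M * g(x)), which never exceeds 1
        if rng.random() < normal_pdf(x) / (M * cauchy_pdf(x)):
            samples.append(x)
    return samples, proposals


samples, proposals = sample_normal_by_rejection(20_000)
print(fmean(samples), pvariance(samples), len(samples) / proposals)
```

The Cauchy proposal works here because its heavy tails dominate the normal density everywhere. The reverse pairing, a normal proposal for a Cauchy target, would fail because no finite M exists.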
- KDnuggets™ News 21:n11, Mar 17: Is Data Scientist still a satisfying job? How To Overcome The Fear of Math and Learn Math For Data Science - Mar 17, 2021.
Must Know for Data Scientists and Data Analysts: Causal Design Patterns; Know your data much faster with the new Sweetviz Python library; The Inferential Statistics Data Scientists Should Know; Natural Language Processing Pipelines, Explained
Career, Data Science, Data Scientist, Data Visualization, Mathematics, Python, Statistics, Survey
- Must Know for Data Scientists and Data Analysts: Causal Design Patterns - Mar 12, 2021.
Industry is a prime setting for observational causal inference, but many companies are blind to causal measurement beyond A/B tests. This formula-free primer illustrates analysis design patterns for measuring causal effects from observational data.
Causality, Data Science, Design, Design of Experiments, Statistics
- The Inferential Statistics Data Scientists Should Know - Mar 11, 2021.
The foundations of Data Science and machine learning algorithms are in mathematics and statistics. To be the best Data Scientists you can be, your skills in statistical understanding should be well-established. The more you appreciate statistics, the better you will understand how machine learning performs its apparent magic.
Data Science Education, Statistics
- 10 Statistical Concepts You Should Know For Data Science Interviews - Feb 23, 2021.
Data Science is founded on time-honored concepts from statistics and probability theory. Having a strong understanding of the ten ideas and techniques highlighted here is key to your career in the field, and also a favorite topic for concept checks during interviews.
Bayes Theorem, Interview Questions, Linear Regression, Logistic Regression, P-value, Sampling, Statistics
- Want to Be a Data Scientist? Don’t Start With Machine Learning - Jan 26, 2021.
Machine learning may appear to be the go-to topic for the aspiring data scientist to start learning. But thinking these techniques are the key aspects of the role is the biggest misconception. So much more goes into becoming a successful data scientist, and machine learning is only one component of a broader skill set around processing, managing, and understanding the science behind the data.
Career Advice, Data Scientist, Machine Learning, Statistics
- Null Hypothesis Significance Testing is Still Useful - Jan 25, 2021.
Even in the aftermath of the replication crisis, statistical significance lingers as an important concept for Data Scientists to understand.
Hypothesis Testing, P-value, Statistical Significance, Statistics
- Comprehensive Guide to the Normal Distribution - Jan 18, 2021.
Drop in for some tips on how this fundamental statistics concept can improve your data science.
Distribution, Normal Distribution, Python, SciPy, Statistics
- 15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
Automated Machine Learning, Data Science, Deep Learning, Free ebook, Machine Learning, NLP, Python, R, Statistics
- Monte Carlo integration in Python - Dec 24, 2020.
A famous casino-inspired trick for data science, statistics, and all of science. How do you do it in Python?
Monte Carlo, Python, Simulation, Statistics
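Monte Carlo integration estimates an integral as the interval width times the average of the function at uniformly random points. The error shrinks roughly like 1/sqrt(n). Below is a minimal sketch using only the standard library; the helper name is hypothetical. It is checked against the integral of sin(x) over [0, π].

```python
import math
import random


def mc_integrate(f, a, b, n=100_000, seed=0):
    """Estimate the integral of f over [a, b] from n uniform random points.

    Returns the estimate and its standard error.
    """
    rng = random.Random(seed)
    width = b - a
    values = [f(a + width * rng.random()) for _ in range(n)]
    mean_value = sum(values) / n
    variance = sum((v - mean_value) ** 2 for v in values) / (n - 1)
    return width * mean_value, width * math.sqrt(variance / n)


estimate, std_err = mc_integrate(math.sin, 0.0, math.pi)
print(f"{estimate:.4f} +/- {std_err:.4f}")  # the exact value of this integral is 2
```

Reporting the standard error alongside the estimate shows how many more samples a tighter answer would require. For example, halving the error needs about four times as many points.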
- 5 Free Books to Learn Statistics for Data Science - Dec 8, 2020.
Learn all the statistics you need for data science for free.
Data Science, Free ebook, Statistics
- Essential Math for Data Science: Probability Density and Probability Mass Functions - Dec 7, 2020.
In this article, we’ll cover probability mass and probability density functions in this sample. You’ll see how to understand and represent these distribution functions and their link with histograms.
Data Science, Mathematics, Probability, Statistics
- 10 Principles of Practical Statistical Reasoning - Nov 3, 2020.
Practical Statistical Reasoning is a term that covers the nature and objective of applied statistics/data science, principles common to all applications, and practical steps/questions for better conclusions. The following principles have helped me become more efficient with my analyses and clearer in my conclusions.
Data Analysis, Data Quality, Data Science, Statistical Analysis, Statistics
- The Best Free Data Science eBooks: 2020 Update - Sep 30, 2020.
The author has updated their list of best free data science books for 2020. Read on to see what books you should grab.
Books, Data Science, Free ebook, Probability, Programming, Statistics
- Causal Inference: The Free eBook - Sep 25, 2020.
Here's another free eBook for those looking to up their skills. If you are seeking a resource that exhaustively treats the topic of causal inference, this book has you covered.
Books, Free ebook, Inference, Statistics
- What is Simpson’s Paradox and How to Automatically Detect it - Sep 18, 2020.
Looking at data one way can tell one story, but sometimes looking at it another way will tell the opposite story. Understanding this paradox and why it happens is essential, and new tools are available to help automatically detect this tricky issue in your datasets.
Simpson's Paradox, Statistics
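Once the data is split by a grouping variable, the reversal is easy to check. The sketch below uses hypothetical (successes, trials) counts for two treatments in two subgroups and a hypothetical `simpsons_reversal` helper. It only checks one grouping that you supply, whereas automatic detection would need to search over candidate grouping variables.

```python
def rate(successes, trials):
    return successes / trials


def simpsons_reversal(data):
    """Check whether a subgroup-level comparison flips when the subgroups are pooled.

    `data` maps each of exactly two treatments to {subgroup: (successes, trials)}.
    Returns True if one treatment wins in every subgroup but loses overall.
    Ties count as wins for the second treatment.
    """
    (t1, groups1), (t2, groups2) = data.items()
    subgroup_winners = {
        t1 if rate(*groups1[g]) > rate(*groups2[g]) else t2 for g in groups1
    }
    # Pool successes and trials across subgroups for each treatment
    total1 = [sum(v) for v in zip(*groups1.values())]
    total2 = [sum(v) for v in zip(*groups2.values())]
    overall_winner = t1 if rate(*total1) > rate(*total2) else t2
    return len(subgroup_winners) == 1 and overall_winner not in subgroup_winners


# Hypothetical (successes, trials) counts
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}
print(simpsons_reversal(data))
```

Here treatment A has the higher success rate within each subgroup. B still wins overall because it was applied mostly to the easier "small" cases.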
- Statistics with Julia: The Free eBook - Sep 14, 2020.
This free eBook is a draft copy of the upcoming Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Interested in learning Julia for data science? This might be the best intro out there.
Books, Data Science, Free ebook, Julia, Statistics
- Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and what is the top skill that Data Scientists want to learn.
Communication, Data Preparation, Data Science Skills, Data Visualization, Excel, GitHub, Mathematics, Poll, Python, Reinforcement Learning, scikit-learn, SQL, Statistics
- Book Chapter: The Art of Statistics: Learning from Data - Sep 3, 2020.
Get a free book chapter from "The Art of Statistics: Learning from Data" by leading researcher Sir David John Spiegelhalter. This excerpt takes a forensic look at data surrounding the victims of the UK's most prolific serial killer and shows how a simple search for patterns reveals critical details.
Book, Crime, JMP, Statistics
- Which methods should be used for solving linear regression? - Sep 2, 2020.
As a foundational set of algorithms in any machine learning toolbox, linear regression can be solved with a variety of approaches. Here, we discuss four methods, with code examples, and demonstrate how they should be used.
Gradient Descent, Linear Regression, numpy, Python, Statistics, SVD
- These Data Science Skills will be your Superpower - Aug 20, 2020.
Learning data science means learning the hard skills of statistics, programming, and machine learning. To complete your training, a broader set of soft skills will round out your capabilities as an effective and successful professional Data Scientist.
Communication, Data Preparation, Data Science Skills, Data Visualization, Mathematics, Statistics
- KDnuggets™ News 20:n32, Aug 19: The List of Top 10 Data Science Lists; Data Science MOOCs with Substance - Aug 19, 2020.
The List of Top 10 Lists in Data Science; Going Beyond Superficial: Data Science MOOCs with Substance; Introduction to Statistics for Data Science; Content-Based Recommendation System using Word Embeddings; How Natural Language Processing Is Changing Data Analytics
Courses, Data Analytics, Data Science, Data Science Skills, MOOC, NLP, Recommendation Engine, Recommender Systems, Statistics, Word Embeddings
- Hypothesis Test for Real Problems - Aug 14, 2020.
Hypothesis tests are important for evaluating answers to questions concerning samples of data.
Hypothesis Testing, P-value, Statistics
- Introduction to Statistics for Data Science - Aug 12, 2020.
Statistics is foundational for Data Science and a crucial skill to master for any practitioner. This advanced introduction reviews with examples the fundamental concepts of inferential statistics by illustrating the differences between Point Estimators and Confidence Interval Estimates.
Beginners, Data Science, Statistics
- R squared Does Not Measure Predictive Capacity or Statistical Adequacy - Jul 31, 2020.
The fact that R-squared shouldn't be used to decide whether you have an adequate model is counter-intuitive and rarely explained clearly. This demonstration overviews how R-squared goodness-of-fit works in regression analysis and correlations, while showing why it is not a measure of statistical adequacy and so should not be taken to suggest anything about future predictive performance.
Predictive Analytics, Regression, Statistics
- A Complete Guide To Survival Analysis In Python, part 3 - Jul 30, 2020.
Concluding this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter based on different groups, a Log-Rank test, and Cox Regression, all with examples and shared code.
Jupyter, Python, Regression, Statistics, Survival Analysis
- Essential Resources to Learn Bayesian Statistics - Jul 28, 2020.
If you are interested in becoming better at statistics and machine learning, then you should invest some time in diving deeper into Bayesian statistics. While the topic is more advanced, applying these fundamentals to your work will advance your understanding and success as an ML expert.
Bayesian, Machine Learning, Markov Chain, Statistics
- Demystifying Statistical Significance - Jul 17, 2020.
With more professionals from a wide range of less technical fields diving into statistical analysis and data modeling, these experimental techniques can seem daunting. To help with these hurdles, this article clarifies some misconceptions around p-values, hypothesis testing, and statistical significance.
P-value, Statistical Significance, Statistics
- Before Probability Distributions - Jul 16, 2020.
Why do we use probability distributions, and why do they matter?
Distribution, Probability, Statistics
- A Complete Guide To Survival Analysis In Python, part 2 - Jul 14, 2020.
Continuing with the second of this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter theory as well as the Nelson-Aalen fitter theory, both with examples and shared code.
Python, Statistics, Survival Analysis
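At each event time t, the Kaplan-Meier estimator multiplies the running survival probability by the conditional survival factor (1 - d_t / n_t). Here d_t is the number of events at t and n_t is the number of subjects still at risk just before t. The plain-Python sketch below illustrates that arithmetic; it is not a library implementation. It uses hypothetical data and the usual convention that subjects censored at an event time still count as at risk at that time.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve.

    durations: time until the event or censoring for each subject.
    events: 1 if the event was observed, 0 if the subject was censored.
    Returns (time, survival probability) pairs at each time with at least one event.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = leaving_at_t = 0
        # Collect every subject whose duration equals t (events and censorings)
        while i < len(data) and data[i][0] == t:
            events_at_t += data[i][1]
            leaving_at_t += 1
            i += 1
        if events_at_t:
            survival *= 1 - events_at_t / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at_t  # everyone observed at t leaves the risk set afterwards
    return curve


# Hypothetical durations (e.g. months) and event indicators
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(curve)
```

Censored subjects never lower the curve directly. They only shrink the risk set, which makes later drops larger.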
- A Complete Guide To Survival Analysis In Python, part 1 - Jul 7, 2020.
This three-part series covers a review with step-by-step explanations and code for how to perform statistical survival analysis used to investigate the time some event takes to occur, such as patient survival during the COVID-19 pandemic, the time to failure of engineering products, or even the time to closing a sale after an initial customer contact.
Python, Statistics, Survival Analysis
- 4 Free Math Courses to do and Level up your Data Science Skills - Jun 22, 2020.
Just as there is no Data Science without data, there's no science in data without mathematics. Strengthening your foundational math skills will level you up as a data scientist and enable you to perform with greater expertise.
Bayesian, Coursera, edX, Inference, Linear Algebra, Mathematics, Online Education, Principal component analysis, Probability, Python, Statistics
- Overview of data distributions - Jun 10, 2020.
With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.
Binomial, Distribution, Normal Distribution, Poisson Distribution, Probability, Statistics
- KDnuggets™ News 20:n23, Jun 10: Largest Dataset you analyzed? If you start statistics all over again, where would you start? GPT-3 - Jun 10, 2020.
#BlackLivesMatter. In this issue: If you had to start statistics all over again, where would you start? New Poll: What was the largest dataset you analyzed? Another Great NLP Course from Stanford; Naive Bayes: Everything you need to know; GPT-3 - a giant leap for Deep Learning and NLP?
Naive Bayes, NLP, Stanford, Statistics
- If you had to start statistics all over again, where would you start? - Jun 5, 2020.
If you are just diving into learning statistics, then where do you begin? Find insight from those who have trod these waters before, and see what they might have done differently along their personal journeys in statistics.
Advanced Statistics, Advice, Bayesian, Career Advice, Statistician, Statistics
- STIPS – Statistical Thinking for Industrial Problem Solving – A free online statistics course - Jun 2, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Course, JMP, Online Education, Statistics
- Appropriately Handling Missing Values for Statistical Modelling and Prediction - May 22, 2020.
Many statisticians in industry agree that blindly imputing the missing values in your dataset is a dangerous move and should be avoided without first understanding why the data is missing in the first place.
Advice, Analytics, Business Analytics, Data Preparation, Data Science, Data Scientist, Missing Values, Statistics
- Looking Normal(ly Distributed) - May 20, 2020.
This article investigates when some probability distributions look normal "enough" for a statistical test.
Data Visualization, Distribution, Normal Distribution, Probability, Statistics
- Evidence Counterfactuals for explaining predictive models on Big Data - May 18, 2020.
Big Data generated by people, such as social media posts, mobile phone GPS locations, and browsing history, provides enormous predictive value for AI systems. However, explaining how these models make predictions from the data remains challenging. This interesting explanation approach considers how a model would behave if it didn't have the original set of data to work with.
Big Data, Explainability, Predictive Modeling, Predictive Models, Statistics
- Were 21% of New York City residents really infected with the novel coronavirus? - May 6, 2020.
Understanding the types of statistical bias that pop up in popular media and reporting is especially important during this pandemic, where the data, and our global response to the data, directly impact people's lives.
Bias, Coronavirus, Mistakes, New York City, Statistics
- Statistical Thinking for Industrial Problem Solving – a free online statistics course - May 5, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Course, JMP, Online Education, Statistics
- A Concise Course in Statistical Inference: The Free eBook - Apr 27, 2020.
Check out this freely available book, All of Statistics: A Concise Course in Statistical Inference, and learn the probability and statistics needed for success in data science.
Book, Free ebook, Mathematics, Statistics
- Should Data Scientists Model COVID19 and other Biological Events - Apr 22, 2020.
Biostatisticians use statistical techniques that everyday data scientists have probably never heard of. This is a great example of where a lack of domain knowledge exposes you as someone who does not know what they are doing and is merely hopping on a trend.
Advice, COVID-19, Data Science, Data Scientist, Statistics
- Statistical Thinking for Industrial Problem Solving – a free online statistics course - Apr 9, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Course, JMP, Online Education, Statistics
- Free online statistics course – Improve your analytics knowledge - Mar 26, 2020.
This online course is available – for free – to anyone interested in using data to solve problems better.
Course, JMP, Online Education, Statistics
- Data Science Curriculum for self-study - Feb 26, 2020.
Are you asking the question, "how do I become a Data Scientist?" This list recommends the best essential topics to gain an introductory understanding for getting started in the field. After learning these basics, keep in mind that doing real data science projects through internships or competitions is crucial to acquiring the core skills necessary for the job.
Advice, Data Science, Data Science Education, Data Visualization, Mathematics, Probability, Programming, Statistics
- Statistical Thinking for Industrial Problem Solving: a free online course. - Jan 13, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Course, JMP, Online Education, Statistics
- Statistical Thinking for Industrial Problem Solving: a free online course - Dec 3, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Course, JMP, Online Education, Statistics
- An Eight-Step Checklist for An Analytics Project - Nov 6, 2019.
Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.
Analytics, Checklist, Deployment, Feature Selection, Statistics
- KDnuggets™ News 19:n42, Nov 6: 5 Statistical Traps Data Scientists Should Avoid; 10 Free Must-Read Books on AI - Nov 6, 2019.
Learn about statistical fallacies Data Scientists should avoid; New and quite amazing Deep Learning capabilities FB has been quietly open-sourcing; Top Machine Learning tools for Developers; How to build a Neural Network from scratch and more.
AI, Free ebook, Mistakes, Statistics
- Probability Learning: Maximum Likelihood - Nov 5, 2019.
The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.
Learning, Probability, Statistics
- 5 Statistical Traps Data Scientists Should Avoid - Oct 30, 2019.
Here are five statistical fallacies, or data traps, that data scientists should be aware of and definitely avoid.
Bias, Fallacies, Simpson's Paradox, Statistics
- How to Become a (Good) Data Scientist – Beginner Guide - Oct 16, 2019.
A guide covering the things you should learn to become a data scientist, including the basics of business intelligence, statistics, programming, and machine learning.
Beginners, BI, Data Scientist, Sciforce, Statistics
- An Overview of Density Estimation - Oct 14, 2019.
Density estimation is estimating the probability density function of the population from the sample. This post examines and compares a number of approaches to density estimation.
Generative Adversarial Network, Probability, Statistics
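Kernel density estimation is one of the standard nonparametric approaches. It places a smooth kernel on every sample point and averages the kernels. Below is a minimal Gaussian-kernel sketch, with the bandwidth chosen by the common normal-reference rule of thumb h = 1.06 s n^(-1/5). The function names are hypothetical, and libraries such as SciPy (`scipy.stats.gaussian_kde`) offer more complete implementations.

```python
import math


def make_gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density at x using a Gaussian kernel."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of normal bumps of width `bandwidth` centered on each sample point
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * s * n^(-1/5), suited to roughly normal data."""
    n = len(sample)
    mu = sum(sample) / n
    s = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))
    return 1.06 * s * n ** (-1 / 5)


sample = [-1.0, 0.0, 0.5, 2.0]  # illustrative data
density = make_gaussian_kde(sample, rule_of_thumb_bandwidth(sample))
print(density(0.0))
```

The bandwidth controls the bias-variance trade-off. Too small a value produces a spiky estimate, and too large a value smooths away real structure such as multiple modes.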
- Statistical Thinking for Industrial Problem Solving: a free online course - Oct 2, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
JMP, Online Education, Statistics
- 6 bits of advice for Data Scientists - Sep 25, 2019.
As a data scientist, you can get lost in your daily dives into the data. Follow this advice to be more diligent in your work and more impactful for your organization.
Advice, Data Cleaning, Data Scientist, Metrics, Overfitting, Statistics
- Beta Distribution: What, When & How - Sep 25, 2019.
This article covers the beta distribution, and explains it using baseball batting averages.
Distribution, Probability, Statistics
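The batting-average framing works because the beta distribution is the conjugate prior for a binomial success rate. Starting from Beta(α, β) and observing h hits in n at-bats gives the posterior Beta(α + h, β + n - h). In the sketch below, the prior Beta(81, 219), centered on a .270 average, is a hypothetical choice for illustration.

```python
def update_beta(alpha, beta, hits, at_bats):
    """Conjugate update of a Beta(alpha, beta) prior after binomial observations."""
    return alpha + hits, beta + (at_bats - hits)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (81, 219)  # hypothetical prior, mean 81 / 300 = 0.270
posterior = update_beta(*prior, hits=100, at_bats=300)
# The raw average is 100 / 300 = 0.333; the posterior mean is pulled toward the prior
print(posterior, round(beta_mean(*posterior), 3))
```

The larger α + β is, the more at-bats it takes for a player's own record to move the estimate away from the prior.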
- Which Data Science Skills are core and which are hot/emerging ones? - Sep 17, 2019.
We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.
Career, Data Science Skills, Data Visualization, Deep Learning, Excel, Machine Learning, Poll, Python, PyTorch, Scala, Skills, Statistics, TensorFlow
- How Bad is Multicollinearity? - Sep 17, 2019.
For some people, anything below 60% is acceptable, while for others even a correlation of 30% to 40% is considered too high, because one variable may end up exaggerating the performance of the model or completely messing up the parameter estimates.
Analytics, Multicollinearity, Regression, Statistics
- What’s the difference between analytics and statistics? - Sep 6, 2019.
From asking the best questions about data to answering those questions with certainty, understanding the value of these two seemingly different professions is clarified when you see how they should work together.
Analytics, Explained, Statistics
- Statistical Modelling vs Machine Learning - Aug 14, 2019.
At times it may seem that Machine Learning can be done these days without a sound statistical background, but people who think so do not really understand the different nuances. Code written to make the work easier does not negate the need for an in-depth understanding of the problem.
Advice, Data Science, Machine Learning, Statistics
- What is Poisson Distribution? - Aug 14, 2019.
A solid overview of the Poisson distribution, starting from why it is needed, how it stacks up against the binomial distribution, deriving its formula mathematically, and more.
Distribution, Poisson Distribution, Probability, Statistics
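The Poisson distribution arises as the limit of Binomial(n, p) when n grows large and p shrinks while λ = np stays fixed. Below is a quick numerical check of that relationship, using the illustrative values n = 10,000 and p = 0.0003, so λ = 3.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


n, p = 10_000, 0.0003
lam = n * p
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```

The two columns agree closely. The Poisson form needs only λ and not n and p separately, which is why it suits counts of rare events over a fixed interval.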
- Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course. - Aug 2, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
JMP, Online Education, Statistics
- P-values Explained By Data Scientist - Jul 30, 2019.
This article is designed to give you a full picture, from constructing a hypothesis test to understanding the p-value and using it to guide the decision-making process.
Data Science, Data Scientist, Hypothesis Testing, P-value, Statistics
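As a minimal illustration of the idea, the sketch below computes a two-sided p-value for a one-sample z-test, the simplest case because the population standard deviation is assumed known. The sample figures (mean 103, hypothesized mean 100, σ = 15, n = 100) and the 0.05 significance level are hypothetical choices, not taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: population mean == mu0 (sigma assumed known)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability of a test statistic at least this extreme in either tail under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # hypothetical significance level
p = z_test_p_value(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"p = {p:.4f}", "reject H0" if p < alpha else "fail to reject H0")
```

Here z = 3 / (15 / √100) = 2, which gives p of about 0.046, so H0 is rejected at the 5% level; with an unknown σ, a t-test would be the usual choice.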
- Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps - Jul 9, 2019.
A heatmap is a graphical representation of data in which values are represented as colors; that is, it uses color to communicate a value to the reader. It is a great tool for guiding the audience toward the areas that matter most when you have a large volume of data.
Data Visualization, Python, Statistics
- How do you check the quality of your regression model in Python? - Jul 2, 2019.
Linear regression is strongly rooted in the field of statistical learning, so the model must be checked for ‘goodness of fit’. This article shows you the essential steps of this task in a Python ecosystem.
Data Science, Multicollinearity, Python, Regression, Statistics
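One standard goodness-of-fit check is the coefficient of determination, R² = 1 - SS_res / SS_tot, together with adjusted R², which penalizes extra predictors. The sketch below computes both with plain NumPy least squares on simulated data; it is an illustration under stated assumptions (the data-generating coefficients are made up), not the article's own code.

```python
import numpy as np


def fit_and_r2(X, y):
    """Fit ordinary least squares with an intercept; return (coefs, R^2, adjusted R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])            # prepend an intercept column
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coefs
    ss_res = residuals @ residuals                  # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()            # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalizes additional predictors
    return coefs, r2, adj_r2


# Simulated data (hypothetical): y = 1.5 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
coefs, r2, adj_r2 = fit_and_r2(X, y)
print(np.round(coefs, 2), round(r2, 3), round(adj_r2, 3))
```

A high R² alone does not validate a model; residual plots and multicollinearity checks are still needed.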
- Top KDnuggets Tweets, Jun 12 – 18: The biggest mistake while learning #Python for #datascience; 5 practical statistical concepts for data scientists - Jun 19, 2019.
Also: Resources for developers transitioning into data science; Best Data Visualization Techniques for small and large data; Top Data Science and Machine Learning Methods Used in 2018, 2019
Advice, Python, Statistics, Top tweets
- KDnuggets™ News 19:n23, Jun 19: Useful Stats for Data Scientists; Python, TensorFlow & R Winners in Latest Job Report - Jun 19, 2019.
This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!
Data Science, Data Scientist, Machine Learning, Pandas, Python, R, Report, SAS, Scalability, Statistics, TensorFlow
5 Useful Statistics Data Scientists Need to Know - Jun 14, 2019.
A data scientist should know how to effectively use statistics to gain insights from data. Here are five useful and practical statistical concepts that every data scientist must know.
Data Science, Data Scientist, Statistics
- All Models Are Wrong – What Does It Mean? - Jun 12, 2019.
During your adventures in data science, you may have heard “all models are wrong.” Let’s unpack this famous quote to understand how we can still make models that are useful.
Advice, Linear Regression, Modeling, Statistics
Top 10 Statistics Mistakes Made by Data Scientists - Jun 7, 2019.
The following are some of the most common statistics mistakes made by data scientists. Check this list often to make sure you are not making any of these while applying statistics to data science.
Data Science, Data Scientist, GitHub, Mistakes, Statistics
- Statistical Thinking for Industrial Problem Solving (STIPS): a free online course. - Jun 4, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
JMP, Online Education, Statistics
- Separating signal from noise - Jun 4, 2019.
When we are building a model, we are making the assumption that our data has two parts, signal and noise. Signal is the real pattern, the repeatable process that we hope to capture and describe. The noise is everything else that gets in the way of that.
Noise, Regression, Statistics, Time Series
- What Does a Lady Tasting Tea Have to Do with Science? - May 31, 2019.
Design of Experiments (DOE) is a statistical concept used to find cause-and-effect relationships. Surprisingly, an experiment arising from a casual conversation about tea-drinking is one of the first examples of an experiment designed using statistical ideas.
Design of Experiments, Randomization, Statistics
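In the version of the experiment usually described, Fisher presented eight cups, four with milk poured first, and asked the lady to pick out those four. If she is only guessing, the number of correct picks follows a hypergeometric distribution, so the chance of a perfect score is 1 / C(8, 4) = 1/70. The sketch below computes these tail probabilities with the standard library; the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k_min, cups=8, milk_first=4):
    """P(the taster picks at least k_min milk-first cups correctly by pure guessing).

    The taster selects `milk_first` cups out of `cups`; the number of correct
    picks follows a hypergeometric distribution."""
    total = comb(cups, milk_first)
    hits = sum(
        comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
        for k in range(k_min, milk_first + 1)
    )
    return hits / total


print(prob_at_least(4))  # all four correct by chance: 1/70
print(prob_at_least(3))  # three or more correct: 17/70
```

A perfect score is unlikely enough under guessing to count as evidence of skill, whereas three out of four correct happens by chance almost a quarter of the time.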
- Probability Mass and Density Functions - May 21, 2019.
This content is part of a series about Chapter 3 (Probability) of the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code for the mathematical theory, and reflects the author's understanding of these concepts.
Mathematics, Probability, Statistics
- Modeling 101 - May 13, 2019.
In the past couple of decades, innovation in statistics and machine learning has been increasing at a rapid pace and we are now able to do things unimaginable when I began my career.
Data Science, Modeling, Statistics
- Naive Bayes: A Baseline Model for Machine Learning Classification Performance - May 7, 2019.
We can use pandas to apply Bayes' Theorem and scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in scikit-learn.
Algorithms, Data Science, Machine Learning, Naive Bayes, Python, scikit-learn, Statistics
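The article works with pandas and scikit-learn; as a dependency-free sketch of the same idea, the code below implements a small categorical Naive Bayes classifier from scratch. The helper names, the Laplace smoothing parameter `alpha`, and the toy weather data are hypothetical. Each class score is the prior P(c) times the product of per-feature likelihoods P(x_i | c), which assumes the features are conditionally independent given the class; normalizing the scores plays the role of the denominator in Bayes' Theorem.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing `alpha`."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    feature_values = defaultdict(set)       # feature index -> set of observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values, alpha


def predict_nb(model, row):
    """Return the posterior P(class | row) for every class."""
    class_counts, feature_counts, feature_values, alpha = model
    n = sum(class_counts.values())
    scores = {}
    for c, count in class_counts.items():
        score = count / n                    # prior P(c)
        for i, value in enumerate(row):
            # Smoothed likelihood P(x_i = value | c)
            num = feature_counts[(i, c)][value] + alpha
            den = count + alpha * len(feature_values[i])
            score *= num / den
        scores[c] = score
    total = sum(scores.values())             # evidence P(row), up to the naive assumption
    return {c: s / total for c, s in scores.items()}


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "yes", "yes", "no", "yes", "no"]
model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "mild")))
```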
- Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course. - May 3, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
JMP, Online Education, Statistics
- Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course - Apr 5, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
JMP, Online Education, Statistics
- Spatio-Temporal Statistics: A Primer - Apr 5, 2019.
Marketing scientist Kevin Gray asks University of Missouri Professor Chris Wikle about Spatio-Temporal Statistics and how it can be used in science and business.
Interview, Spatio-Temporal, Statistics
- Wake Forest University: Teaching Professor/Professor of the Practice in Statistics/Analytics [Winston-Salem, NC] - Mar 18, 2019.
The Wake Forest University School of Business is seeking qualified candidates for a Teaching Professor/Professor of the Practice in Statistics/Analytics. This individual will be expected to teach graduate courses in areas such as Data Analysis & Business Modeling, Data Mining & Machine Learning, and Forecasting.
Analytics, NC, Professor, Statistics, Wake Forest University, Winston-Salem
- The 7 Myths of Data Anonymisation - Mar 12, 2019.
Anonymisation has long been seen as a necessary evil rather than a helpful tool. That’s why plenty of myths have arisen around the technology over the years.
Anonymity, Customer Analytics, Differential Privacy, GDPR, Privacy, Statistics
- Beating the Bookies with Machine Learning - Mar 8, 2019.
We investigate how to use a custom loss function to identify fair odds, including a detailed example using machine learning to bet on the results of a darts match and how this can assist you in beating the bookmaker.
Machine Learning, PyTorch, Sports, Statistics
- Statistical Thinking for Industrial Problem Solving – a free online course - Feb 6, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
JMP, Online Education, Statistics
- From Good to Great Data Science, Part 1: Correlations and Confidence - Feb 5, 2019.
With the aid of some hospital data, part one describes how just a little inexperience in statistics could result in two common mistakes.
Correlation, Data Science, Python, Statistics
The Essential Data Science Venn Diagram - Feb 4, 2019.
A deeper examination of the interdisciplinary interplay involved in data science, focusing on automation, validity and intuition.
Analytics, Data Science, Machine Learning, Statistics, Venn Diagram
- Southern Illinois University Edwardsville: Director of the Center for Predictive Analytics/(Associate) Professor of Mathematics and Statistics [Edwardsville, IL] - Jan 4, 2019.
Southern Illinois University Edwardsville (SIUE) is establishing the Center for Predictive Analytics (C-PAN), and is seeking an innovative, visionary director for the center who will provide centralized leadership in establishing research and educational initiatives across academic units at SIUE.
Director, Edwardsville, Faculty, IL, Mathematics, Professor, Southern Illinois University Edwardsville, Statistics
Introduction to Statistics for Data Science - Dec 17, 2018.
This tutorial explains the central limit theorem, covering populations and samples, the sampling distribution, and the intuition behind it, and includes a useful video so you can continue your learning.
Data Science, Statistics
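The central limit theorem says that the means of repeated samples of size n are approximately normally distributed around the population mean μ, with standard deviation σ/√n, even when the population itself is skewed. The simulation below is a hypothetical illustration using only the standard library: it draws from an exponential population with μ = σ = 1, so with n = 50 the sample means should center near 1 with a spread near 1/√50 ≈ 0.141.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, seed=0):
    """Draw `n_samples` samples of size `sample_size` and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def exponential(rng):
    """A skewed population: exponential with mean 1 and standard deviation 1."""
    return rng.expovariate(1.0)


means = sample_means(exponential, sample_size=50, n_samples=5_000)
print("mean of sample means:", round(statistics.fmean(means), 3))  # close to 1
print("std of sample means: ", round(statistics.stdev(means), 3))  # close to 0.141
```

Plotting a histogram of `means` would show a roughly bell-shaped curve, even though the underlying exponential population is strongly right-skewed.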
- A comprehensive list of Machine Learning Resources: Open Courses, Textbooks, Tutorials, Cheat Sheets and more - Dec 7, 2018.
A thorough collection of useful resources covering statistics, classic machine learning, deep learning, probability, reinforcement learning, and more.
Cheat Sheet, Data Science Education, Deep Learning, Machine Learning, Mathematics, Open Source, Reinforcement Learning, Resources, Statistics
The 5 Basic Statistics Concepts Data Scientists Need to Know - Nov 13, 2018.
Today, we’re going to look at 5 basic statistics concepts that data scientists need to know and how they can be applied most effectively!
Data Science, Data Scientist, Statistics
- Quantum Machine Learning: A look at myths, realities, and future projections - Nov 5, 2018.
An overview of quantum computing and quantum algorithm design, including the current state of the hardware and of algorithm design within existing systems.
Machine Learning, Python, Quantum Computing, Statistics
- How I Learned to Stop Worrying and Love Uncertainty - Oct 24, 2018.
This is a written version of Data Scientist Adolfo Martínez’s talk at Software Guru’s DataDay 2017. There is a link to the original slides (in Spanish) at the top of this post.
Bayesian, Statistics, Uncertainty
- University of San Francisco: Assistant Professor, Tenure Track, Mathematics and Statistics [San Francisco, CA] - Oct 17, 2018.
The University of San Francisco invites applications for a tenure-track Assistant Professor position to begin August 2019. We seek well-qualified candidates in the areas of applied mathematics or statistics, with a focus on the extraction of knowledge from data.
CA, Mathematics, Professor, San Francisco, Statistics, University of San Francisco
- Mindstrong Health: Sr Data Scientist / Machine Learning, Statistics, Coding [Palo Alto, CA] - Oct 17, 2018.
Mindstrong Health is seeking a Sr Data Scientist in Palo Alto, CA, who is passionate about our mission, committed to excellence and excited to build a company that will address one of the greatest health challenges of our time.
CA, Data Scientist, Machine Learning, Mindstrong Health, Palo Alto, Statistics
- Unfolding Naive Bayes From Scratch - Sep 25, 2018.
Whether you are a beginner in machine learning or have been trying hard to understand seemingly supernatural machine learning algorithms and still feel that the dots somehow do not connect, this post is definitely for you!
Bayesian, Classification, Naive Bayes, Probability, Statistics

Machine Learning Cheat Sheets - Sep 11, 2018.
Check out this collection of machine learning concept cheat sheets based on Stanford CS 229 material, including supervised and unsupervised learning, neural networks, tips & tricks, probability & stats, and algebra & calculus.
Cheat Sheet, Deep Learning, Machine Learning, Mathematics, Neural Networks, Probability, Statistics, Supervised Learning, Tips, Unsupervised Learning
- 5 Things to Know About A/B Testing - Sep 7, 2018.
This article presents 5 things to know about A/B testing, from appropriate sample sizes, to statistical confidence, to A/B testing usefulness, and more.
A/B Testing, Applied Statistics, Psychology, Statistics
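For the sample-size question, a common textbook approximation for comparing two conversion rates p1 and p2 with a two-sided test at level α and power 1 - β is n = (z_(1-α/2) + z_(1-β))^2 × [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2 per group. The sketch below implements that formula with `statistics.NormalDist`; the 10% and 12% conversion rates are hypothetical, and the article may use a different method.

```python
from math import ceil
from statistics import NormalDist


def ab_sample_size(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical goal: detect a lift from a 10% to a 12% conversion rate
print(ab_sample_size(0.10, 0.12))
```

With these inputs the formula asks for about 3,840 users per group; shrinking the detectable lift from 2 to 1 percentage point raises the requirement almost fourfold.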

Essential Math for Data Science: ‘Why’ and ‘How’ - Sep 6, 2018.
It always pays to know the machinery under the hood (even at a high level) rather than being just the guy behind the wheel with no knowledge of the car.
Data Science, Mathematics, MOOC, Optimization, Statistics
- What on earth is data science? - Sep 4, 2018.
An overview and discussion around data science, covering the history behind the term, data mining, statistical inference, machine learning, data engineering and more.
Data Mining, Data Science, Decision Making, Statistics
- Basic Statistics in Python: Probability - Aug 21, 2018.
At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" To calculate the chance of an event happening, we also need to consider all the other events that can occur.
Normal Distribution, Probability, Python, Statistics
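The classical definition makes this concrete: when all outcomes are equally likely, the probability of an event is the number of favorable outcomes divided by the total number of possible outcomes. The sketch below (a hypothetical helper, not the article's code) enumerates all 36 outcomes of rolling two dice with the standard library and returns exact fractions.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """Classical probability: favorable outcomes / all equally likely outcomes."""
    outcomes = list(sample_space)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


two_dice = list(product(range(1, 7), repeat=2))       # 36 equally likely outcomes
print(probability(lambda o: sum(o) == 7, two_dice))   # 1/6
print(probability(lambda o: 6 in o, two_dice))        # 11/36 (at least one six)
```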
- Interpreting a data set, beginning to end - Aug 20, 2018.
Detailed knowledge of your data is key to understanding it! We review several important methods for understanding data, including summary statistics with visualization, embedding methods like PCA and t-SNE, and Topological Data Analysis.
Analytics, Big Data, Data Science, Data Visualization, Machine Learning, SAS, Statistics, t-SNE
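Of the methods listed, PCA is the easiest to sketch in a few lines. The example below is an assumption-based illustration, not the article's code: it centers the data, takes a singular value decomposition with NumPy, and reports the share of variance each component explains. The synthetic data is driven mostly by a single latent factor, so the first component should dominate.

```python
import numpy as np


def pca(X, n_components=2):
    """Project X onto its top principal components via SVD.

    Returns (scores, explained_variance_ratio) for the first n_components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)         # share of total variance per component
    scores = Xc @ Vt[:n_components].T           # coordinates in the principal-axis basis
    return scores, explained[:n_components]


# Synthetic data (hypothetical): two columns follow the latent factor t, one is small noise
rng = np.random.default_rng(42)
t = rng.normal(size=300)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=300), rng.normal(scale=0.2, size=300)])
scores, ratio = pca(X, n_components=2)
print(scores.shape, np.round(ratio, 3))
```

Unlike t-SNE, PCA is linear and deterministic, which makes its components easier to interpret but less able to reveal nonlinear cluster structure.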