15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
Tags: Automated Machine Learning, Data Science, Deep Learning, Free ebook, Machine Learning, NLP, Python, R, Statistics
Monte Carlo integration in Python - Dec 24, 2020.
A famous casino-inspired trick for data science, statistics, and all of science. How do you do it in Python?
Tags: Monte Carlo, Python, Simulation, Statistics
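Monte Carlo integration estimates an integral by averaging the integrand at random points and scaling by the interval length. Below is a minimal sketch using only Python's standard library. The function name `monte_carlo_integrate`, the test integrand x² on [0, 1] (exact value 1/3), and the sample size are illustrative assumptions, not code from the article.

```python
import random

def monte_carlo_integrate(f, a, b, n_samples=100_000, seed=0):
    """Estimate the integral of f over [a, b] by averaging f at uniformly random points."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += f(rng.uniform(a, b))
    # integral = (b - a) * (average value of f on [a, b])
    return (b - a) * total / n_samples

estimate = monte_carlo_integrate(lambda x: x * x, 0.0, 1.0)
print(estimate)  # close to the exact value 1/3
```

The error shrinks roughly like 1/sqrt(n_samples), so quadrupling the number of samples about halves it.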
- 5 Free Books to Learn Statistics for Data Science - Dec 8, 2020.
Learn all the statistics you need for data science for free.
Tags: Data Science, Free ebook, Statistics
- Essential Math for Data Science: Probability Density and Probability Mass Functions - Dec 7, 2020.
In this article, we’ll cover probability mass functions and probability density functions. You’ll see how to understand and represent these distribution functions and how they relate to histograms.
Tags: Data Science, Mathematics, Probability, Statistics
- 10 Principles of Practical Statistical Reasoning - Nov 3, 2020.
Practical Statistical Reasoning is a term that covers the nature and objective of applied statistics/data science, principles common to all applications, and practical steps/questions for better conclusions. The following principles have helped me become more efficient with my analyses and clearer in my conclusions.
Tags: Data Analysis, Data Quality, Data Science, Statistical Analysis, Statistics
The Best Free Data Science eBooks: 2020 Update - Sep 30, 2020.
The author has updated their list of best free data science books for 2020. Read on to see what books you should grab.
Tags: Books, Data Science, Free ebook, Probability, Programming, Statistics
- Causal Inference: The Free eBook - Sep 25, 2020.
Here's another free eBook for those looking to up their skills. If you are seeking a resource that exhaustively treats the topic of causal inference, this book has you covered.
Tags: Books, Free ebook, Inference, Statistics
- What is Simpson’s Paradox and How to Automatically Detect it - Sep 18, 2020.
Looking at data one way can tell one story, but sometimes looking at it another way will tell the opposite story. Understanding this paradox and why it happens is essential, and new tools are available to help automatically detect this tricky issue in your datasets.
Tags: Simpson's Paradox, Statistics
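As an illustration of the paradox, here is a minimal sketch of one way to check two treatments for it, assuming the data is already grouped into (successes, trials) counts. The counts are illustrative, and this simple pairwise check is not one of the automated detection tools the article refers to.

```python
# Illustrative (successes, trials) counts for two treatments, split by a grouping variable
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

def overall_rate(groups):
    """Success rate after pooling all groups together."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials

def has_simpsons_paradox(data, t1, t2):
    """True if one treatment wins in every group but loses once the groups are pooled."""
    groups = data[t1].keys()
    t1_wins_all = all(rate(*data[t1][g]) > rate(*data[t2][g]) for g in groups)
    t2_wins_all = all(rate(*data[t2][g]) > rate(*data[t1][g]) for g in groups)
    pooled_1, pooled_2 = overall_rate(data[t1]), overall_rate(data[t2])
    return (t1_wins_all and pooled_1 < pooled_2) or (t2_wins_all and pooled_2 < pooled_1)

for treatment, groups in data.items():
    per_group = {g: round(rate(*c), 3) for g, c in groups.items()}
    print(treatment, per_group, "pooled:", round(overall_rate(groups), 3))
print(has_simpsons_paradox(data, "A", "B"))
```

With these counts, A has the higher success rate in both groups (about 0.93 vs 0.87 and 0.73 vs 0.69), yet B looks better once the groups are pooled (about 0.83 vs 0.78), because most of B's trials fall in the easier "small" group.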
Statistics with Julia: The Free eBook - Sep 14, 2020.
This free eBook is a draft copy of the upcoming Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Interested in learning Julia for data science? This might be the best intro out there.
Tags: Books, Data Science, Free ebook, Julia, Statistics
Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and the single skill data scientists most want to learn.
Tags: Communication, Data Preparation, Data Science Skills, Data Visualization, Excel, GitHub, Mathematics, Poll, Python, Reinforcement Learning, scikit-learn, SQL, Statistics
- Book Chapter: The Art of Statistics: Learning from Data - Sep 3, 2020.
Get a free book chapter from "The Art of Statistics: Learning from Data" by leading statistician Sir David John Spiegelhalter. This excerpt takes a forensic look at data on the victims of the UK's most prolific serial killer and shows how a simple search for patterns reveals critical details.
Tags: Book, Crime, JMP, Statistics
- Which methods should be used for solving linear regression? - Sep 2, 2020.
As a foundational set of algorithms in any machine learning toolbox, linear regression can be solved with a variety of approaches. Here, we discuss four methods, with code examples, and demonstrate how they should be used.
Tags: Gradient Descent, Linear Regression, numpy, Python, Statistics, SVD
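To make the contrast between solution methods concrete, here is a minimal sketch of two common approaches for a single predictor: the closed-form least-squares solution and batch gradient descent, using only the standard library. The data, learning rate, and epoch count are assumptions chosen so that gradient descent converges; the article's own numpy and SVD-based code is not reproduced here.

```python
def fit_closed_form(xs, ys):
    """Ordinary least squares for y = b0 + b1*x using the closed-form solution."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def fit_gradient_descent(xs, ys, lr=0.01, epochs=20_000):
    """Fit the same model by minimizing mean squared error with batch gradient descent."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]  # prediction minus target
        grad_b0 = 2.0 * sum(errors) / n  # d(MSE)/d(b0)
        grad_b1 = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n  # d(MSE)/d(b1)
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x for x in xs]  # noiseless points on y = 1 + 2x
print(fit_closed_form(xs, ys))
print(fit_gradient_descent(xs, ys))
```

The closed form gets the answer in one pass; gradient descent needs a tuned learning rate and many iterations, but it extends to settings where forming and solving the normal equations directly is too expensive.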
These Data Science Skills will be your Superpower - Aug 20, 2020.
Learning data science means learning the hard skills of statistics, programming, and machine learning. To complete your training, a broader set of soft skills will round out your capabilities as an effective and successful professional Data Scientist.
Tags: Communication, Data Preparation, Data Science Skills, Data Visualization, Mathematics, Statistics
- KDnuggets™ News 20:n32, Aug 19: The List of Top 10 Data Science Lists; Data Science MOOCs with Substance - Aug 19, 2020.
The List of Top 10 Lists in Data Science; Going Beyond Superficial: Data Science MOOCs with Substance; Introduction to Statistics for Data Science; Content-Based Recommendation System using Word Embeddings; How Natural Language Processing Is Changing Data Analytics
Tags: Courses, Data Analytics, Data Science, Data Science Skills, MOOC, NLP, Recommendation Engine, Recommender Systems, Statistics, Word Embeddings
- Hypothesis Test for Real Problems - Aug 14, 2020.
Hypothesis tests are essential for evaluating answers to questions about samples of data.
Tags: P-value, Statistics
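As a concrete example of a hypothesis test, here is a minimal sketch of a two-sided one-sample z-test, assuming the population standard deviation is known (with an unknown standard deviation and small samples, a t-test is the usual choice). The numbers in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: population mean == mu0, with known population std dev sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability, under H0, of a statistic at least this extreme in either direction
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: n = 100 measurements averaging 103, tested against mu0 = 100, sigma = 15
z, p = one_sample_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(z, round(p, 4))
```

Here z = 2.0 and p is about 0.046, so H0 would be rejected at the 5% level; on its own, that says nothing about how large or practically important the difference is.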
- Introduction to Statistics for Data Science - Aug 12, 2020.
Statistics is foundational for Data Science and a crucial skill to master for any practitioner. This advanced introduction reviews, with examples, the fundamental concepts of inferential statistics, illustrating the differences between point estimators and confidence interval estimates.
Tags: Beginners, Data Science, Statistics
- R squared Does Not Measure Predictive Capacity or Statistical Adequacy - Jul 31, 2020.
The fact that R-squared shouldn't be used to decide whether you have an adequate model is counter-intuitive and is rarely explained clearly. This demonstration shows how R-squared goodness-of-fit works in regression analysis and correlations, and why it is not a measure of statistical adequacy and therefore says nothing about future predictive performance.
Tags: Predictive Analytics, Regression, Statistics
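A small demonstration of the point, using only the standard library: fitting a straight line to data that is actually quadratic still yields a high R-squared (about 0.95 here), while the residuals show a systematic curved pattern that reveals the model is inadequate. The data is synthetic and chosen purely for illustration.

```python
def fit_line_r_squared(xs, ys):
    """Fit y = a + b*x by least squares; return R-squared and the residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot, residuals

xs = list(range(1, 11))
ys = [x**2 for x in xs]  # the true relationship is quadratic, not linear
r2, residuals = fit_line_r_squared(xs, ys)
print(round(r2, 3))  # high, even though a straight line is the wrong model
print([round(r, 1) for r in residuals])  # positive at both ends, negative in the middle
```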
- A Complete Guide To Survival Analysis In Python, part 3 - Jul 30, 2020.
Concluding this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter based on different groups, a Log-Rank test, and Cox Regression, all with examples and shared code.
Tags: Jupyter, Python, Regression, Statistics, Survival Analysis
Essential Resources to Learn Bayesian Statistics - Jul 28, 2020.
If you are interested in becoming better at statistics and machine learning, then you should invest some time in diving deeper into Bayesian Statistics. While the topic is more advanced, applying its fundamentals to your work will advance your understanding and success as an ML expert.
Tags: Bayesian, Machine Learning, Markov Chain, Statistics
- Demystifying Statistical Significance - Jul 17, 2020.
With more professionals from a wide range of less technical fields diving into statistical analysis and data modeling, these experimental techniques can seem daunting. To help with these hurdles, this article clarifies some misconceptions around p-values, hypothesis testing, and statistical significance.
Tags: P-value, Statistical Significance, Statistics
- Before Probability Distributions - Jul 16, 2020.
Why do we use probability distributions, and why do they matter?
Tags: Distribution, Probability, Statistics
- A Complete Guide To Survival Analysis In Python, part 2 - Jul 14, 2020.
Continuing with the second of this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter theory as well as the Nelson-Aalen fitter theory, both with examples and shared code.
Tags: Python, Statistics, Survival Analysis
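The Kaplan-Meier estimator multiplies together, at each event time, the fraction of at-risk subjects who survive past it. Below is a from-scratch sketch, assuming right-censored data given as durations plus an event flag; it illustrates the estimator rather than reproducing the article's code, and the sample data is made up.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function S(t).

    durations: time until the event or until censoring, one per subject
    observed:  True if the event was observed, False if the subject was censored
    Returns (time, S(time)) pairs at each distinct observed event time.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still being followed at time t (those censored at t count as at risk)
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Made-up example: six subjects, two of them censored
durations = [1, 2, 2, 3, 4, 5]
observed = [True, True, False, True, False, True]
for t, s in kaplan_meier(durations, observed):
    print(t, round(s, 3))
```

For this data the curve steps to 0.833, 0.667, 0.444 and 0 at times 1, 2, 3 and 5. The Nelson-Aalen estimator uses the same event times and risk sets but accumulates events / at_risk into a cumulative hazard instead of multiplying survival fractions.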
A Complete Guide To Survival Analysis In Python, part 1 - Jul 7, 2020.
This three-part series covers a review with step-by-step explanations and code for how to perform statistical survival analysis used to investigate the time some event takes to occur, such as patient survival during the COVID-19 pandemic, the time to failure of engineering products, or even the time to closing a sale after an initial customer contact.
Tags: Python, Statistics, Survival Analysis
- The 8 Basic Statistics Concepts for Data Science - Jun 24, 2020.
Understanding the fundamentals of statistics is a core capability for becoming a Data Scientist. Review these essential ideas that will be pervasive in your work and raise your expertise in the field.
Tags: Beginners, Causation, Correlation, Linear Regression, Probability, Statistics
4 Free Math Courses to do and Level up your Data Science Skills - Jun 22, 2020.
Just as there is no Data Science without data, there's no science in data without mathematics. Strengthening your foundational math skills will level you up as a data scientist and enable you to work with greater expertise.
Tags: Bayesian, Coursera, edX, Inference, Linear Algebra, Mathematics, Online Education, Principal component analysis, Probability, Python, Statistics
Overview of data distributions - Jun 10, 2020.
With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.
Tags: Binomial, Distribution, Normal Distribution, Poisson Distribution, Probability, Statistics
- KDnuggets™ News 20:n23, Jun 10: Largest Dataset you analyzed? If you start statistics all over again, where would you start? GPT-3 - Jun 10, 2020.
#BlackLivesMatter. In this issue: If you had to start statistics all over again, where would you start? New Poll: What was the largest dataset you analyzed? Another Great NLP Course from Stanford; Naive Bayes: Everything you need to know; GPT-3 - a giant leap for Deep Learning and NLP?
Tags: Naive Bayes, NLP, Stanford, Statistics
If you had to start statistics all over again, where would you start? - Jun 5, 2020.
If you are just diving into learning statistics, where do you begin? Find insight from those who have trodden these waters before, and see what they might have done differently along their personal journeys in statistics.
Tags: Advanced Statistics, Advice, Bayesian, Career Advice, Statistician, Statistics
- STIPS – Statistical Thinking for Industrial Problem Solving – A free online statistics course - Jun 2, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: Course, JMP, Online Education, Statistics
- Appropriately Handling Missing Values for Statistical Modelling and Prediction - May 22, 2020.
Many statisticians in industry agree that blindly imputing the missing values in your dataset is a dangerous move that should be avoided until you understand why the data is missing in the first place.
Tags: Advice, Analytics, Business Analytics, Data Preparation, Data Science, Data Scientist, Missing Values, Statistics
- Looking Normal(ly Distributed) - May 20, 2020.
This article investigates when some probability distributions look normal "enough" for a statistical test.
Tags: Data Visualization, Distribution, Normal Distribution, Probability, Statistics
- Evidence Counterfactuals for explaining predictive models on Big Data - May 18, 2020.
Big Data generated by people, such as social media posts, mobile phone GPS locations, and browsing history, provides enormous predictive value for AI systems. However, explaining how these models use the data to make predictions remains challenging. This interesting explanation approach considers how a model would behave if it didn't have the original set of data to work with.
Tags: Big Data, Explainability, Predictive Modeling, Predictive Models, Statistics
- Were 21% of New York City residents really infected with the novel coronavirus? - May 6, 2020.
Understanding the types of statistical bias that pop up in popular media and reporting is especially important during this pandemic, when the data, and our global response to it, directly affect people's lives.
Tags: Bias, Coronavirus, Mistakes, New York City, Statistics
- Statistical Thinking for Industrial Problem Solving – a free online statistics course - May 5, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: Course, JMP, Online Education, Statistics
A Concise Course in Statistical Inference: The Free eBook - Apr 27, 2020.
Check out this freely available book, All of Statistics: A Concise Course in Statistical Inference, and learn the probability and statistics needed for success in data science.
Tags: Book, Free ebook, Mathematics, Statistics
Should Data Scientists Model COVID19 and other Biological Events - Apr 22, 2020.
Biostatisticians use statistical techniques that most everyday data scientists have probably never heard of. This is a clear example of how a lack of domain knowledge can expose you as someone who does not know what they are doing and is merely hopping on a trend.
Tags: Advice, COVID-19, Data Science, Data Scientist, Statistics
- Statistical Thinking for Industrial Problem Solving – a free online statistics course - Apr 9, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: Course, JMP, Online Education, Statistics
- Free online statistics course – Improve your analytics knowledge - Mar 26, 2020.
This online course is available – for free – to anyone interested in using data to solve problems better.
Tags: Course, JMP, Online Education, Statistics
- Data Science Curriculum for self-study - Feb 26, 2020.
Are you asking the question, "how do I become a Data Scientist?" This list recommends the essential topics to study for an introductory understanding of the field. After learning these basics, keep in mind that doing real data science projects through internships or competitions is crucial to acquiring the core skills necessary for the job.
Tags: Advice, Data Science, Data Science Education, Data Visualization, Mathematics, Probability, Programming, Statistics
- Statistical Thinking for Industrial Problem Solving: a free online course. - Jan 13, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: Course, JMP, Online Education, Statistics
- Statistical Thinking for Industrial Problem Solving: a free online course - Dec 3, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: Course, JMP, Online Education, Statistics
- An Eight-Step Checklist for An Analytics Project - Nov 6, 2019.
Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.
Tags: Analytics, Checklist, Deployment, Feature Selection, Statistics
- KDnuggets™ News 19:n42, Nov 6: 5 Statistical Traps Data Scientists Should Avoid; 10 Free Must-Read Books on AI - Nov 6, 2019.
Learn about statistical fallacies Data Scientists should avoid; New and quite amazing Deep Learning capabilities FB has been quietly open-sourcing; Top Machine Learning tools for Developers; How to build a Neural Network from scratch and more.
Tags: AI, Free ebook, Mistakes, Statistics
- Probability Learning: Maximum Likelihood - Nov 5, 2019.
The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.
Tags: Learning, Probability, Statistics
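Maximum likelihood picks the parameter value that makes the observed data most probable. Here is a minimal sketch for a Bernoulli (coin-flip) model on made-up data, comparing a brute-force grid search over the log-likelihood with the closed-form answer, the sample proportion k/n.

```python
import math

def bernoulli_log_likelihood(p, data):
    """Log-likelihood of i.i.d. 0/1 observations under a Bernoulli(p) model."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # made-up coin flips: 7 successes in 10 trials

# Numerical MLE: pick the p on a fine grid that maximizes the log-likelihood
grid = [i / 1000 for i in range(1, 1000)]  # avoids p = 0 and p = 1, where log(0) fails
p_hat_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

# Analytic MLE: setting the derivative of the log-likelihood to zero gives k / n
p_hat_closed = sum(data) / len(data)
print(p_hat_grid, p_hat_closed)
```

Working with the log-likelihood rather than the likelihood turns a product of probabilities into a sum, which is easier to differentiate and avoids numerical underflow on larger datasets.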
5 Statistical Traps Data Scientists Should Avoid - Oct 30, 2019.
Here are five statistical fallacies — data traps — which data scientists should be aware of and definitely avoid.
Tags: Bias, Fallacies, Simpson's Paradox, Statistics

How to Become a (Good) Data Scientist – Beginner Guide - Oct 16, 2019.
A guide covering the things you should learn to become a data scientist, including the basics of business intelligence, statistics, programming, and machine learning.
Tags: Beginners, BI, Data Scientist, Sciforce, Statistics
- An Overview of Density Estimation - Oct 14, 2019.
Density estimation is the task of estimating the probability density function of a population from a sample. This post examines and compares a number of approaches to density estimation.
Tags: Generative Adversarial Network, Probability, Statistics
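Kernel density estimation (KDE) is one classical approach: place a small Gaussian "bump" on every observation and average them. A minimal sketch follows; the samples and the hand-picked bandwidth are assumptions for illustration, and this is not necessarily one of the specific methods the post compares.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density estimate: the average of Gaussian bumps centered on each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

samples = [1.0, 2.0, 2.5, 4.0]  # made-up observations
density = gaussian_kde(samples, bandwidth=0.5)  # bandwidth chosen by hand
print([round(density(x), 3) for x in (0.0, 2.0, 4.0)])
```

The bandwidth controls smoothness: too small gives a spiky estimate, too large washes out structure. In practice it is usually set by a rule of thumb such as Silverman's or by cross-validation.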
- Statistical Thinking for Industrial Problem Solving: a free online course - Oct 2, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: JMP, Online Education, Statistics
6 bits of advice for Data Scientists - Sep 25, 2019.
As a data scientist, you can get lost in your daily dives into the data. Follow this advice to stay diligent in your work and become more impactful for your organization.
Tags: Advice, Data Cleaning, Data Scientist, Metrics, Overfitting, Statistics
- Beta Distribution: What, When & How - Sep 25, 2019.
This article covers the beta distribution, and explains it using baseball batting averages.
Tags: Distribution, Probability, Statistics
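In the batting-average setting, a Beta prior updated with a player's hits and at-bats gives another Beta distribution (the Beta is conjugate to the binomial). A minimal sketch; the prior Beta(81, 219), with mean 0.270, and the single at-bat example are assumptions for illustration.

```python
def update_beta(alpha, beta, hits, at_bats):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + hits, beta + (at_bats - hits)

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

prior = (81, 219)  # assumed prior with mean 81 / 300 = 0.270, a typical batting average
posterior = update_beta(*prior, hits=1, at_bats=1)  # hypothetical: 1 hit in 1 at-bat
print(posterior, round(beta_mean(*posterior), 3))
```

The raw average after one hit in one at-bat would be 1.000, but the posterior mean is only about 0.272: with so little data, the estimate stays close to the prior and moves toward the player's observed rate as at-bats accumulate.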
Which Data Science Skills are core and which are hot/emerging ones? - Sep 17, 2019.
We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.
Tags: Career, Data Science Skills, Data Visualization, Deep Learning, Excel, Machine Learning, Poll, Python, PyTorch, Scala, Skills, Statistics, TensorFlow
- How Bad is Multicollinearity? - Sep 17, 2019.
For some people, any correlation below 60% is acceptable, while for others even a correlation of 30% to 40% is too high, because one variable may end up exaggerating the performance of the model or completely messing up parameter estimates.
Tags: Analytics, Multicollinearity, Regression, Statistics
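One standard way to quantify how bad multicollinearity is uses the variance inflation factor (VIF). With exactly two predictors, the VIF of each is 1 / (1 - r²), where r is their correlation. A minimal sketch; the predictor values are made up.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_from_correlation(r):
    """VIF for either predictor in a model with exactly two predictors."""
    return 1.0 / (1.0 - r * r)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up predictor values
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x1, x2)
print(round(r, 3), round(vif_from_correlation(r), 3))

# VIFs implied by the correlation thresholds discussed above
for r_threshold in (0.3, 0.4, 0.6):
    print(r_threshold, round(vif_from_correlation(r_threshold), 2))
```

Correlations of 0.3, 0.4 and 0.6 correspond to VIFs of roughly 1.10, 1.19 and 1.56, well below the commonly quoted rule-of-thumb cutoffs of 5 or 10. With more than two predictors, the VIF of predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the others.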
- What’s the difference between analytics and statistics? - Sep 6, 2019.
From asking the best questions about data to answering those questions with certainty, the value of these two seemingly different professions becomes clear when you see how they should work together.
Tags: Analytics, Explained, Statistics
Statistical Modelling vs Machine Learning - Aug 14, 2019.
At times it may seem that Machine Learning can be done these days without a sound statistical background, but people who believe this are missing important nuances. Code written to make modeling easier does not remove the need for an in-depth understanding of the problem.
Tags: Advice, Data Science, Machine Learning, Statistics
- What is Poisson Distribution? - Aug 14, 2019.
A solid overview of the Poisson distribution, covering why it is needed, how it compares with the binomial distribution, how to derive its formula mathematically, and more.
Tags: Distribution, Poisson Distribution, Probability, Statistics
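The Poisson distribution arises as the limit of a binomial distribution with n trials and success probability p = λ/n as n grows. Here is a minimal sketch comparing the two PMFs numerically; λ = 3 and n = 10,000 are arbitrary choices, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

lam, n = 3.0, 10_000  # many trials, each with a small success probability lam / n
for k in range(6):
    print(k, round(binomial_pmf(k, n, lam / n), 5), round(poisson_pmf(k, lam), 5))
```

The two columns agree closely, which is why the Poisson distribution is a convenient model for counts of rare events (arrivals, defects, calls) when the number of opportunities is large and hard to pin down.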
- Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course. - Aug 2, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: JMP, Online Education, Statistics
- P-values Explained By Data Scientist - Jul 30, 2019.
This article is designed to give you a full picture, from constructing a hypothesis test to understanding the p-value and using it to guide the decision-making process.
Tags: Data Science, Data Scientist, P-value, Statistics
- Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps - Jul 9, 2019.
A heatmap is a graphical representation of data in which values are represented as colors; that is, it uses color to communicate a value to the reader. It is a great tool for guiding the audience toward the areas that matter most when you have a large volume of data.
Tags: Data Visualization, Python, Statistics
- How do you check the quality of your regression model in Python? - Jul 2, 2019.
Linear regression is rooted strongly in the field of statistical learning and therefore the model must be checked for the ‘goodness of fit’. This article shows you the essential steps of this task in a Python ecosystem.
Tags: Data Science, Multicollinearity, Python, Regression, Statistics
- Top KDnuggets Tweets, Jun 12 – 18: The biggest mistake while learning #Python for #datascience; 5 practical statistical concepts for data scientists - Jun 19, 2019.
Also: Resources for developers transitioning into data science; Best Data Visualization Techniques for small and large data; Top Data Science and Machine Learning Methods Used in 2018, 2019
Tags: Advice, Python, Statistics, Top tweets
- KDnuggets™ News 19:n23, Jun 19: Useful Stats for Data Scientists; Python, TensorFlow & R Winners in Latest Job Report - Jun 19, 2019.
This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!
Tags: Data Science, Data Scientist, Machine Learning, Pandas, Python, R, Report, SAS, Scalability, Statistics, TensorFlow
5 Useful Statistics Data Scientists Need to Know - Jun 14, 2019.
A data scientist should know how to effectively use statistics to gain insights from data. Here are five useful and practical statistical concepts that every data scientist must know.
Tags: Data Science, Data Scientist, Statistics
- All Models Are Wrong – What Does It Mean? - Jun 12, 2019.
During your adventures in data science, you may have heard “all models are wrong.” Let’s unpack this famous quote to understand how we can still make models that are useful.
Tags: Advice, Linear Regression, Modeling, Statistics
Top 10 Statistics Mistakes Made by Data Scientists - Jun 7, 2019.
The following are some of the most common statistics mistakes made by data scientists. Check this list often to make sure you are not making any of these while applying statistics to data science.
Tags: Data Science, Data Scientist, GitHub, Mistakes, Statistics
- Statistical Thinking for Industrial Problem Solving (STIPS): a free online course. - Jun 4, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: JMP, Online Education, Statistics
- Separating signal from noise - Jun 4, 2019.
When we are building a model, we are making the assumption that our data has two parts, signal and noise. Signal is the real pattern, the repeatable process that we hope to capture and describe. The noise is everything else that gets in the way of that.
Tags: Noise, Regression, Statistics, Time Series
- What Does a Lady Tasting Tea Have to Do with Science? - May 31, 2019.
Design of Experiments (DOE) is a statistical concept used to find cause-and-effect relationships. Surprisingly, an experiment arising from a casual conversation about tea-drinking is one of the first examples of an experiment designed using statistical ideas.
Tags: Design of Experiments, Randomization, Statistics
- Probability Mass and Density Functions - May 21, 2019.
This content is part of a series about chapter 3, on probability, of the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code for the mathematical theory, and reflects my own understanding of these concepts.
Tags: Mathematics, Probability, Statistics
- Modeling 101 - May 13, 2019.
In the past couple of decades, innovation in statistics and machine learning has been increasing at a rapid pace and we are now able to do things unimaginable when I began my career.
Tags: Data Science, Modeling, Statistics
- Naive Bayes: A Baseline Model for Machine Learning Classification Performance - May 7, 2019.
We can use pandas to apply Bayes' theorem and scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in scikit-learn.
Tags: Algorithms, Data Science, Machine Learning, Naive Bayes, Python, scikit-learn, Statistics
- Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course. - May 3, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: JMP, Online Education, Statistics
How to correctly select a sample from a huge dataset in machine learning - May 1, 2019.
We explain how choosing a small, representative dataset from a large population can improve model training reliability.
Tags: Machine Learning, R, Sampling, Statistics
- Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course - Apr 5, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: JMP, Online Education, Statistics
- Spatio-Temporal Statistics: A Primer - Apr 5, 2019.
Marketing scientist Kevin Gray asks University of Missouri Professor Chris Wikle about Spatio-Temporal Statistics and how it can be used in science and business.
Tags: Interview, Spatio-Temporal, Statistics
- Wake Forest University: Teaching Professor/Professor of the Practice in Statistics/Analytics [Winston-Salem, NC] - Mar 18, 2019.
The Wake Forest University School of Business is seeking qualified candidates for a Teaching Professor/Professor of the Practice in Statistics/Analytics. This individual will be expected to teach graduate courses in areas such as Data Analysis & Business Modeling, Data Mining & Machine Learning, and Forecasting.
Tags: Analytics, NC, Professor, Statistics, Wake Forest University, Winston-Salem
- The 7 Myths of Data Anonymisation - Mar 12, 2019.
Anonymisation has long been seen as a necessary evil rather than a helpful tool. That’s why plenty of myths have arisen around the technology over the years.
Tags: Anonymity, Customer Analytics, Differential Privacy, GDPR, Privacy, Statistics
- Beating the Bookies with Machine Learning - Mar 8, 2019.
We investigate how to use a custom loss function to identify fair odds, including a detailed example using machine learning to bet on the results of a darts match and how this can assist you in beating the bookmaker.
Tags: Machine Learning, PyTorch, Sports, Statistics
- Statistical Thinking for Industrial Problem Solving – a free online course - Feb 6, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
Tags: JMP, Online Education, Statistics
- From Good to Great Data Science, Part 1: Correlations and Confidence - Feb 5, 2019.
With the aid of some hospital data, part one describes how just a little inexperience in statistics could result in two common mistakes.
Tags: Correlation, Data Science, Python, Statistics
The Essential Data Science Venn Diagram - Feb 4, 2019.
A deeper examination of the interdisciplinary interplay involved in data science, focusing on automation, validity and intuition.
Tags: Analytics, Data Science, Machine Learning, Statistics, Venn Diagram
- Southern Illinois University Edwardsville: Director of the Center for Predictive Analytics/(Associate) Professor of Mathematics and Statistics [Edwardsville, IL] - Jan 4, 2019.
Southern Illinois University Edwardsville (SIUE) is establishing the Center for Predictive Analytics (C-PAN), and is seeking an innovative, visionary director for the center who will provide centralized leadership in establishing research and educational initiatives across academic units at SIUE.
Tags: Director, Edwardsville, Faculty, IL, Mathematics, Professor, Southern Illinois University Edwardsville, Statistics
Introduction to Statistics for Data Science - Dec 17, 2018.
This tutorial explains the central limit theorem, covering populations and samples, sampling distributions, and intuition, and includes a useful video so you can continue your learning.
Tags: Data Science, Statistics
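A small simulation makes the central limit theorem concrete. This minimal sketch assumes a Uniform(0, 1) population and samples of size 30 (both arbitrary choices) and checks that the sample means center on the population mean with variance σ²/n.

```python
import random
import statistics

def simulate_sample_means(sample_size, num_samples, seed=0):
    """Return the means of num_samples independent samples drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.random() for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

means = simulate_sample_means(sample_size=30, num_samples=5000)
# Uniform(0, 1) has mean 1/2 and variance 1/12, so the CLT predicts the sample means
# are approximately normal with mean 1/2 and variance (1/12) / 30 = 1/360.
print(statistics.fmean(means), statistics.variance(means), 1 / 360)
```

Even though the population is flat rather than bell-shaped, a histogram of `means` would look approximately normal, and about 95% of the means fall within 1.96 standard errors of 0.5.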
- A comprehensive list of Machine Learning Resources: Open Courses, Textbooks, Tutorials, Cheat Sheets and more - Dec 7, 2018.
A thorough collection of useful resources covering statistics, classic machine learning, deep learning, probability, reinforcement learning, and more.
Tags: Cheat Sheet, Data Science Education, Deep Learning, Machine Learning, Mathematics, Open Source, Reinforcement Learning, Resources, Statistics
The 5 Basic Statistics Concepts Data Scientists Need to Know - Nov 13, 2018.
Today, we’re going to look at 5 basic statistics concepts that data scientists need to know and how they can be applied most effectively!
Tags: Data Science, Data Scientist, Statistics
- Quantum Machine Learning: A look at myths, realities, and future projections - Nov 5, 2018.
An overview of quantum computing and quantum algorithm design, including the current state of the hardware and of algorithm design within existing systems.
Tags: Machine Learning, Python, Quantum Computing, Statistics
- How I Learned to Stop Worrying and Love Uncertainty - Oct 24, 2018.
This is a written version of Data Scientist Adolfo Martínez’s talk at Software Guru’s DataDay 2017. There is a link to the original slides (in Spanish) at the top of this post.
Tags: Bayesian, Statistics, Uncertainty
- University of San Francisco: Assistant Professor, Tenure Track, Mathematics and Statistics [San Francisco, CA] - Oct 17, 2018.
The University of San Francisco invites applications for a tenure-track Assistant Professor position to begin August 2019. We seek well-qualified candidates in the areas of applied mathematics or statistics, with a focus on the extraction of knowledge from data.
Tags: CA, Mathematics, Professor, San Francisco, Statistics, University of San Francisco
- Mindstrong Health: Sr Data Scientist / Machine Learning, Statistics, Coding [Palo Alto, CA] - Oct 17, 2018.
Mindstrong Health is seeking a Sr Data Scientist in Palo Alto, CA, who is passionate about our mission, committed to excellence and excited to build a company that will address one of the greatest health challenges of our time.
Tags: CA, Data Scientist, Machine Learning, Mindstrong Health, Palo Alto, Statistics
- Unfolding Naive Bayes From Scratch - Sep 25, 2018.
Whether you are a beginner in Machine Learning or you have been trying hard to understand seemingly supernatural Machine Learning algorithms and still feel that the dots somehow do not connect, this post is definitely for you!
Tags: Bayesian, Classification, Naive Bayes, Probability, Statistics

Machine Learning Cheat Sheets - Sep 11, 2018.
Check out this collection of machine learning concept cheat sheets based on Stanford CS 229 material, including supervised and unsupervised learning, neural networks, tips & tricks, probability & stats, and algebra & calculus.
Tags: Cheat Sheet, Deep Learning, Machine Learning, Mathematics, Neural Networks, Probability, Statistics, Supervised Learning, Tips, Unsupervised Learning
- 5 Things to Know About A/B Testing - Sep 7, 2018.
This article presents 5 things to know about A/B testing, from appropriate sample sizes, to statistical confidence, to A/B testing usefulness, and more.
Tags: A/B Testing, Applied Statistics, Psychology, Statistics
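Statistical confidence in an A/B test is commonly assessed with a two-proportion z-test on the conversion rates. A minimal sketch follows; the conversion counts are hypothetical, and the test assumes the sample sizes were fixed in advance (repeatedly peeking at results and stopping early invalidates the p-value).

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under H0: no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: 200/5000 conversions for A, 250/5000 for B
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(round(z, 2), round(p, 4))
```

A one-point lift from 4% to 5% is significant here only because each arm has 5,000 users; with a few hundred per arm, the same lift would typically not reach significance, which is why sample size should be planned before the test starts.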

Essential Math for Data Science: ‘Why’ and ‘How’ - Sep 6, 2018.
It always pays to know the machinery under the hood (even at a high level) rather than being just the guy behind the wheel with no knowledge about the car.
Tags: Data Science, Mathematics, MOOC, Optimization, Statistics
- What on earth is data science? - Sep 4, 2018.
An overview and discussion around data science, covering the history behind the term, data mining, statistical inference, machine learning, data engineering and more.
Tags: Data Mining, Data Science, Decision Making, Statistics
- Basic Statistics in Python: Probability - Aug 21, 2018.
At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" To calculate the chance of an event happening, we also need to consider all the other events that can occur.
Tags: Normal Distribution, Probability, Python, Statistics
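A minimal sketch of the idea in the teaser (our own example, not the article's code): the classical probability of an event is the number of favorable outcomes divided by the number of all possible outcomes. The two-dice scenario below is chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Favorable outcomes: the two faces sum to 7.
favorable = [pair for pair in outcomes if sum(pair) == 7]

# Classical probability = favorable outcomes / all possible outcomes.
p_sum_7 = Fraction(len(favorable), len(outcomes))
print(len(outcomes), len(favorable), p_sum_7)  # 36 6 1/6
```

Enumerating all outcomes first is what "considering all the other events that can occur" amounts to here: the denominator counts every possible roll, not just the ones we care about.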
- Interpreting a data set, beginning to end - Aug 20, 2018.
Detailed knowledge of your data is key to understanding it! We review several important methods that help to understand the data, including summary statistics with visualization, embedding methods like PCA and t-SNE, and Topological Data Analysis.
Tags: Analytics, Big Data, Data Science, Data Visualization, Machine Learning, SAS, Statistics, t-SNE
- Top KDnuggets tweets, Aug 1-14: Basic Statistics in Python; Essential Command Line Tools for Data Scientists - Aug 15, 2018.
Basic Statistics in Python: Descriptive Statistics; Top 12 Essential Command Line Tools for Data Scientists; WTF is a Tensor?!?; How GOAT Taught a Machine to Love Sneakers.
Tags: Python, Statistics, Tensor, Top tweets
- KDnuggets™ News 18:n30, Aug 8: Iconic Data Visualisation; Data Scientist Interviews Demystified; Simple Statistics in Python - Aug 8, 2018.
Also: Selecting the Best Machine Learning Algorithm for Your Regression Problem; From Data to Viz: how to select the right chart for your data; Only Numpy: Implementing GANs and Adam Optimizer using Numpy; Programming Best Practices for Data Science
Tags: Data Science, Data Visualization, Generative Adversarial Network, Interview, Machine Learning, numpy, Python, Regression, Statistics
Basic Statistics in Python: Descriptive Statistics - Aug 1, 2018.
This article covers defining statistics, descriptive statistics, measures of central tendency, and measures of spread. This article assumes no prior knowledge of statistics, but does require at least a general knowledge of Python.
Tags: Descriptive Analytics, Python, Statistics
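For readers who want to see the measures named above side by side, here is a minimal sketch using Python's standard `statistics` module (our illustration, not the article's code). The sample values are made up.

```python
import statistics

# A small, made-up sample used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle of the sorted data (average of the two middle values here)
mode = statistics.mode(data)      # most frequent value

# Measures of spread.
value_range = max(data) - min(data)
pop_sd = statistics.pstdev(data)    # population standard deviation (divides by n)
sample_sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

print(mean, median, mode, value_range, pop_sd, sample_sd)
```

The sample standard deviation is slightly larger than the population version because it divides by n - 1, which corrects for estimating the mean from the same data.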
- What is Normal? - Jul 31, 2018.
I saw an article recently that referred to the normal curve as the data scientist's best friend. We examine myths around the normal curve, including: is most data normally distributed?
Tags: Distribution, Normal Distribution, Sampling, Statistics, Statistics.com
Causation in a Nutshell - Jul 20, 2018.
Every move we make, every breath we take, and every heartbeat is an effect that is caused. Even apparent randomness may just be something we cannot explain.
Tags: Causality, Causation, Statistics
Explaining the 68-95-99.7 rule for a Normal Distribution - Jul 19, 2018.
This post explains how those numbers were derived in the hope that they can be more interpretable for your future endeavors.
Tags: Data Analysis, Data Science, Normal Distribution, Python, Statistics
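The three numbers come from the standard normal distribution: for Z ~ N(0, 1), the probability of landing within k standard deviations of the mean is P(-k < Z < k) = erf(k / sqrt(2)). A minimal Python sketch (our own, not the article's code; the helper name `within_k_sd` is hypothetical) recovers them with only the standard library:

```python
import math

def within_k_sd(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # For Z ~ N(0, 1): P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3.
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```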
- Why Data Scientists Love Gaussian - Jun 26, 2018.
The Gaussian distribution, often identified by its iconic bell-shaped curve and also referred to as the Normal distribution, is popular mainly for three reasons.
Tags: Distribution, Probability, Statistics
- Every time someone runs a correlation coefficient on two time series, an angel loses their wings - Jun 18, 2018.
We all know by now that correlation doesn't equal causality, but when working with time series data, correlation can lead you to the wrong conclusion.
Tags: Correlation, Data Mining, Statistics, Time Series
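One classic trap behind this warning is that two series sharing a trend can look strongly correlated even when their step-to-step movements are unrelated. The toy, deterministic series below are invented for illustration: both grow by about one unit per step but wiggle in different patterns. Correlating the levels gives a high coefficient, while correlating the first differences, one common remedy (not necessarily the one the article recommends), removes the shared trend. The helpers `pearson` and `diff` are hypothetical names written for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def diff(series):
    """First differences: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]

# Toy series: a shared linear trend plus unrelated periodic wiggles.
t = range(8)
x = [i + (i % 2) for i in t]                   # wiggle with period 2
y = [i + (1 if i % 4 >= 2 else 0) for i in t]  # wiggle with period 4

print(round(pearson(x, y), 3))              # levels: 0.961
print(round(pearson(diff(x), diff(y)), 3))  # changes: -0.258
```

The high correlation of the levels reflects nothing but the common upward drift; once that drift is differenced away, the remaining movements are weakly (and here negatively) related.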
- Statistics, Causality, and What Claims are Difficult to Swallow: Judea Pearl debates Kevin Gray - Jun 15, 2018.
While KDnuggets takes no side, we present the informative and respectful back and forth as we believe it has value for our readers. We hope that you agree.
Tags: AI, Computer Science, Data Science, Judea Pearl, Statistics
- A Better Stats 101 - Jun 12, 2018.
Statistics encourages us to think systemically and recognize that variables normally do not operate in isolation, and that an effect usually has multiple causes. Some call this multivariate thinking. Statistics is particularly useful for uncovering the Why.
Tags: Data Science, Machine Learning, Statistics
- The Statistics of Gang Violence - Jun 6, 2018.
For Carlos Carcach, Professor & Director, Center for Public Policy at the Escuela Superior de Economía y Negocios (ESEN) in Santa Tecla, El Salvador, gangs are an object of intellectual curiosity and the subject of his research.
Tags: Certificate, Data Science, El Salvador, Predictive Analytics, Statistics, Statistics.com
Football World Cup 2018 Predictions: Germany vs Brazil in the final, and more - Jun 5, 2018.
Looking ahead to the FIFA World Cup that kicks off this month (14th June), we have created the official KDnuggets predictions.
Tags: Data Analysis, Football, Soccer, Sports, Statistics, World Cup
- The Book of Why - Jun 1, 2018.
Judea Pearl has made noteworthy contributions to artificial intelligence, Bayesian networks, and causal analysis. These achievements notwithstanding, Pearl holds some views many statisticians may find odd or exaggerated.
Tags: Bayesian Networks, Causality, Data Science, Judea Pearl, Simpson's Paradox, Statistics
- Frequentists Fight Back - May 24, 2018.
Frequentist methods are sometimes described as “classical”, though most have only appeared in recent decades and new ones are under development as you read this. Whatever we call it, this branch of statistics is very much alive.
Tags: Bayesian, Statistics
- 24houranswers: Analytics / Data Science / Math / Statistics Tutors - May 9, 2018.
Seeking qualified Ph.D. students or faculty members for the position of Tutor/Instructor to provide one-on-one lectures tailored to the needs of our students in Applied Analytics, Computer Science, Applied Math and Statistics, and more.
Tags: Analytics, Data Science, Education, Mathematics, Statistics, Telecommute
- Skewness vs Kurtosis – The Robust Duo - May 4, 2018.
Kurtosis and Skewness are close relatives in the family of standardized statistical moments (Skewness is the third moment, Kurtosis the fourth), yet they are often used to detect very different phenomena in data. At the same time, it is usually advisable to analyse the two together to gain more insight and better understand the nature of the data.
Tags: Data Science, Descriptive Analytics, Statistics
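To make the "third and fourth moment" relationship concrete, here is a minimal sketch (ours, not the article's) that computes both as standardized moments, that is, the mean of ((x - mean) / sd) raised to the 3rd or 4th power. It assumes the population standard deviation and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores 0); some libraries apply small-sample bias corrections, so their numbers can differ slightly. The sample data and helper names are made up for illustration.

```python
import math

def standardized_moment(data, k):
    """k-th standardized moment: mean of ((x - mean) / sd) ** k, with the population sd."""
    n = len(data)
    mu = sum(data) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(((x - mu) / sd) ** k for x in data) / n

def skewness(data):
    """Third standardized moment: its sign shows which tail is longer."""
    return standardized_moment(data, 3)

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return standardized_moment(data, 4) - 3

symmetric = [1, 2, 3, 4, 5]            # made-up symmetric, flat sample
right_skewed = [1, 1, 1, 2, 2, 3, 10]  # made-up sample with one long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The symmetric sample has zero skewness but negative excess kurtosis (it is flatter than a normal curve), while the single outlier in the second sample drives both statistics positive: one reason it helps to read the two together.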
Key Algorithms and Statistical Models for Aspiring Data Scientists - Apr 16, 2018.
This article provides a summary of key algorithms and statistical techniques commonly used in industry, along with a short resource related to these techniques.
Tags: Algorithms, Data Science, Machine Learning, Online Education, Statistics
- Descriptive Statistics: The Mighty Dwarf of Data Science – Crest Factor - Apr 6, 2018.
No other means of data description is more comprehensive than Descriptive Statistics, and with ever-increasing volumes of data and the need for low-latency decision making, its relevance will only continue to increase.
Tags: Data Science, Descriptive Analytics, Statistics
- Descriptive Statistics: The Mighty Dwarf of Data Science - Mar 20, 2018.
No other means of data description is more comprehensive than Descriptive Statistics, and with ever-increasing volumes of data and the need for low-latency decision making, its relevance will only continue to increase.
Tags: Data Science, Descriptive Analytics, Statistics
- Madrid Advanced Statistics and Data Mining Summer School - Mar 19, 2018.
The courses cover topics such as Neural Networks and Deep Learning, Bayesian Networks, Big Data with Apache Spark, Bayesian Inference, Text Mining and Time Series. Each course has theoretical and practical classes, the latter done with R or Python.
Tags: Data Mining, Madrid, Spain, Statistics, Summer School
- Multiscale Methods and Machine Learning - Mar 19, 2018.
We highlight recent developments in machine learning and Deep Learning related to multiscale methods, which analyze data at a variety of scales to capture a wider range of relevant features. We give a general overview of multiscale methods, examine recent successes, and compare with similar approaches.
Tags: Algorithms, Data Science, Deep Learning, Machine Learning, Statistics
- A Few Statistics Tips for Marketers - Mar 6, 2018.
Statistics can help good marketers become better marketers. Here are a few things they should know about stats.
Tags: Marketing, Statistics
- Histogram 202: Tips and Tricks for Better Data Science - Feb 15, 2018.
We show how to make an ideal histogram, share some tips, and give examples. Let's dive into the world of binning.
Tags: Data Science, Histogram, Statistics
- Propensity Score Matching in R - Jan 18, 2018.
Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.
Tags: Bias, R, Statistics
- How Not To Lie With Statistics - Jan 11, 2018.
Darrell Huff's classic How to Lie with Statistics is perhaps more relevant than ever. In this short article, I revisit this theme from some different angles.
Tags: Statistics, Trust
- Robust Algorithms for Machine Learning - Dec 11, 2017.
This post discusses some of the advantages of incorporating robust, non-parametric methods into our Machine Learning frameworks and models.
Tags: ActiveState, Keras, Machine Learning, Python, Statistics
- 5 Tricks When A/B Testing Is Off The Table - Dec 8, 2017.
Sometimes you cannot do A/B testing, but that does not mean we have to fly blind: there is a range of econometric methods that can illuminate the causal relationships at play.
Tags: A/B Testing, Econometrics, Regression, Statistics
- KDnuggets™ News 17:n45, Nov 29: New Poll: Data Science Methods Used? Deep Learning Specialization: 21 Lessons Learned - Nov 29, 2017.
Also: The 10 Statistical Techniques Data Scientists Need to Master; Did Spark Really Kill Hadoop?; A Framework for Textual Data Science.
Tags: Andrew Ng, Apache Spark, Deep Learning, Statistics, Text Mining
- You have created your first Linear Regression Model. Have you validated the assumptions? - Nov 15, 2017.
Linear Regression is an excellent starting point for Machine Learning, but it is a common mistake to focus only on the p-values and R-squared values when determining a model's validity. Here we examine the underlying assumptions of a Linear Regression, which need to be validated before applying the model.
Tags: Data Science, Linear Regression, Machine Learning, Multicollinearity, Statistics
The 10 Statistical Techniques Data Scientists Need to Master - Nov 15, 2017.
The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.
Tags: Algorithms, Data Science, Data Scientist, Machine Learning, Statistical Learning, Statistics
- How Bayesian Networks Are Superior in Understanding Effects of Variables - Nov 9, 2017.
Bayes Nets have remarkable properties that make them better than many traditional methods at determining variables' effects. This article explains their principal advantages.
Tags: Bayesian, Bayesian Networks, Predictive Models, Probability, Regression, Statistics
- Conjoint Analysis: A Primer - Nov 1, 2017.
Conjoint is another of those things everyone talks about but many are confused about…
Tags: Statistical Analysis, Statistics
- Monty Hall chooses the final exit door - Oct 7, 2017.
Monty Hall, the game show host, died last week. He was the host of the popular show "Let's Make a Deal", where contestants try to guess which one of 3 doors hides a valuable prize.
Tags: Monty Hall, Statistics, Statistics.com
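The game gave its name to the well-known Monty Hall problem: after the contestant picks a door, the host opens a different door hiding no prize and offers a switch. A minimal sketch (our own, not from the article) computes the exact win probabilities by enumerating every equally likely combination of prize location and first pick. It assumes the standard rules: the host always opens an unpicked, non-prize door and always offers the switch. The function name `win_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def win_probability(switch: bool) -> Fraction:
    """Exact win probability over all (prize, first pick) pairs, each equally likely."""
    wins = 0
    cases = 0
    for prize, pick in product(DOORS, repeat=2):
        # The host opens a door that is neither the contestant's pick nor the prize.
        # When two such doors exist, which one he opens does not change the outcome.
        opened = next(d for d in DOORS if d != pick and d != prize)
        if switch:
            final = next(d for d in DOORS if d != pick and d != opened)
        else:
            final = pick
        wins += final == prize
        cases += 1
    return Fraction(wins, cases)

print("stay:", win_probability(False))   # 1/3
print("switch:", win_probability(True))  # 2/3
```

Switching wins exactly when the first pick was wrong, which happens in two of every three cases.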
- Statistical Mistakes Even Scientists Make - Oct 3, 2017.
Scientists are all experts in statistics, right? Wrong.
Tags: Scientist, Statistician, Statistics
30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets - Sep 22, 2017.
This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.
Tags: Cheat Sheet, Data Science, Deep Learning, Machine Learning, Neural Networks, Probability, Python, R, SQL, Statistics