- How to Determine the Best Fitting Data Distribution Using Python - Sep 30, 2021.
Approaches to data sampling, modeling, and analysis can vary based on the distribution of your data, and so determining the best fit theoretical distribution can be an essential step in your data exploration process.
- Advanced Statistical Concepts in Data Science - Sep 30, 2021.
The article contains some of the most commonly used advanced statistical concepts along with their Python implementation.
- Don’t Touch a Dataset Without Asking These 10 Questions - Sep 20, 2021.
Selecting the right dataset is critical for the success of your AI project.
- Rejection Sampling with Python - Mar 24, 2021.
Read this article on rejection sampling with examples using the Normal and Cauchy Distributions.
- Comprehensive Guide to the Normal Distribution - Jan 18, 2021.
Drop in for some tips on how this fundamental statistics concept can improve your data science.
- Essential Math for Data Science: The Poisson Distribution - Dec 29, 2020.
The Poisson distribution, named after the French mathematician Siméon Denis Poisson, is a discrete distribution function describing the probability that an event will occur a certain number of times in a fixed time (or space) interval.
- Fast and Intuitive Statistical Modeling with Pomegranate - Dec 21, 2020.
Pomegranate is a delicious fruit. It can also be a super useful Python library for statistical analysis. We will show how in this article.
- Before Probability Distributions - Jul 16, 2020.
Why do we use probability distributions, and why do they matter?
- KDnuggets™ News 20:n24, Jun 17: Easy Speech-to-Text with Python; Data Distributions Overview; Java for Data Scientists - Jun 17, 2020.
Also: Deploy a Machine Learning Pipeline to the Cloud Using a Docker Container; Five Cognitive Biases In Data Science (And how to avoid them); Understanding Machine Learning: The Free eBook; Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines; A Complete guide to Google Colab for Deep Learning
- Overview of data distributions - Jun 10, 2020.
With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.
- Looking Normal(ly Distributed) - May 20, 2020.
This article investigates when some probability distributions look normal "enough" for a statistical test.
- Probability Distributions in Data Science - Feb 26, 2020.
Some machine learning models are designed to work best under certain distribution assumptions. Knowing which distributions we are working with can therefore help us identify which models are best to use.
- Beta Distribution: What, When & How - Sep 25, 2019.
This article covers the beta distribution, and explains it using baseball batting averages.
- What is Poisson Distribution? - Aug 14, 2019.
A solid overview of the Poisson distribution, starting from why it is needed, how it stacks up to the binomial distribution, deriving its formula mathematically, and more.
- KDnuggets™ News 19:n25, Jul 10: 5 Probability Distributions for Data Scientists; What the Machine Learning Engineer Job is Really Like - Jul 10, 2019.
This edition of the KDnuggets newsletter is double-sized after taking the holiday week off. Learn about probability distributions every data scientist should know, what the machine learning engineering job is like, making the most money with the least amount of risk, the difference between NLP and NLU, get a take on Nvidia's new data science workstation, and much, much more.
- 5 Probability Distributions Every Data Scientist Should Know - Jul 4, 2019.
Having an understanding of probability distributions should be a priority for data scientists. Make sure you know what you should by reviewing this post on the subject.
- What to do when your training and testing data come from different distributions - Jan 4, 2019.
Sometimes only a limited amount of data from the target distribution can be collected, and it may not be sufficient to build the needed train/dev/test sets. What to do in such a case? Let us discuss some ideas!
- The Long Tail of Medical Data - Nov 12, 2018.
This article discusses some issues related to medical data, relating specifically to power law distributions and computer aided diagnosis. Read on to see machine learning's place in the puzzle.
- The Intuitions Behind Bayesian Optimization with Gaussian Processes - Oct 19, 2018.
Bayesian Optimization adds a Bayesian methodology to the iterative optimizer paradigm by incorporating a prior model on the space of possible target functions. This article introduces the basic concepts and intuitions behind Bayesian Optimization with Gaussian Processes.
- What is Normal? - Jul 31, 2018.
I saw an article recently that referred to the normal curve as the data scientist's best friend. We examine myths around the normal curve, including - is most data normally distributed?
- Why Data Scientists Love Gaussian - Jun 26, 2018.
The Gaussian distribution model, often identified with its iconic bell-shaped curve and also referred to as the Normal distribution, is popular mainly for three reasons.
- Packaging and Distributing Your Python Project to PyPI for Installation Using pip - Jun 11, 2018.
This tutorial will explain the steps required to package your Python projects, distribute them in distribution formats using setuptools, upload them to the Python Package Index (PyPI) repository using twine, and finally install them using Python installers such as pip and conda.
- Error Analysis to your Rescue – Lessons from Andrew Ng, part 3 - Jan 29, 2018.
The last entry in a series of posts about Andrew Ng's lessons on strategies to follow when fixing errors in your algorithm.
- Data Science Primer: Basic Concepts for Beginners - Aug 11, 2017.
This collection of concise introductory data science tutorials covers topics including the difference between data mining and statistics, supervised vs. unsupervised learning, and the types of patterns we can mine from data.
- Stanford Webinar, Mar 9: When big data seems too small - Feb 23, 2017.
On March 9, Stanford’s Dr. Gregory Valiant discusses the difficulties of and solutions for making accurate inferences in this challenging regime, in which the empirical distribution of the available data is misleading.
- Data Science Basics: Power Laws and Distributions - Dec 21, 2016.
Power laws and other relationships between observable phenomena may not seem like they are of any interest to data science, at least not to newcomers to the field, but this post provides an overview and suggests how they may be.
- Central Limit Theorem for Data Science – Part 2 - Aug 16, 2016.
This post continues an explanation of Central Limit Theorem started in a previous post, with additional details... and beer.
- Central Limit Theorem for Data Science - Aug 12, 2016.
This post is an introductory explanation of the Central Limit Theorem, and why it is (or should be) of importance to data scientists.
- What Statistics Topics are Needed for Excelling at Data Science? - Aug 2, 2016.
Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.
- Plausibility vs. probability, prior distributions, and the garden of forking paths - Jan 14, 2016.
A discussion on plausibility vs. probability: while many given events may be plausible, they can’t all be probable.
- What is numbersense – test yours - Mar 25, 2014.
Kaiser Fung, marketing and analytics expert and author of the book "Numbersense", explains what numbersense is in the age of Big Data. Test yours.