# Tag: Distribution (15)

**What to do when your training and testing data come from different distributions** - Jan 4, 2019.

Sometimes only a limited amount of data from the target distribution can be collected, and it may not be enough to build the needed train/dev/test sets. What should you do in such a case? Let us discuss some ideas!

**The Long Tail of Medical Data** - Nov 12, 2018.

This article discusses some issues related to medical data, relating specifically to power law distributions and computer aided diagnosis. Read on to see machine learning's place in the puzzle.

**The Intuitions Behind Bayesian Optimization with Gaussian Processes** - Oct 19, 2018.

Bayesian Optimization adds a Bayesian methodology to the iterative optimizer paradigm by incorporating a prior model on the space of possible target functions. This article introduces the basic concepts and intuitions behind Bayesian Optimization with Gaussian Processes.

**What is Normal?** - Jul 31, 2018.

I saw an article recently that referred to the normal curve as the data scientist's best friend. We examine myths around the normal curve, including whether most data is normally distributed.

**Why Data Scientists Love Gaussian** - Jun 26, 2018.

The Gaussian distribution, also referred to as the Normal distribution and often identified with its iconic bell-shaped curve, is popular mainly for three reasons.

**Packaging and Distributing Your Python Project to PyPI for Installation Using pip** - Jun 11, 2018.

This tutorial explains the steps required to package your Python projects, distribute them in distribution formats using setuptools, upload them to the Python Package Index (PyPI) repository using twine, and finally install them using Python installers such as pip and conda.

**Error Analysis to your Rescue – Lessons from Andrew Ng, part 3** - Jan 29, 2018.

The last entry in a series of posts about Andrew Ng's lessons on strategies to follow when fixing errors in your algorithm.

**Data Science Primer: Basic Concepts for Beginners** - Aug 11, 2017.

This collection of concise introductory data science tutorials covers topics including the difference between data mining and statistics, supervised vs. unsupervised learning, and the types of patterns we can mine from data.

**Stanford Webinar, Mar 9: When big data seems too small** - Feb 23, 2017.

On March 9, Stanford’s Dr. Gregory Valiant discusses the difficulties of and solutions for making accurate inferences in this challenging regime, in which the empirical distribution of the available data is misleading.

**Data Science Basics: Power Laws and Distributions** - Dec 21, 2016.

Power laws and other relationships between observable phenomena may not seem like they are of any interest to data science, at least not to newcomers to the field, but this post provides an overview and suggests how they may be.

**Central Limit Theorem for Data Science – Part 2** - Aug 16, 2016.

This post continues an explanation of the Central Limit Theorem started in a previous post, with additional details... and beer.

**Central Limit Theorem for Data Science** - Aug 12, 2016.

This post is an introductory explanation of the Central Limit Theorem, and why it is (or should be) of importance to data scientists.

**What Statistics Topics are Needed for Excelling at Data Science?** - Aug 2, 2016.

Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.

**Plausibility vs. probability, prior distributions, and the garden of forking paths** - Jan 14, 2016.

A discussion on plausibility vs. probability: while many given events may be plausible, they can’t all be probable.

**What is numbersense – test yours** - Mar 25, 2014.

Kaiser Fung, marketing and analytics expert and author of the book "Numbersense", explains what numbersense is in the age of Big Data. Test yours.