- How to do “Limitless” Math in Python - Oct 7, 2021.
How to perform arbitrary-precision computation, and much more math (and fast, too), than is possible with Python's built-in math library.
- How to Determine the Best Fitting Data Distribution Using Python - Sep 30, 2021.
Approaches to data sampling, modeling, and analysis can vary based on the distribution of your data, so determining the best-fitting theoretical distribution can be an essential step in your data exploration process.
- Advanced Statistical Concepts in Data Science - Sep 30, 2021.
The article covers some of the most commonly used advanced statistical concepts along with their Python implementations.
- Important Statistics Data Scientists Need to Know - Sep 29, 2021.
Several fundamental statistical concepts must be well appreciated by every data scientist -- from the enthusiast to the professional. Here, we provide code snippets in Python to build understanding of key tools that bring early insight into your data.
- 11 Important Probability Distributions Explained - Jul 20, 2021.
There are many distribution functions considered in statistics and machine learning, which can seem daunting to understand at first. Many are actually closely related, and with these intuitive explanations of the most important probability distributions, you can begin to appreciate the observations of data these distributions communicate.
- The 7 Best Open Source AI Libraries You May Not Have Heard Of - Jun 9, 2021.
AI researchers today have many exciting options for working with specialized tools. Although starting original projects from scratch is often not necessary, knowing which existing library to leverage remains a challenge. This list of generally unknown yet awesome, open-source libraries offers an interesting collection to consider for state-of-the-art research that spans from automatic machine learning to differentiable quantum circuits.
- Rejection Sampling with Python - Mar 24, 2021.
Read this article on rejection sampling with examples using the Normal and Cauchy Distributions.
- More Data Science Cheatsheets - Mar 18, 2021.
It's time again to look at some data science cheatsheets. Here you can find a short selection of such resources which can cater to different existing levels of knowledge and breadth of topics of interest.
- Fast and Intuitive Statistical Modeling with Pomegranate - Dec 21, 2020.
Pomegranate is a delicious fruit. It can also be a super useful Python library for statistical analysis. We will show how in this article.
- Essential Math for Data Science: Probability Density and Probability Mass Functions - Dec 7, 2020.
In this article, we’ll cover probability mass and probability density functions. You’ll see how to understand and represent these distribution functions and their link with histograms.
- The Best Free Data Science eBooks: 2020 Update - Sep 30, 2020.
The author has updated their list of best free data science books for 2020. Read on to see what books you should grab.
- Before Probability Distributions - Jul 16, 2020.
Why do we use probability distributions, and why do they matter?
- The 8 Basic Statistics Concepts for Data Science - Jun 24, 2020.
Understanding the fundamentals of statistics is a core capability for becoming a Data Scientist. Review these essential ideas that will be pervasive in your work and raise your expertise in the field.
- 4 Free Math Courses to do and Level up your Data Science Skills - Jun 22, 2020.
Just as there is no Data Science without data, there's no science in data without mathematics. Strengthening your foundational skills in math will level you up as a data scientist and enable you to perform with greater expertise.
- Overview of data distributions - Jun 10, 2020.
With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.
- Looking Normal(ly Distributed) - May 20, 2020.
This article investigates when some probability distributions look normal "enough" for a statistical test.
- KDnuggets™ News 20:n09, Mar 4: When Will AutoML replace Data Scientists (if ever) – vote; 20 AI, DS, ML Terms You Need to Know (part 2) - Mar 4, 2020.
- Linear to Logistic Regression, Explained Step by Step - Mar 3, 2020.
Logistic Regression is a core supervised learning technique for solving classification problems. This article goes beyond its simple code to first understand the concepts behind the approach, and how it all emerges from the more basic technique of Linear Regression.
- Data Science Curriculum for self-study - Feb 26, 2020.
Are you asking the question, "how do I become a Data Scientist?" This list recommends the best essential topics to gain an introductory understanding for getting started in the field. After learning these basics, keep in mind that doing real data science projects through internships or competitions is crucial to acquiring the core skills necessary for the job.
- Probability Distributions in Data Science - Feb 26, 2020.
Some machine learning models are designed to work best under certain distribution assumptions. Therefore, knowing which distributions we are working with can help us identify which models are best to use.
- Optimal Estimation Algorithms: Kalman and Particle Filters - Feb 5, 2020.
An introduction to the Kalman and Particle Filters and their applications in fields such as Robotics and Reinforcement Learning.
- Uber Has Been Quietly Assembling One of the Most Impressive Open Source Deep Learning Stacks in the Market - Jan 27, 2020.
Many of the technologies used by Uber teams have been open sourced and received accolades from the machine learning community. Let’s look at some of my favorites.
- Probability Learning: Naive Bayes - Nov 26, 2019.
This post will describe various simplifications of Bayes' Theorem that make it more practical and applicable to real-world problems: these simplifications are known as Naive Bayes. To clarify everything, we will also see an illustrative example of how Naive Bayes can be applied for classification.
- The Math Behind Bayes - Nov 19, 2019.
This post will be dedicated to explaining the maths behind Bayes' Theorem, when its application makes sense, and how it differs from Maximum Likelihood.
- Probability Learning: Maximum Likelihood - Nov 5, 2019.
The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.
- How Bayes’ Theorem is Applied in Machine Learning - Oct 28, 2019.
Learn how Bayes' Theorem is applied in Machine Learning for classification and regression!
- Probability Learning: Bayes’ Theorem - Oct 16, 2019.
Learn about one of the fundamental theorems of probability with an easy everyday example.
- An Overview of Density Estimation - Oct 14, 2019.
Density estimation is estimating the probability density function of the population from the sample. This post examines and compares a number of approaches to density estimation.
- Beta Distribution: What, When & How - Sep 25, 2019.
This article covers the beta distribution, and explains it using baseball batting averages.
- How to count Big Data: Probabilistic data structures and algorithms - Aug 26, 2019.
Learn how probabilistic data structures and algorithms can be used for cardinality estimation in Big Data streams.
- What is Poisson Distribution? - Aug 14, 2019.
A solid overview of the Poisson distribution, starting from why it is needed, how it stacks up against the binomial distribution, deriving its formula mathematically, and more.
- KDnuggets™ News 19:n25, Jul 10: 5 Probability Distributions for Data Scientists; What the Machine Learning Engineer Job is Really Like - Jul 10, 2019.
This edition of the KDnuggets newsletter is double-sized after taking the holiday week off. Learn about probability distributions every data scientist should know, what the machine learning engineering job is like, making the most money with the least amount of risk, the difference between NLP and NLU, get a take on Nvidia's new data science workstation, and much, much more.
- 5 Probability Distributions Every Data Scientist Should Know - Jul 4, 2019.
Having an understanding of probability distributions should be a priority for data scientists. Make sure you know what you should by reviewing this post on the subject.
- Probability Mass and Density Functions - May 21, 2019.
This content is part of a series about Chapter 3, on probability, from the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code on mathematical theories and is constructed as my understanding of these concepts.
- Unfolding Naive Bayes From Scratch - Sep 25, 2018.
Whether you are a beginner in Machine Learning or you have been trying hard to understand the Super Natural Machine Learning Algorithms and you still feel that the dots do not connect somehow, this post is definitely for you!
- Machine Learning Cheat Sheets - Sep 11, 2018.
Check out this collection of machine learning concept cheat sheets based on Stanford CS 229 material, including supervised and unsupervised learning, neural networks, tips & tricks, probability & stats, and algebra & calculus.
- Basic Statistics in Python: Probability - Aug 21, 2018.
At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" To calculate the chance of an event happening, we also need to consider all the other events that can occur.
- Why Data Scientists Love Gaussian - Jun 26, 2018.
The Gaussian distribution model, often identified with its iconic bell-shaped curve and also referred to as the Normal distribution, is popular mainly for three reasons.
- How Bayesian Networks Are Superior in Understanding Effects of Variables - Nov 9, 2017.
Bayes Nets have remarkable properties that make them better than many traditional methods in determining variables’ effects. This article explains the principal advantages.
- 30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets - Sep 22, 2017.
This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.
- The Surprising Complexity of Randomness - Jun 15, 2017.
We have pseudorandom numbers because generating true random numbers using a computer is difficult. Computers, by design, are excellent at taking a set of instructions and carrying them out in the exact same way, every single time.
- Stuff Happens: A Statistical Guide to the “Impossible” - Apr 6, 2017.
Why are some people struck by lightning multiple times or, more encouragingly, how could anyone possibly win the lottery more than once? The odds against these sorts of things are enormous.
- Introduction to Bayesian Inference - Dec 16, 2016.
Bayesian inference is a powerful toolbox for modeling uncertainty, combining researcher understanding of a problem with data, and providing a quantitative measure of how plausible various facts are. This overview from Datascience.com introduces Bayesian probability and inference in an intuitive way, and provides examples in Python to help get you started.
- What Statistics Topics are Needed for Excelling at Data Science? - Aug 2, 2016.
Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.
- Big Data, Bible Codes, and Bonferroni - Jul 8, 2016.
This discussion will focus on two particular statistical issues to be on the lookout for in your own work and in the work of others mining and learning from Big Data, with real-world examples emphasizing the importance of statistical processes in practice.
- Deep Learning, Pachinko, and James Watt: Efficiency is the Driver of Uncertainty - Jun 8, 2016.
A reasoned discussion of why the next generation of data efficient learning approaches rely on us developing new algorithms that can propagate stochasticity or uncertainty right through the model, and which are mathematically more involved than the standard approaches.
- Do You Need Big Data or Smart Data? Part 2 - Jun 2, 2016.
It can be easy to get carried away with the deluge of big data and to rely on its abundance to deliver better models. However, use of data without context and objective could prove counterproductive; contextual and objective driven samples from the large volume and variety of data can be effective tools.
- Do You Need Big Data or Smart Data? Part 1 - Jun 1, 2016.
Analyzing Big Data without paying attention to its characteristics and objective can be detrimental; the fix can be correct and effective sampling. Read on to transform your Big Data into Smart Data.
- Bayes Theorem for Computer Scientists, Explained - Feb 16, 2016.
Data science is in vain without a solid understanding of probability and statistics. Learn the basic concepts of probability, including the law of total probability, relevant theorem and Bayes’ theorem, along with their computer science applications.
- Plausibility vs. probability, prior distributions, and the garden of forking paths - Jan 14, 2016.
A discussion on plausibility vs. probability: while many given events may be plausible, they can’t all be probable.
- Top /r/MachineLearning Posts, Apr 5-11: Amazon Machine Learning, Numerical Optimization, and Conditional Random Fields - Apr 14, 2015.
Amazon Machine Learning as a Service, Numerical Optimization, Extracting data from NYTimes recipes, Intro to Machine Learning with sci-kit, and more.
- INFORMS The Business of Big Data 2014: Day 1 Highlights - Aug 21, 2014.
Highlights from the presentations by Big Data technology practitioners from Teradata, Booz Allen Hamilton, Databricks and ProbabilityManagement.org during INFORMS The Business of Big Data in San Jose.