**How to Become a (Good) Data Scientist – Beginner Guide**- Oct 16, 2019.

A guide covering the things you should learn to become a data scientist, including the basics of business intelligence, statistics, programming, and machine learning.**An Overview of Density Estimation**- Oct 14, 2019.

Density estimation is estimating the probability density function of the population from the sample. This post examines and compares a number of approaches to density estimation.**Statistical Thinking for Industrial Problem Solving: a free online course**- Oct 2, 2019.

This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.**6 bits of advice for Data Scientists**- Sep 25, 2019.

As a data scientist, you can get lost in your daily dives into the data. Follow this advice to make your work more diligent and more impactful for your organization.**Beta Distribution: What, When & How**- Sep 25, 2019.

This article covers the beta distribution, and explains it using baseball batting averages.**Which Data Science Skills are core and which are hot/emerging ones?**- Sep 17, 2019.

We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.**How Bad is Multicollinearity?**- Sep 17, 2019.

For some people, a correlation below 60% is acceptable; for others, even a correlation of 30% to 40% is considered too high, because one variable may end up exaggerating the performance of the model or completely messing up parameter estimates.**What’s the difference between analytics and statistics?**- Sep 6, 2019.

From asking the best questions about data to answering those questions with certainty, understanding the value of these two seemingly different professions is clarified when you see how they should work together.**Statistical Modelling vs Machine Learning**- Aug 14, 2019.

At times it may seem that Machine Learning can be done these days without a sound statistical background, but people who think so are missing important nuances. Code written to make it easier does not negate the need for an in-depth understanding of the problem.**What is Poisson Distribution?**- Aug 14, 2019.

A solid overview of the Poisson distribution, starting from why it is needed, how it stacks up against the binomial distribution, deriving its formula mathematically, and more.**Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course.**- Aug 2, 2019.

This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.**P-values Explained By Data Scientist**- Jul 30, 2019.

This article is designed to give you a full picture, from constructing a hypothesis test to understanding the p-value and using it to guide decision making.**Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps**- Jul 9, 2019.

A heatmap is a graphical representation of data in which data values are represented as colors. That is, it uses color to communicate a value to the reader. This is a great tool for directing the audience towards the areas that matter most when you have a large volume of data.**How do you check the quality of your regression model in Python?**- Jul 2, 2019.

This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!**5 Useful Statistics Data Scientists Need to Know**- Jun 14, 2019.

A data scientist should know how to effectively use statistics to gain insights from data. Here are five useful and practical statistical concepts that every data scientist must know.**All Models Are Wrong – What Does It Mean?**- Jun 12, 2019.

During your adventures in data science, you may have heard “all models are wrong.” Let’s unpack this famous quote to understand how we can still make models that are useful.**Top 10 Statistics Mistakes Made by Data Scientists**- Jun 7, 2019.

The following are some of the most common statistics mistakes made by data scientists. Check this list often to make sure you are not making any of these while applying statistics to data science.**Statistical Thinking for Industrial Problem Solving (STIPS): a free online course.**- Jun 4, 2019.

This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.**Separating signal from noise**- Jun 4, 2019.

When we are building a model, we are making the assumption that our data has two parts, signal and noise. Signal is the real pattern, the repeatable process that we hope to capture and describe. The noise is everything else that gets in the way of that.**What Does a Lady Tasting Tea Have to Do with Science?**- May 31, 2019.

Design of Experiments (DOE) is a statistical concept used to find the cause-and-effect relationships. Surprisingly, an experiment arising from a casual conversation about tea-drinking is one of the first examples of an experiment designed using statistical ideas.**Probability Mass and Density Functions**- May 21, 2019.

This content is part of a series about chapter 3, on probability, of the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code on mathematical theories and is constructed as my understanding of these concepts.**Modeling 101**- May 13, 2019.

In the past couple of decades, innovation in statistics and machine learning has been increasing at a rapid pace and we are now able to do things unimaginable when I began my career.**Naive Bayes: A Baseline Model for Machine Learning Classification Performance**- May 7, 2019.

We can use Pandas to work through Bayes' Theorem and Scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in Scikit-learn.**Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course.**- May 3, 2019.

**How to correctly select a sample from a huge dataset in machine learning**- May 1, 2019.

We explain how choosing a small, representative dataset from a large population can improve model training reliability.**Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course**- Apr 5, 2019.

**Spatio-Temporal Statistics: A Primer**- Apr 5, 2019.

The Wake Forest University School of Business is seeking qualified candidates for a Teaching Professor/Professor of the Practice in Statistics/Analytics. This individual will be expected to teach graduate courses in areas such as Data Analysis & Business Modeling, Data Mining & Machine Learning, and Forecasting.**The 7 Myths of Data Anonymisation**- Mar 12, 2019.

Anonymisation has long been seen as a necessary evil rather than a helpful tool. That’s why plenty of myths have arisen around the technology over the years.**Beating the Bookies with Machine Learning**- Mar 8, 2019.

We investigate how to use a custom loss function to identify fair odds, including a detailed example using machine learning to bet on the results of a darts match and how this can assist you in beating the bookmaker.**Statistical Thinking for Industrial Problem Solving – a free online course**- Feb 6, 2019.

**From Good to Great Data Science, Part 1: Correlations and Confidence**- Feb 5, 2019.

With the aid of some hospital data, part one describes how just a little inexperience in statistics could result in two common mistakes.**The Essential Data Science Venn Diagram**- Feb 4, 2019.

Southern Illinois University Edwardsville (SIUE) is establishing the Center for Predictive Analytics (C-PAN), and is seeking an innovative, visionary director for the center who will provide centralized leadership in establishing research and educational initiatives across academic units at SIUE.**Introduction to Statistics for Data Science**- Dec 17, 2018.

This tutorial helps explain the central limit theorem, covering populations and samples, sampling distribution, intuition, and contains a useful video so you can continue your learning.**A comprehensive list of Machine Learning Resources: Open Courses, Textbooks, Tutorials, Cheat Sheets and more**- Dec 7, 2018.

A thorough collection of useful resources covering statistics, classic machine learning, deep learning, probability, reinforcement learning, and more.**The 5 Basic Statistics Concepts Data Scientists Need to Know**- Nov 13, 2018.

Today, we’re going to look at 5 basic statistics concepts that data scientists need to know and how they can be applied most effectively!**Quantum Machine Learning: A look at myths, realities, and future projections**- Nov 5, 2018.

An overview of quantum computing and quantum algorithm design, including current state of the hardware and algorithm design within the existing systems.**How I Learned to Stop Worrying and Love Uncertainty**- Oct 24, 2018.

Mindstrong Health is seeking a Sr Data Scientist in Palo Alto, CA, who is passionate about our mission, committed to excellence and excited to build a company that will address one of the greatest health challenges of our time.**Unfolding Naive Bayes From Scratch**- Sep 25, 2018.

Whether you are a beginner in Machine Learning or you have been trying hard to understand the Super Natural Machine Learning Algorithms and you still feel that the dots do not connect somehow, this post is definitely for you!**Machine Learning Cheat Sheets**- Sep 11, 2018.

Check out this collection of machine learning concept cheat sheets based on Stanford CS 229 material, including supervised and unsupervised learning, neural networks, tips & tricks, probability & stats, and algebra & calculus.**5 Things to Know About A/B Testing**- Sep 7, 2018.

This article presents 5 things to know about A/B testing, from appropriate sample sizes, to statistical confidence, to A/B testing usefulness, and more.**Essential Math for Data Science: ‘Why’ and ‘How’**- Sep 6, 2018.

It always pays to know the machinery under the hood (even at a high level) rather than being just the guy behind the wheel with no knowledge about the car.**What on earth is data science?**- Sep 4, 2018.

An overview and discussion around data science, covering the history behind the term, data mining, statistical inference, machine learning, data engineering and more.**Basic Statistics in Python: Probability**- Aug 21, 2018.

At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" To calculate the chance of an event happening, we also need to consider all the other events that can occur.**Interpreting a data set, beginning to end**- Aug 20, 2018.

Also: Selecting the Best Machine Learning Algorithm for Your Regression Problem; From Data to Viz: how to select the right chart for your data; Only Numpy: Implementing GANs and Adam Optimizer using Numpy; Programming Best Practices for Data Science**Basic Statistics in Python: Descriptive Statistics**- Aug 1, 2018.

This article covers defining statistics, descriptive statistics, measures of central tendency, and measures of spread. This article assumes no prior knowledge of statistics, but does require at least a general knowledge of Python.**What is Normal?**- Jul 31, 2018.

I saw an article recently that referred to the normal curve as the data scientist's best friend. We examine myths around the normal curve, including - is most data normally distributed?**Causation in a Nutshell**- Jul 20, 2018.

Every move we make, every breath we take, and every heartbeat is an effect that is caused. Even apparent randomness may just be something we cannot explain.**Explaining the 68-95-99.7 rule for a Normal Distribution**- Jul 19, 2018.

This post explains how those numbers were derived in the hope that they can be more interpretable for your future endeavors.**Why Data Scientists Love Gaussian**- Jun 26, 2018.

The Gaussian distribution, often identified with its iconic bell-shaped curve and also referred to as the Normal distribution, is so popular mainly for three reasons.**Every time someone runs a correlation coefficient on two time series, an angel loses their wings**- Jun 18, 2018.

We all know correlation doesn’t equal causality at this point, but when working with time series data, correlation can lead you to come to the wrong conclusion.**Statistics, Causality, and What Claims are Difficult to Swallow: Judea Pearl debates Kevin Gray**- Jun 15, 2018.

While KDnuggets takes no side, we present the informative and respectful back and forth as we believe it has value for our readers. We hope that you agree.**A Better Stats 101**- Jun 12, 2018.

Statistics encourages us to think systemically and recognize that variables normally do not operate in isolation, and that an effect usually has multiple causes. Some call this multivariate thinking. Statistics is particularly useful for uncovering the Why.**The Statistics of Gang Violence**- Jun 6, 2018.

For Carlos Carcach, Professor & Director, Center for Public Policy at the Escuela Superior de Economía y Negocios (ESEN) in Santa Tecla, El Salvador, gangs are an object of intellectual curiosity and the subject of his research.**Football World Cup 2018 Predictions: Germany vs Brazil in the final, and more**- Jun 5, 2018.

Looking ahead to the FIFA World Cup that kicks off this month (14th June), we have created the official KDnuggets predictions.**The Book of Why**- Jun 1, 2018.

Judea Pearl has made noteworthy contributions to artificial intelligence, Bayesian networks, and causal analysis. These achievements notwithstanding, Pearl holds some views many statisticians may find odd or exaggerated.**Frequentists Fight Back**- May 24, 2018.

Seeking qualified Ph.D. students or faculty members for the position of Tutor/Instructor to provide one-on-one lectures to the needs of our students in Applied Analytics, Computer Science, Applied Math and Statistics, and more.**Skewness vs Kurtosis – The Robust Duo**- May 4, 2018.

Kurtosis and Skewness are very close relatives of the “data normalized statistical moment” family – Kurtosis being the fourth and Skewness the third moment, and yet they are often used to detect very different phenomena in data. At the same time, it is typically advisable to analyse the outputs of both together to gather more insight and understand the nature of the data better.**Key Algorithms and Statistical Models for Aspiring Data Scientists**- Apr 16, 2018.

This article provides a summary of key algorithms and statistical techniques commonly used in industry, along with a short resource related to these techniques.**Descriptive Statistics: The Mighty Dwarf of Data Science – Crest Factor**- Apr 6, 2018.

No other means of data description is more comprehensive than Descriptive Statistics, and with ever-increasing volumes of data and the need for low-latency decision making, its relevance will only continue to increase.**Descriptive Statistics: The Mighty Dwarf of Data Science**- Mar 20, 2018.

No other means of data description is more comprehensive than Descriptive Statistics, and with ever-increasing volumes of data and the need for low-latency decision making, its relevance will only continue to increase.**Madrid Advanced Statistics and Data Mining Summer School**- Mar 19, 2018.

The courses cover topics such as Neural Networks and Deep Learning, Bayesian Networks, Big Data with Apache Spark, Bayesian Inference, Text Mining and Time Series. Each course has theoretical and practical classes, the latter done with R or Python.**Multiscale Methods and Machine Learning**- Mar 19, 2018.

We highlight recent developments in machine learning and Deep Learning related to multiscale methods, which analyze data at a variety of scales to capture a wider range of relevant features. We give a general overview of multiscale methods, examine recent successes, and compare with similar approaches.**A Few Statistics Tips for Marketers**- Mar 6, 2018.

Statistics can help good marketers become better marketers. Here are a few things they should know about stats.**Histogram 202: Tips and Tricks for Better Data Science**- Feb 15, 2018.

We show how to make an ideal histogram, share some tips, and give examples. Let's dive into the world of binning.**Propensity Score Matching in R**- Jan 18, 2018.

Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.**How Not To Lie With Statistics**- Jan 11, 2018.

Darrell Huff's classic How to Lie with Statistics is perhaps more relevant than ever. In this short article, I revisit this theme from some different angles.**Robust Algorithms for Machine Learning**- Dec 11, 2017.

This post mentions some of the advantages of implementing robust, non-parametric methods into our Machine Learning frameworks and models.**5 Tricks When A/B Testing Is Off The Table**- Dec 8, 2017.

Also The 10 Statistical Techniques Data Scientists Need to Master; Did Spark Really Kill Hadoop? A Framework for Textual Data Science.**You have created your first Linear Regression Model. Have you validated the assumptions?**- Nov 15, 2017.

Linear Regression is an excellent starting point for Machine Learning, but it is a common mistake to focus just on the p-values and R-Squared values while determining validity of model. Here we examine the underlying assumptions of a Linear Regression, which need to be validated before applying the model.**The 10 Statistical Techniques Data Scientists Need to Master**- Nov 15, 2017.

The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.**How Bayesian Networks Are Superior in Understanding Effects of Variables**- Nov 9, 2017.

Bayes Nets have remarkable properties that make them better than many traditional methods in determining variables’ effects. This article explains the principal advantages.**Conjoint Analysis: A Primer**- Nov 1, 2017.

Conjoint is another of those things everyone talks about but many are confused about…**Monty Hall chooses the final exit door**- Oct 7, 2017.

Monty Hall, the game show host, died last week. He was the host of the popular show "Let's Make a Deal", where contestants try to guess which one of 3 doors hides a valuable prize.**Statistical Mistakes Even Scientists Make**- Oct 3, 2017.

Scientists are all experts in statistics, right? Wrong.**30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets**- Sep 22, 2017.

This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.**How To Lie With Numbers**- Sep 21, 2017.

It takes less effort to lie without numbers, but there are now more numbers and more ways to lie with them than ever before. Poor Reverend Bayes, who understood the true meaning of "evidence".**Vital Statistics You Never Learned… Because They’re Never Taught**- Aug 29, 2017.

Marketing scientist Kevin Gray asks Professor Frank Harrell about some important things we often get wrong about statistics.**Machine Learning vs. Statistics: The Texas Death Match of Data Science**- Aug 23, 2017.

Throughout its history, Machine Learning (ML) has coexisted with Statistics uneasily, like an ex-boyfriend accidentally seated with the groom’s family at a wedding reception: both uncertain where to lead the conversation, but painfully aware of the potential for awkwardness.**Data Science Primer: Basic Concepts for Beginners**- Aug 11, 2017.

This collection of concise introductory data science tutorials covers topics including the difference between data mining and statistics, supervised vs. unsupervised learning, and the types of patterns we can mine from data.**Analytically Speaking Featuring Pedro Saraiva, July 12**- Jul 7, 2017.

Former academician and now Portuguese MP Pedro Saraiva says that parliaments and societies will improve if more people with a good statistical background become MPs. Learn about the paradoxes and issues in statistics and politics.**Who Cares About Evidence?**- Jun 29, 2017.

Why bother with evidence? Because it improves the odds that what we believe is actually true. But not always.**Is Regression Analysis Really Machine Learning?**- Jun 5, 2017.

What separates "traditional" applied statistics from machine learning? Is statistics the foundation on top of which machine learning is built? Is machine learning a superset of "traditional" statistics? Do these 2 concepts have a third unifying concept in common? So, in that vein... is regression analysis actually a form of machine learning?**Descriptive Statistics Key Terms, Explained**- May 18, 2017.

This is a collection of 15 basic descriptive statistics key terms, explained in easy to understand language, along with an example and some Python code for computing simple descriptive statistics.**Propensity Scores: A Primer**- May 16, 2017.

Propensity scores are used in quasi-experimental and non-experimental research when the researcher must make causal inferences, for example, that exposure to a chemical increases the risk of cancer.**Madrid UPM Advanced Statistics and Data Mining Summer School, June 26 – July 7**- May 12, 2017.

The courses cover topics such as Neural Networks and Deep Learning, Bayesian Networks, Big Data with Apache Spark, Bayesian Inference, Text Mining and Time Series, and each has theoretical as well as practical classes, done with R or Python. Early bird till June 5.**Analytically Speaking Featuring Melisa Buie – On Demand**- Apr 6, 2017.

Learn how to keep your audience from struggling to understand your work, why others should review your experimentation process, how to build your experimental muscle, and more.**Stuff Happens: A Statistical Guide to the “Impossible”**- Apr 6, 2017.

Why are some people struck by lightning multiple times or, more encouragingly, how could anyone possibly win the lottery more than once? The odds against these sorts of things are enormous.**How to think like a data scientist to become one**- Mar 23, 2017.

The author went from securities analyst to Head of Data Science at Amazon. He describes what he learned in his journey and gives 4 useful rules based on his experience.**What Top Firms Ask: 100+ Data Science Interview Questions**- Mar 22, 2017.

Check this out: A topic wise collection of 100+ data science interview questions from top companies.**Why A/B Testers Have The Best Jobs In Tech**- Mar 22, 2017.

Learning about what these people do made it clear that when you are deeply involved in A/B testing at scale, there is a tremendous rush from doing so many different things that matter.**Analytics 101: Comparing KPIs**- Mar 20, 2017.

Different business units in the organisation have different behaviours (e.g. turnover rate) and they can’t be compared with each other. So, how can we tell whether the changes in their behaviour are reasons for concern?**17 More Must-Know Data Science Interview Questions and Answers, Part 3**- Mar 15, 2017.

The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.

**Get more insights from fewer experiments**- Mar 3, 2017.

Efficient experimentation can save both time and money in the long term when it helps optimize product or process performance. This webcast shows how a dynamic model can dramatically improve outcomes.**Introduction to Correlation**- Feb 22, 2017.

Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.**Causation or Correlation: Explaining Hill Criteria using xkcd**- Feb 20, 2017.

This is an attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun and to motivate causal inference instructors to have some variety in which xkcd comic they include in lectures.**Removing Outliers Using Standard Deviation in Python**- Feb 16, 2017.

Standard Deviation is one of the most underrated statistical tools out there. It’s an extremely useful metric that most people know how to calculate but very few know how to use effectively.**The Top Predictive Analytics Pitfalls to Avoid**- Jan 23, 2017.

Predictive modelling and machine learning are significantly contributing to business, but they can be very sensitive to data and changes in it, which makes it very important to use proper techniques and avoid pitfalls in building data science models.**A Non-comprehensive List of Awesome Things Other People Did in 2016**- Jan 10, 2017.

A top statistics professor and statistical researcher reflects on a number of awesome accomplishments by individuals in, and related to, the fields of statistics and data science, with a focus on the world of academia but with resonance far beyond.**3 methods to deal with outliers**- Jan 3, 2017.

In both statistics and machine learning, outlier detection is important for building an accurate model to get good results. Here three methods are discussed to detect outliers or anomalous data instances.**Top KDnuggets tweets, Dec 14-20: False positives versus false negatives: Best explanation ever**- Dec 21, 2016.

Also #MachineLearning, #AI experts: Main Developments 2016, Key Trends 2017; Official code repository for #MachineLearning with #TensorFlow book; Top 10 Essential Books for the #Data Enthusiast.**Machine Learning vs Statistics**- Nov 29, 2016.

Machine learning is all about predictions, supervised learning, and unsupervised learning, while statistics is about sample, population, and hypotheses. But are they actually that different?**How Bayesian Inference Works**- Nov 15, 2016.

Bayesian inference isn’t magic or mystical; the concepts behind it are completely accessible. In brief, Bayesian inference lets you draw stronger conclusions from your data by folding in what you already know about the answer. Read an in-depth overview here.**Trump, The Statistics of Polling, and Forecasting Home Prices**- Nov 12, 2016.

Why did polling fail in the US presidential election? The home price index offers an apt comparison inasmuch as sample selection is problematic, equally snagging both election predictions and home price futures.**How Can Lean Six Sigma Help Machine Learning?**- Nov 1, 2016.

The data cleansing phase alone is not sufficient to ensure the accuracy of machine learning when noise or bias exists in the input data. Lean Six Sigma variance reduction can improve the accuracy of machine learning results.**Data Science Basics: Data Mining vs. Statistics**- Sep 28, 2016.

As a beginner, I was confused about the relationship between data mining and statistics. This is my attempt to help straighten out this connection for others who may now be in my old shoes.**The Great Algorithm Tutorial Roundup**- Sep 20, 2016.

This is a collection of tutorials relating to the results of the recent KDnuggets algorithms poll. If you are interested in learning or brushing up on the most used algorithms, as per our readers, look here for suggestions on doing so!**How top companies use data to make wise decisions**- Sep 15, 2016.

Seeking a Research Scientist who will employ skills and experience to improve, create and innovate data-driven modeling approaches for our price and promotion solutions, while anticipating and charting future research needs.**A Tutorial on the Expectation Maximization (EM) Algorithm**- Aug 25, 2016.

This is a short tutorial on the Expectation Maximization algorithm and how it can be used to estimate parameters for multivariate data.**Central Limit Theorem for Data Science – Part 2**- Aug 16, 2016.

This post continues an explanation of Central Limit Theorem started in a previous post, with additional details... and beer.**Central Limit Theorem for Data Science**- Aug 12, 2016.

This post is an introductory explanation of the Central Limit Theorem, and why it is (or should be) of importance to data scientists.**Understanding the Empirical Law of Large Numbers and the Gambler’s Fallacy**- Aug 12, 2016.

The law of large numbers is an important concept for practising data scientists. In this post, the empirical law of large numbers is demonstrated via a simple simulation approach using the Bernoulli process.**Understand customer needs with choice modeling**- Aug 3, 2016.

Data Science Statistics 101; 7 Steps to Understanding NoSQL Databases; The Core of Data Science; Data Science for Beginners 2: Is your data ready?**What Statistics Topics are Needed for Excelling at Data Science?**- Aug 2, 2016.

Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.**Doing Statistics with SQL**- Aug 2, 2016.

This post covers how to perform some basic in-database statistical analysis using SQL.**Data Science Statistics 101**- Jul 28, 2016.

Statistics can often be the most intimidating aspect of data science for aspiring data scientists to learn. Gain some personal perspective from someone who has traveled the path.**Why Big Data is in Trouble: They Forgot About Applied Statistics**- Jul 18, 2016.

This "classic" (but very topical and certainly relevant) post discusses issues that Big Data can face when it forgets, or ignores, applied statistics. As great of a discussion today as it was 2 years ago.**Big Data, Bible Codes, and Bonferroni**- Jul 8, 2016.

This discussion will focus on 2 particular statistical issues to be on the look out for in your own work and in the work of others mining and learning from Big Data, with real world examples emphasizing the importance of statistical processes in practice.**How to Compare Apples and Oranges ? : Part III**- Jul 6, 2016.

In the previous article, we looked at techniques to compare categorical variables with the help of an example. In this article, we shall look at techniques to compare mixed types of variables, i.e. numerical and categorical variables together.