- The 5 Most Useful Techniques to Handle Imbalanced Datasets - Jan 22, 2020.
This post explains the various techniques you can use to handle imbalanced datasets.
- KDnuggets™ News 19:n36, Sep 25: The Hidden Risk of AI and Big Data; The 5 Sampling Algorithms every Data Scientist needs to know - Sep 25, 2019.
Learn about the unexpected risk of AI applied to Big Data; study the 5 sampling algorithms every data scientist needs to know; read how one data scientist copes with the boring days of deploying machine learning; 5 beginner-friendly steps to learn ML with Python; and more.
- The 5 Sampling Algorithms every Data Scientist needs to know - Sep 18, 2019.
Algorithms are at the core of data science, and sampling is a critical technique that can make or break a project. Learn about the most common sampling techniques so you can select the best approach when working with your data.
- A Gentle Introduction to Noise Contrastive Estimation - Jul 25, 2019.
Find out how Noise Contrastive Estimation uses randomness to learn from your data, with this guide that works through the particulars of its implementation.
- How to correctly select a sample from a huge dataset in machine learning - May 1, 2019.
We explain how choosing a small, representative dataset from a large population can improve model training reliability.
- 4 Myths of Big Data and 4 Ways to Improve with Deep Data - Jan 9, 2019.
There is a fundamental misconception that bigger data produces better machine learning results. However, bigger data lakes/warehouses won't necessarily help you discover more profound insights. It is better to focus on data quality, value, and diversity, not just size. "Deep Data" is better than Big Data.
- Iterative Initial Centroid Search via Sampling for k-Means Clustering - Sep 12, 2018.
Thinking about ways to find a better set of initial centroid positions is a valid approach to optimizing the k-means clustering process. This post outlines just such an approach.
- What is Normal? - Jul 31, 2018.
I recently saw an article that referred to the normal curve as the data scientist's best friend. We examine myths around the normal curve, including whether most data is normally distributed.
- Scalable Select of Random Rows in SQL - Apr 5, 2018.
Selecting random rows, i.e. sampling, can deliver large performance boosts. Let's learn how to select random rows in SQL.
- Sampling: A Primer - Aug 8, 2017.
Though it doesn’t get a lot of buzz, sampling is fundamental to any field of science. Marketing scientist Kevin Gray asks Dr. Stas Kolenikov, Senior Scientist at Abt Associates, what marketing researchers and data scientists most need to know about it.
- How to Make Your Database 200x Faster Without Having to Pay More - Nov 22, 2016.
Waiting a long time for a BI query to execute? I know it's frustrating. It's a major bottleneck in the day-to-day life of a data analyst or BI expert. Let's look at some easy-to-use solutions, with a clear explanation of why to use them, along with more advanced technological solutions.
- iSight Cloud – Lightning fast visualizations on large data sets - Nov 22, 2016.
SnappyData is launching a FREE cloud service called iSight-Cloud so anyone can try our engine and give us feedback. You can try our simple demos in a visual environment or even bring your own data sets.
- Learning from Imbalanced Classes - Aug 31, 2016.
Imbalanced classes can cause trouble for classification. Not all hope is lost, however. Check out this article for methods to deal with such a situation.
- The Fallacy of Seeing Patterns - Jul 26, 2016.
Analysts are always on the lookout for patterns and often end up relying on spurious ones. This post looks at some spurious patterns in univariate, bivariate and multivariate analysis.
- Do You Need Big Data or Smart Data? Part 2 - Jun 2, 2016.
It can be easy to get carried away with the deluge of big data and to rely on its abundance to deliver better models. However, using data without context or an objective can prove counterproductive; context-aware, objective-driven samples drawn from the large volume and variety of data can be effective tools.
- Do You Need Big Data or Smart Data? Part 1 - Jun 1, 2016.
Analyzing Big Data without paying attention to its characteristics and your objective can be detrimental; correct and effective sampling can fix this. Read on to transform your Big Data into Smart Data.
- Commonly Misunderstood Analytics Terms - Sep 3, 2015.
Unable to follow your analyst's language during presentations? Understand exactly what common data science terms mean.
- New Hybrid Rare-Event Sampling Technique for Fraud Detection - Apr 26, 2015.
The proposed hybrid sampling methodology may prove useful when building and validating machine learning models for applications where the target event is rare, such as fraud detection.