- KDnuggets™ News 21:n03, Jan 20: K-Means 8x faster, 27x lower error than Scikit-learn in 25 lines; Essential Math for Data Science: Information Theory - Jan 20, 2021.
Here is a clever method for getting K-Means 8x faster, with 27x lower error, than Scikit-learn; Understand the information theory you need for Data Science; Learn how to do cleaner data analysis with pandas using pipes; What are the four jobs of the data scientist? and more.
- K-Means 8x faster, 27x lower error than Scikit-learn in 25 lines - Jan 15, 2021.
K-means clustering is a powerful algorithm for similarity searches, and Facebook AI Research's faiss library is turning out to be a speed champion. With only a handful of lines of code shared in this demonstration, faiss outperforms the implementation in scikit-learn in speed and accuracy.
- Key Data Science Algorithms Explained: From k-means to k-medoids clustering - Dec 29, 2020.
As a core method in the Data Scientist's toolbox, k-means clustering is valuable but can be limited based on the structure of the data. Can expanded methods like PAM (partitioning around medoids), CLARA, and CLARANS provide better solutions, and what is the future of these algorithms?
- Top KDnuggets tweets, Dec 2-8: How to do visualization using #Python from scratch - Dec 9, 2020.
K-Means 8x faster, 27x lower error than Scikit-learn's in 25 lines; How to do visualization using #Python from scratch; Why the Future of ETL Is Not ELT, But EL(T); NoSQL for Beginners
- KDnuggets™ News 20:n24, Jun 17: Easy Speech-to-Text with Python; Data Distributions Overview; Java for Data Scientists - Jun 17, 2020.
Also: Deploy a Machine Learning Pipeline to the Cloud Using a Docker Container; Five Cognitive Biases In Data Science (And how to avoid them); Understanding Machine Learning: The Free eBook; Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines; A Complete guide to Google Colab for Deep Learning
- Centroid Initialization Methods for k-means Clustering - Jun 10, 2020.
This article is the first in a series of articles looking at the different aspects of k-means clustering, beginning with a discussion on centroid initialization.
- 5 Great New Features in Scikit-learn 0.23 - May 15, 2020.
Check out 5 new features of the latest Scikit-learn release, including the ability to visualize estimators in notebooks, improvements to both k-means and gradient boosting, some new linear model implementations, and sample weight support for a pair of existing regressors.
- Machine Learning in Power BI using PyCaret - May 12, 2020.
Check out this step-by-step tutorial for implementing machine learning in Power BI within minutes.
- Understanding Density-based Clustering - Feb 6, 2020.
HDBSCAN is a robust clustering algorithm that is very useful for data exploration, and this comprehensive introduction provides an overview of its fundamental ideas from a high-level view above the trees to down in the weeds.
- Customer Segmentation Using K Means Clustering - Nov 4, 2019.
Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.
- Introduction to Image Segmentation with K-Means clustering - Aug 9, 2019.
Image segmentation is the classification of an image into different groups. Much research has been done in the area of image segmentation using clustering. In this article, we will explore using the K-Means clustering algorithm to read an image and cluster different regions of the image.
- K-means Clustering with Dask: Image Filters for Cat Pictures - Jun 18, 2019.
How to recreate an original cat image with the fewest possible colors. An interesting use case of unsupervised machine learning with K-Means clustering in Python.
- Who is your Golden Goose?: Cohort Analysis - May 30, 2019.
Step-by-step tutorial on how to perform customer segmentation using RFM analysis and K-Means clustering in Python.
- KDnuggets™ News 19:n20, May 22: 7 Steps to Mastering SQL for Data Science; How to build Math Programming Skills - May 22, 2019.
Also: An overview of PyCharm for Data Scientists; How to build a Computer Vision model - key approaches and datasets; k-means clustering tutorial; 60+ useful graph visualization libraries; The Data Fabric for Machine Learning.
- A complete guide to K-means clustering algorithm - May 16, 2019.
Clustering - including K-means clustering - is an unsupervised learning technique used for data classification. We provide several examples to help further explain how it works.
- Iterative Initial Centroid Search via Sampling for k-Means Clustering - Sep 12, 2018.
Thinking about ways to find a better set of initial centroid positions is a valid approach to optimizing the k-means clustering process. This post outlines just such an approach.
- K-Means in Real Life: Clustering Workout Sessions - Aug 3, 2018.
By using the within-cluster sum of squares as the cost function, data points in the same cluster will be similar to each other, whereas data points in different clusters will have a lower level of similarity.
- Clustering Using K-means Algorithm - Jul 18, 2018.
This article explains the K-means algorithm in an easy way. I’d like to start with an example to understand the objective of this powerful technique in machine learning before getting into the algorithm, which is quite simple.
- Top 10 Machine Learning Algorithms for Beginners - Oct 20, 2017.
A beginner's introduction to the Top 10 Machine Learning (ML) algorithms, complete with figures and examples for easy understanding.
- Comparing Distance Measurements with Python and SciPy - Aug 15, 2017.
This post introduces five perfectly valid ways of measuring distances between data points. We will also perform a simple demonstration and comparison using Python and the SciPy library.
- K-means Clustering with Tableau – Call Detail Records Example - Jun 16, 2017.
We show how to use the Tableau 10 clustering feature to create statistically based segments that provide insights about similarities in different groups and the performance of the groups when compared to each other.
- Machine Learning Workflows in Python from Scratch Part 2: k-means Clustering - Jun 7, 2017.
The second post in this series of tutorials for implementing machine learning workflows in Python from scratch covers implementing the k-means clustering algorithm.
- K-means Clustering with R: Call Detail Record Analysis - Jun 6, 2017.
A Call Detail Record (CDR) is the information captured by telecom companies during a customer’s call, SMS, and Internet activity. This information provides greater insight into the customer’s needs when used with customer demographics.
- KDnuggets™ News 17:n10, Mar 15: Becoming a Data Science Unicorn; What Makes a Good Data Visualization? - Mar 15, 2017.
6 Business Concepts you need to become a Data Science Unicorn; What Makes a Good Data Visualization?; Best Data Science Courses from Udemy (only $19 till Mar 31); K-Means & Other Clustering Algorithms: A Quick Intro with Python; Free Online Data Science & Big Data Books
- Toward Increased k-means Clustering Efficiency with the Naive Sharding Centroid Initialization Method - Mar 13, 2017.
What if a simple, deterministic approach which did not rely on randomization could be used for centroid initialization? Naive sharding is such a method, and its time-saving and efficient results, though preliminary, are promising.
- Beginner’s Guide to Customer Segmentation - Mar 9, 2017.
At the core of customer segmentation is being able to identify different types of customers and then figure out ways to find more of those individuals so you can... you guessed it, get more customers!
- K-Means & Other Clustering Algorithms: A Quick Intro with Python - Mar 8, 2017.
In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.
- Automatically Segmenting Data With Clustering - Feb 9, 2017.
In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.
- Top KDnuggets tweets, Feb 01-07: Learning to Learn by Gradient Descent by Gradient Descent - Feb 8, 2017.
Also: #DeepLearning Research Review: Natural Language Processing; K-Means, Other Clustering Algorithms: A Quick Intro with #Python; Why #DeepLearning Needs Assembler Hackers.
- Introduction to K-means Clustering: A Tutorial - Dec 9, 2016.
A beginner's introduction to the widely used K-means clustering algorithm, using a delivery fleet data example in Python.
- Clustering Key Terms, Explained - Oct 18, 2016.
Getting started with Data Science or need a refresher? Clustering is among the most used tools of Data Scientists. Check out these 10 Clustering-related terms and their concise definitions.
- Comparing Clustering Techniques: A Concise Technical Overview - Sep 26, 2016.
A wide array of clustering techniques are in use today. Given the widespread use of clustering in everyday data mining, this post provides a concise technical overview of 2 such exemplar techniques.
- Top 10 Data Mining Algorithms, Explained - May 21, 2015.
The top 10 data mining algorithms, selected by top researchers, are explained here, including what they do, the intuition behind each algorithm, available implementations, why to use them, and interesting applications.
- Data Science 102: K-means clustering is not a free lunch - Jan 29, 2015.
K-means is a widely used method in cluster analysis, but what are its underlying assumptions and drawbacks? We examine what happens for non-spherical data and unevenly sized clusters.