# Clustering (81)

**Introduction to Clustering in Python with PyCaret**- Dec 13, 2021.

A step-by-step, beginner-friendly tutorial for unsupervised clustering tasks in Python using PyCaret.

**Clustering in Crowdsourcing: Methodology and Applications**- Nov 30, 2021.

As a result of the efforts outlined in this article, we confirmed that clustering through crowdsourcing is indeed possible and works impressively well.

**KDnuggets™ News 21:n40, Oct 20: The 20 Python Packages You Need For Machine Learning and Data Science; Ace Data Science Interviews with Portfolio Projects**- Oct 20, 2021.

The 20 Python Packages You Need For Machine Learning and Data Science; How to Ace Data Science Interview by Working on Portfolio Projects; Deploying Your First Machine Learning API; Real Time Image Segmentation Using 5 Lines of Code; What is Clustering and How Does it Work?

**What is Clustering and How Does it Work?**- Oct 14, 2021.

Let us examine how clusters with different properties are produced by different clustering algorithms. In particular, we give an overview of three clustering methods: k-Means clustering, hierarchical clustering, and DBSCAN.

**Mastering Clustering with a Segmentation Problem**- Aug 3, 2021.

The one stop shop for implementing the most widely used models in Python for unsupervised clustering.

**Key Data Science Algorithms Explained: From k-means to k-medoids clustering**- Dec 29, 2020.

As a core method in the Data Scientist's toolbox, k-means clustering is valuable but can be limited based on the structure of the data. Can expanded methods like PAM (partitioning around medoids), CLARA, and CLARANS provide better solutions, and what is the future of these algorithms?

**Clustering Uber Rideshare Data**- Jul 14, 2020.

This blog discusses clustering the Uber ridesharing dataset, with a focus on interpretation and understanding the concepts in the real world.

**Machine Learning in Power BI using PyCaret**- May 12, 2020.

Check out this step-by-step tutorial for implementing machine learning in Power BI within minutes.

**Getting Started with Spectral Clustering**- May 5, 2020.

This post will unravel a practical example to illustrate and motivate the intuition behind each step of the spectral clustering algorithm.

**Understanding Density-based Clustering**- Feb 6, 2020.

HDBSCAN is a robust clustering algorithm that is very useful for data exploration, and this comprehensive introduction provides an overview of its fundamental ideas from a high-level view above the trees to down in the weeds.

**Survey Segmentation Tutorial**- Jan 14, 2020.

Learn the basics of verifying segmentation, analyzing the data, and creating segments in this tutorial. When reviewing survey data, you will typically be handed Likert questions (e.g., on a scale of 1 to 5), and by using a few techniques, you can verify the quality of the survey and start grouping respondents into populations.

**Customer Segmentation Using K Means Clustering**- Nov 4, 2019.

Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.

**KDnuggets™ News 19:n38, Oct 9: The Last SQL Guide for Data Analysis; 4 Quadrants of Data Science Skills and 7 steps for Viral Data Visualization**- Oct 9, 2019.

Read a comprehensive SQL guide for data analysis; Learn how to choose the right clustering algorithm for your data; Find out how to create a viral DataViz using the data from Data Science Skills poll; Enroll in any of 10 Free Top Notch Natural Language Processing Courses; and more.

**Clustering Metrics Better Than the Elbow Method**- Oct 1, 2019.

We show which metric to use for visualizing and determining an optimal number of clusters, one that works much better than the usual practice, the elbow method.

**What is Hierarchical Clustering?**- Sep 27, 2019.

The article contains a brief introduction to various concepts related to the hierarchical clustering algorithm.

**Introduction to Image Segmentation with K-Means clustering**- Aug 9, 2019.

Image segmentation is the classification of an image into different groups. Many kinds of research have been done in the area of image segmentation using clustering. In this article, we will explore using the K-Means clustering algorithm to read an image and cluster different regions of the image.

**K-means Clustering with Dask: Image Filters for Cat Pictures**- Jun 18, 2019.

How to recreate an original cat image with the fewest possible colors. An interesting use case of Unsupervised Machine Learning with K Means Clustering in Python.

**Who is your Golden Goose?: Cohort Analysis**- May 30, 2019.

Step-by-step tutorial on how to perform customer segmentation using RFM analysis and K-Means clustering in Python.

**A complete guide to K-means clustering algorithm**- May 16, 2019.

Clustering - including K-means clustering - is an unsupervised learning technique used for data classification. We provide several examples to help further explain how it works.

**Top Data Science and Machine Learning Methods Used in 2018, 2019**- Apr 29, 2019.

Once again, the most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests. The greatest relative increases this year are overwhelmingly Deep Learning techniques, while SVD, SVMs and Association Rules show the greatest decline.

**How Machines Make Sense of Big Data: an Introduction to Clustering Algorithms**- Apr 16, 2019.

We outline three different clustering algorithms - k-means clustering, hierarchical clustering and Graph Community Detection - providing an explanation on when to use each, how they work and a worked example.

**7 Steps to Mastering Basic Machine Learning with Python — 2019 Edition**- Jan 29, 2019.

With a new year upon us, I thought it would be a good time to revisit the concept and put together a new learning path for mastering machine learning with Python. With these 7 steps you can master basic machine learning with Python!

**Synthetic Data Generation: A must-have skill for new data scientists**- Dec 27, 2018.

A brief rundown of methods/packages/ideas to generate synthetic data for self-driven data science projects and deep diving into machine learning methods.

**[ebook] Manipulating Data in Apache Spark**- Oct 29, 2018.

In this ebook from Databricks, learn how DataFrames leverage the power of distributed processing through Spark, how to make big data processing easier for a wider audience, and more.

**Iterative Initial Centroid Search via Sampling for k-Means Clustering**- Sep 12, 2018.

Thinking about ways to find a better set of initial centroid positions is a valid approach to optimizing the k-means clustering process. This post outlines just such an approach.

**An Introduction to t-SNE with Python Example**- Aug 15, 2018.

In this post we’ll give an introduction to the exploratory and visualization t-SNE algorithm. t-SNE is a powerful dimension reduction and visualization technique used on high dimensional data.

**Unsupervised Learning Demystified**- Aug 13, 2018.

Unsupervised learning is a pattern-finding technique for mining inspiration from your data. Let's demystify!

**K-Means in Real Life: Clustering Workout Sessions**- Aug 3, 2018.

By using the within-cluster sum of squares as cost function, data points in the same cluster will be similar to each other, whereas data points in different clusters will have a lower level of similarity.

**Clustering Using K-means Algorithm**- Jul 18, 2018.

This article explains the K-means algorithm in an easy way. I’d like to start with an example to understand the objective of this powerful technique in machine learning before getting into the algorithm, which is quite simple.

**KDnuggets™ News 18:n25, Jun 27: 5 Clustering Algorithms Data Scientists Need to Know; Detecting Sarcasm with Deep Convolutional Neural Networks?**- Jun 27, 2018.

Also 30 Free Resources for Machine Learning, Deep Learning, NLP; 7 Simple Data Visualizations You Should Know in R.

**The 5 Clustering Algorithms Data Scientists Need to Know**- Jun 20, 2018.

Today, we’re going to look at 5 popular clustering algorithms that data scientists need to know and their pros and cons!

**Audience Segmentation**- Jun 6, 2018.

The process of audience segmentation is not about just statistics, it’s about finding your ideal clients and choosing the right way of interaction with them.

**Kernel Machine Learning (KernelML) - Generalized Machine Learning Algorithm**- May 18, 2018.

This article introduces a pip Python package called KernelML, created to give analysts and data scientists a generalized machine learning algorithm for complex loss functions and non-linear coefficients.

**Ten Machine Learning Algorithms You Should Know to Become a Data Scientist**- Apr 11, 2018.

It's important for data scientists to have a broad range of knowledge, keeping themselves updated with the latest trends. With that being said, we take a look at the top 10 machine learning algorithms every data scientist should know.

**Hierarchical Classification – a useful approach for predicting thousands of possible categories**- Mar 12, 2018.

A detailed look at the flat and hierarchical classification approach to dealing with multi-class classification problems.

**Topological Data Analysis for Data Professionals: Beyond Ayasdi**- Jan 16, 2018.

We review recent developments and tools in topological data analysis, including applications of persistent homology to psychometrics and a recent extension of piecewise regression, called Morse-Smale regression.

**Top Data Science and Machine Learning Methods Used in 2017**- Dec 11, 2017.

The most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests; Deep Learning is used by only 20% of respondents; we also analyze which methods are most "industrial" and most "academic".

**3 different types of machine learning**- Nov 1, 2017.

In this extract from “Python Machine Learning”, top data scientist Sebastian Raschka explains 3 main types of machine learning: Supervised, Unsupervised and Reinforcement Learning. Use code PML250KDN to save 50% off the book cost.

**Density Based Spatial Clustering of Applications with Noise (DBSCAN)**- Oct 26, 2017.

DBSCAN clustering can identify outliers, observations which won’t belong to any cluster. Since DBSCAN clustering identifies the number of clusters as well, it is very useful for unsupervised learning when we don’t know how many clusters there could be in the data.

**Top 10 Machine Learning with R Videos**- Oct 24, 2017.

A complete video guide to Machine Learning in R! This great compilation of tutorials and lectures is an amazing recipe to start developing your own Machine Learning projects.

**Tackling Unstructured Data With Text Exploration – On-demand webcast**- Sep 7, 2017.

Discover how to use a platform to organize unstructured data to see the linkages between word usage and document of origin, see the themes in a word cloud, and use topic extraction and document clustering.

**Comparing Distance Measurements with Python and SciPy**- Aug 15, 2017.

This post introduces five perfectly valid ways of measuring distances between data points. We will also perform a simple demonstration and comparison with Python and the SciPy library.

**KDnuggets™ News 17:n26, Jul 12: Applying Deep Learning to Real-world Problems; New Poll: Will society be better from increased automation, AI?**- Jul 12, 2017.

Also Text Clustering: Get quick insights from Unstructured Data; Using the TensorFlow API: An Introductory Tutorial Series; Deep Learning Zero to One: 5 Awe-Inspiring Demos with Code for Beginners, part 2

**Text Clustering : Quick insights from Unstructured Data, part 2**- Jul 4, 2017.

We will build this in a modular way and also focus on exposing the functionalities as an API so that it can serve as a plug and play model without any disruptions to the existing systems.

**Text Clustering: Get quick insights from Unstructured Data**- Jun 28, 2017.

Grouping and clustering free text is an important advance towards making good use of it. We present an unsupervised text clustering approach that enables businesses to programmatically bin this data.

**KDnuggets™ News 17:n24, Jun 21: Learn Data Science skills you need for free; Understanding Deep Learning Requires Re-thinking Generalization**- Jun 21, 2017.

Learn Data Science skills you need for free; Understanding Deep Learning Requires Re-thinking Generalization; K-means Clustering with Tableau - Call Detail Records; The Machine Learning Algorithms Used in Self-Driving Cars.

**K-means Clustering with Tableau – Call Detail Records Example**- Jun 16, 2017.

We show how to use the Tableau 10 clustering feature to create statistically-based segments that provide insights about similarities in different groups and performance of the groups when compared to each other.

**Machine Learning Workflows in Python from Scratch Part 2: k-means Clustering**- Jun 7, 2017.

The second post in this series of tutorials for implementing machine learning workflows in Python from scratch covers implementing the k-means clustering algorithm.

**K-means Clustering with R: Call Detail Record Analysis**- Jun 6, 2017.

A Call Detail Record (CDR) is the information captured by telecom companies during the call, SMS, and Internet activity of a customer. This information provides greater insights about the customer’s needs when used with customer demographics.

**Must-Know: How to determine the most useful number of clusters?**- May 9, 2017.

Without knowing the ground truth of a dataset, then, how do we know what the optimal number of data clusters is? We will have a look at 2 particularly popular methods for attempting to answer this question: the elbow method and the silhouette method.

**Toward Increased k-means Clustering Efficiency with the Naive Sharding Centroid Initialization Method**- Mar 13, 2017.

What if a simple, deterministic approach which did not rely on randomization could be used for centroid initialization? Naive sharding is such a method, and its time-saving and efficient results, though preliminary, are promising.

**Beginner’s Guide to Customer Segmentation**- Mar 9, 2017.

At the core of customer segmentation is being able to identify different types of customers and then figure out ways to find more of those individuals so you can... you guessed it, get more customers!

**K-Means & Other Clustering Algorithms: A Quick Intro with Python**- Mar 8, 2017.

In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.

**7 More Steps to Mastering Machine Learning With Python**- Mar 1, 2017.

This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.

**KDnuggets™ News 17:n06, Feb 15: So What is Big Data? 52 Useful Machine Learning APIs; Data Science finds Perfect Valentines Dates**- Feb 15, 2017.

Also Making Python Speak SQL with pandasql; 52 Useful Machine Learning & Prediction APIs, updated; New Poll: Do you support Trump Immigration Ban?

**Automatically Segmenting Data With Clustering**- Feb 9, 2017.

In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.

**Top KDnuggets tweets, Feb 01-07: Learning to Learn by Gradient Descent by Gradient Descent**- Feb 8, 2017.

Also #DeepLearning Research Review: Natural Language Processing; K-Means, Other Clustering Algorithms: A Quick Intro with #Python; Why #DeepLearning Needs Assembler Hackers.

**Quickly tackle unstructured text data**- Feb 8, 2017.

Learn about the new advanced text exploration capabilities available that let you quickly extract insights from text-based data.

**Introduction to K-means Clustering: A Tutorial**- Dec 9, 2016.

A beginner introduction to the widely-used K-means clustering algorithm, using a delivery fleet data example in Python.

**Introduction to Machine Learning for Developers**- Nov 28, 2016.

Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.

**5 Steps for Advanced Data Analysis using Visualization**- Oct 28, 2016.

In most scientific research, due to the large amount of experimental data, statistical analysis is typically done by technical experts in computing and statistics. Unfortunately, these experts are not experts in the underlying research, which may cause gaps in the analysis. If the actual researchers are given easy-to-use tools and methods to handle and analyse data, it will enrich the research outcome.

**Clustering Key Terms, Explained**- Oct 18, 2016.

Getting started with Data Science or need a refresher? Clustering is among the most used tools of Data Scientists. Check out these 10 Clustering-related terms and their concise definitions.

**Comparing Clustering Techniques: A Concise Technical Overview**- Sep 26, 2016.

A wide array of clustering techniques are in use today. Given the widespread use of clustering in everyday data mining, this post provides a concise technical overview of 2 such exemplar techniques.

**The Great Algorithm Tutorial Roundup**- Sep 20, 2016.

This is a collection of tutorials relating to the results of the recent KDnuggets algorithms poll. If you are interested in learning or brushing up on the most used algorithms, as per our readers, look here for suggestions on doing so!

**Top Algorithms and Methods Used by Data Scientists**- Sep 12, 2016.

The latest KDnuggets poll identifies the top algorithms actually used by Data Scientists and finds surprises, including the most academic and most industry-oriented algorithms.

**Doing the Data Science That Drives Predictive Personalization**- Sep 9, 2016.

Agile collaboration within data science teams is essential to the vision of customer analytics and personalization. Attend IBM DataFirst Launch Event on Sep 27 in New York City to engage with open-source community leaders and practitioners.

**MDL Clustering: Unsupervised Attribute Ranking, Discretization, and Clustering**- Aug 26, 2016.

MDL Clustering is a free software suite for unsupervised attribute ranking, discretization, and clustering based on the Minimum Description Length principle and built on the Weka Data Mining platform.

**New Poll: Which methods/algorithms you used for a Data Science or Machine Learning application?**- Aug 26, 2016.

Which methods/approaches did you use in the past 12 months for an actual Data Science-related application? Please vote and we will analyze and publish the results.

**A Tutorial on the Expectation Maximization (EM) Algorithm**- Aug 25, 2016.

This is a short tutorial on the Expectation Maximization algorithm and how it can be used to estimate parameters for multivariate data.

**Top Talks and Tutorials From PyData London**- May 11, 2016.

Get some insight into the most recent Python data science talks and presentations with this eclectic mix of videos from PyData London 2016.

**A comparison between PCA and hierarchical clustering**- Feb 23, 2016.

Graphical representations of high-dimensional data sets are the backbone of exploratory data analysis. We examine 2 of the most commonly used methods: heatmaps combined with hierarchical clustering and principal component analysis (PCA).

**What questions can data science answer?**- Jan 1, 2016.

There are only five questions machine learning can answer: Is this A or B? Is this weird? How much/how many? How is it organized? What should I do next? We examine these questions in detail and what they imply for data science.

**6 crazy things Deep Learning and Topological Data Analysis can do with your data**- Nov 2, 2015.

Want to analyze a high dimensional dataset and you are running out of options? Find out how Deep Learning combined with Topological Data Analysis can do exactly that and more.

**Data Mining/Data Science “Nobel Prize”: ACM SIGKDD 2015 Innovation Award to Hans-Peter Kriegel**- Jul 22, 2015.

Prof. Hans-Peter Kriegel wins ACM KDD Innovation Award for his influential research and scientific contributions to data mining in clustering, outlier detection and high-dimensional data analysis, including density-based approaches.

**KDnuggets™ News 15:n04, Feb 4: Top Big Data Influencers; A Common Mistake with Time Series; Ayasdi**- Feb 4, 2015.

Top Big Data Influencers and Brands; K-means clustering is not a free lunch; Avoiding a Common Mistake with Time Series; Ayasdi: Managing Data Complexity through Topology; Big Data Could Revolutionize Healthcare.

**BigML machine learning platform Winter 2015 Release, Feb 11**- Feb 2, 2015.

See the latest in BigML's continuously evolved machine learning platform with its emphasis on consumability, programmability, and scalability. Feb 11 webinar at 9 am PT and 5 pm PT.

**Data Science 102: K-means clustering is not a free lunch**- Jan 29, 2015.

K-means is a widely used method in cluster analysis, but what are its underlying assumptions and drawbacks? We examine what happens for non-spherical data and unevenly sized clusters.

**Top /r/MachineLearning posts, Jan 18-24: K-means clustering is not a free lunch; A Deep Dive into Recurrent Neural Nets**- Jan 26, 2015.

Textbook Easter Eggs, issues with k-means, recurrent neural networks, genetic algorithm challenges, and the implementation of machine learning pipelines are all in this week's top /r/MachineLearning posts.

**Supermarket customers segmentation using Self-Organizing Mapping**- Oct 23, 2014.

See how a leading European supermarket chain improved customer value and profitability and identified key customer groups by applying business intelligence and analytics techniques like self-organizing maps.

**KDnuggets Social Network in NodeXL, May 2014**- May 29, 2014.

We examine the KDnuggets Twitter social network, as generated by NodeXL, looking at clusters, top Twitter accounts, URLs, hashtags, words, and what it all means.

**More Data Mining with Weka**- Jan 30, 2014.

This online course teaches both principles and practical data mining techniques, lets students work on very big datasets, classify text, experiment with clustering, and much more.