# Tag: Clustering

**Must-Know: How to determine the most useful number of clusters?** - May 9, 2017.

Without knowing the ground truth of a dataset, how do we know what the optimal number of clusters is? We look at two popular methods for attempting to answer this question: the elbow method and the silhouette method.

**Toward Increased k-means Clustering Efficiency with the Naive Sharding Centroid Initialization Method** - Mar 13, 2017.

What if a simple, deterministic approach that did not rely on randomization could be used for centroid initialization? Naive sharding is such a method, and its time-saving and efficient results, though preliminary, are promising.

**Beginner’s Guide to Customer Segmentation** - Mar 9, 2017.

At the core of customer segmentation is being able to identify different types of customers and then figure out ways to find more of those individuals so you can... you guessed it, get more customers!

**K-Means & Other Clustering Algorithms: A Quick Intro with Python** - Mar 8, 2017.

In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.

**7 More Steps to Mastering Machine Learning With Python** - Mar 1, 2017.

This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.

**KDnuggets™ News 17:n06, Feb 15: So What is Big Data? 52 Useful Machine Learning APIs; Data Science finds Perfect Valentines Dates** - Feb 15, 2017.

Also: Making Python Speak SQL with pandasql; 52 Useful Machine Learning & Prediction APIs, updated; New Poll: Do you support Trump Immigration Ban?

**Automatically Segmenting Data With Clustering** - Feb 9, 2017.

In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.

**Top KDnuggets tweets, Feb 01-07: Learning to Learn by Gradient Descent by Gradient Descent** - Feb 8, 2017.

Also: #DeepLearning Research Review: Natural Language Processing; K-Means, Other Clustering Algorithms: A Quick Intro with #Python; Why #DeepLearning Needs Assembler Hackers.

**Quickly tackle unstructured text data** - Feb 8, 2017.

Learn about new advanced text exploration capabilities that let you quickly extract insights from text-based data.

**Introduction to K-means Clustering: A Tutorial** - Dec 9, 2016.

A beginner's introduction to the widely used K-means clustering algorithm, using a delivery fleet data example in Python.

**Introduction to Machine Learning for Developers** - Nov 28, 2016.

Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.

**5 Steps for Advanced Data Analysis using Visualization** - Oct 28, 2016.

In most scientific research, because of the large amount of experimental data, statistical analysis is typically done by technical experts in computing and statistics. Unfortunately, these experts are not experts in the underlying research, which may cause gaps in the analysis. If the researchers themselves are given easy-to-use tools and methods to handle and analyze data, the research outcome will be richer.

**Clustering Key Terms, Explained** - Oct 18, 2016.

Getting started with Data Science or need a refresher? Clustering is among the most used tools of Data Scientists. Check out these 10 Clustering-related terms and their concise definitions.

**Comparing Clustering Techniques: A Concise Technical Overview** - Sep 26, 2016.

A wide array of clustering techniques is in use today. Given the widespread use of clustering in everyday data mining, this post provides a concise technical overview of two such exemplar techniques.

**The Great Algorithm Tutorial Roundup** - Sep 20, 2016.

This is a collection of tutorials relating to the results of the recent KDnuggets algorithms poll. If you are interested in learning or brushing up on the most used algorithms, as per our readers, look here for suggestions on doing so!

**Top Algorithms and Methods Used by Data Scientists** - Sep 12, 2016.

The latest KDnuggets poll identifies the top algorithms actually used by Data Scientists and finds surprises, including the most academic and most industry-oriented algorithms.

**Doing the Data Science That Drives Predictive Personalization** - Sep 9, 2016.

Agile collaboration within data science teams is essential to the vision of customer analytics and personalization. Attend the IBM DataFirst Launch Event on Sep 27 in New York City to engage with open-source community leaders and practitioners.

**MDL Clustering: Unsupervised Attribute Ranking, Discretization, and Clustering** - Aug 26, 2016.

MDL Clustering is a free software suite for unsupervised attribute ranking, discretization, and clustering based on the Minimum Description Length principle and built on the Weka Data Mining platform.

**New Poll: Which methods/algorithms you used for a Data Science or Machine Learning application?** - Aug 26, 2016.

Which methods/approaches did you use in the past 12 months for an actual Data Science-related application? Please vote and we will analyze and publish the results.

**A Tutorial on the Expectation Maximization (EM) Algorithm** - Aug 25, 2016.

This is a short tutorial on the Expectation Maximization algorithm and how it can be used to estimate parameters for multivariate data.

**Machine Learning Key Terms, Explained** - May 25, 2016.

An overview of 12 important machine learning concepts, presented in a no-frills, straightforward definition style.

**Top Talks and Tutorials From PyData London** - May 11, 2016.

Get some insight into the most recent Python data science talks and presentations with this eclectic mix of videos from PyData London 2016.

**A comparison between PCA and hierarchical clustering** - Feb 23, 2016.

Graphical representations of high-dimensional data sets are the backbone of exploratory data analysis. We examine two of the most commonly used methods: heatmaps combined with hierarchical clustering and principal component analysis (PCA).

**What questions can data science answer?** - Jan 1, 2016.

There are only five questions machine learning can answer: Is this A or B? Is this weird? How much/how many? How is it organized? What should I do next? We examine these questions in detail and what they imply for data science.

**6 crazy things Deep Learning and Topological Data Analysis can do with your data** - Nov 2, 2015.

Want to analyze a high-dimensional dataset but running out of options? Find out how Deep Learning combined with Topological Data Analysis can do exactly that and more.

**Data Mining/Data Science “Nobel Prize”: ACM SIGKDD 2015 Innovation Award to Hans-Peter Kriegel** - Jul 22, 2015.

Prof. Hans-Peter Kriegel wins ACM KDD Innovation Award for his influential research and scientific contributions to data mining in clustering, outlier detection and high-dimensional data analysis, including density-based approaches.

**KDnuggets™ News 15:n04, Feb 4: Top Big Data Influencers; A Common Mistake with Time Series; Ayasdi** - Feb 4, 2015.

Top Big Data Influencers and Brands; K-means clustering is not a free lunch; Avoiding a Common Mistake with Time Series; Ayasdi: Managing Data Complexity through Topology; Big Data Could Revolutionize Healthcare.

**BigML machine learning platform Winter 2015 Release, Feb 11** - Feb 2, 2015.

See the latest in BigML's continuously evolving machine learning platform with its emphasis on consumability, programmability, and scalability. Feb 11 webinar at 9 am PT and 5 pm PT.

**Data Science 102: K-means clustering is not a free lunch** - Jan 29, 2015.

K-means is a widely used method in cluster analysis, but what are its underlying assumptions and drawbacks? We examine what happens for non-spherical data and unevenly sized clusters.

**Top /r/MachineLearning posts, Jan 18-24: K-means clustering is not a free lunch; A Deep Dive into Recurrent Neural Nets** - Jan 26, 2015.

Textbook Easter Eggs, issues with k-means, recurrent neural networks, genetic algorithm challenges, and the implementation of machine learning pipelines are all in this week's top /r/MachineLearning posts.

**Supermarket customers segmentation using Self-Organizing Mapping** - Oct 23, 2014.

See how a leading European supermarket chain improved customer value and profitability and identified key customer groups by applying business intelligence and analytics techniques like self-organizing maps.

**KDnuggets Social Network in NodeXL, May 2014** - May 29, 2014.

We examine the KDnuggets Twitter Social Network, as generated by NodeXL, looking at clusters, top Twitter accounts, URLs, hashtags, words, and what it all means.

**More Data Mining with Weka** - Jan 30, 2014.

This online course teaches both principles and practical data mining techniques, lets students work on very big datasets, classify text, experiment with clustering, and much more.