# Tag: Clustering (59)

**[ebook] Manipulating Data in Apache Spark** - Oct 29, 2018.

In this ebook from Databricks, learn how DataFrames leverage the power of distributed processing through Spark, how to make big data processing easier for a wider audience, and more.

**Iterative Initial Centroid Search via Sampling for k-Means Clustering** - Sep 12, 2018.

Thinking about ways to find a better set of initial centroid positions is a valid approach to optimizing the k-means clustering process. This post outlines just such an approach.

**An Introduction to t-SNE with Python Example** - Aug 15, 2018.

In this post we’ll give an introduction to t-SNE, an algorithm for data exploration and visualization. t-SNE is a powerful dimension reduction and visualization technique used on high-dimensional data.

**Unsupervised Learning Demystified** - Aug 13, 2018.

Unsupervised learning is a pattern-finding technique for mining inspiration from your data. Let's demystify!

**K-Means in Real Life: Clustering Workout Sessions** - Aug 3, 2018.

By using the within-cluster sum of squares as the cost function, data points in the same cluster will be similar to each other, whereas data points in different clusters will have a lower level of similarity.

**Clustering Using K-means Algorithm** - Jul 18, 2018.

This article explains the K-means algorithm in an easy way. I’d like to start with an example to understand the objective of this powerful technique in machine learning before getting into the algorithm, which is quite simple.

**KDnuggets™ News 18:n25, Jun 27: 5 Clustering Algorithms Data Scientists Need to Know; Detecting Sarcasm with Deep Convolutional Neural Networks?** - Jun 27, 2018.

Also 30 Free Resources for Machine Learning, Deep Learning, NLP; 7 Simple Data Visualizations You Should Know in R.

**The 5 Clustering Algorithms Data Scientists Need to Know** - Jun 20, 2018.

Today, we’re going to look at 5 popular clustering algorithms that data scientists need to know and their pros and cons!

**Audience Segmentation** - Jun 6, 2018.

The process of audience segmentation is not just about statistics; it’s about finding your ideal clients and choosing the right way to interact with them.

**Kernel Machine Learning (KernelML) - Generalized Machine Learning Algorithm** - May 18, 2018.

This article introduces KernelML, a pip-installable Python package created to give analysts and data scientists a generalized machine learning algorithm for complex loss functions and non-linear coefficients.

**Ten Machine Learning Algorithms You Should Know to Become a Data Scientist** - Apr 11, 2018.

It's important for data scientists to have a broad range of knowledge, keeping themselves updated with the latest trends. With that being said, we take a look at the top 10 machine learning algorithms every data scientist should know.

**Hierarchical Classification – a useful approach for predicting thousands of possible categories** - Mar 12, 2018.

A detailed look at the flat and hierarchical classification approaches to dealing with multi-class classification problems.

**Topological Data Analysis for Data Professionals: Beyond Ayasdi** - Jan 16, 2018.

We review recent developments and tools in topological data analysis, including applications of persistent homology to psychometrics and a recent extension of piecewise regression, called Morse-Smale regression.

**Top Data Science and Machine Learning Methods Used in 2017** - Dec 11, 2017.

The most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests; Deep Learning is used by only 20% of respondents; we also analyze which methods are most "industrial" and most "academic".

**3 different types of machine learning** - Nov 1, 2017.

In this extract from “Python Machine Learning”, top data scientist Sebastian Raschka explains the 3 main types of machine learning: Supervised, Unsupervised and Reinforcement Learning. Use code PML250KDN to save 50% off the book cost.

**Density Based Spatial Clustering of Applications with Noise (DBSCAN)** - Oct 26, 2017.

DBSCAN clustering can identify outliers, observations which won’t belong to any cluster. Since DBSCAN also identifies the number of clusters, it is very useful for unsupervised learning when we don’t know how many clusters there could be in the data.

**Top 10 Machine Learning with R Videos** - Oct 24, 2017.

A complete video guide to Machine Learning in R! This great compilation of tutorials and lectures is an amazing recipe to start developing your own Machine Learning projects.

**Tackling Unstructured Data With Text Exploration – On-demand webcast** - Sep 7, 2017.

Discover how to use a platform to organize unstructured data to see the linkages between word usage and document of origin, see the themes in a word cloud, and use topic extraction and document clustering.

**Comparing Distance Measurements with Python and SciPy** - Aug 15, 2017.

This post introduces five perfectly valid ways of measuring distances between data points. We will also perform a simple demonstration and comparison with Python and the SciPy library.

**KDnuggets™ News 17:n26, Jul 12: Applying Deep Learning to Real-world Problems; New Poll: Will society be better from increased automation, AI?** - Jul 12, 2017.

Also Text Clustering: Get quick insights from Unstructured Data; Using the TensorFlow API: An Introductory Tutorial Series; Deep Learning Zero to One: 5 Awe-Inspiring Demos with Code for Beginners, part 2.

**Text Clustering : Quick insights from Unstructured Data, part 2** - Jul 4, 2017.

We will build this in a modular way and also focus on exposing the functionalities as an API so that it can serve as a plug and play model without any disruptions to the existing systems.

**Text Clustering: Get quick insights from Unstructured Data** - Jun 28, 2017.

Grouping and clustering free text is an important step towards making good use of it. We present an unsupervised text clustering algorithm that enables businesses to programmatically bin this data.

**KDnuggets™ News 17:n24, Jun 21: Learn Data Science skills you need for free; Understanding Deep Learning Requires Re-thinking Generalization** - Jun 21, 2017.

Learn Data Science skills you need for free; Understanding Deep Learning Requires Re-thinking Generalization; K-means Clustering with Tableau - Call Detail Records; The Machine Learning Algorithms Used in Self-Driving Cars.

**K-means Clustering with Tableau – Call Detail Records Example** - Jun 16, 2017.

We show how to use the Tableau 10 clustering feature to create statistically based segments that provide insights about similarities within different groups and how the groups perform compared to each other.

**Machine Learning Workflows in Python from Scratch Part 2: k-means Clustering** - Jun 7, 2017.

The second post in this series of tutorials for implementing machine learning workflows in Python from scratch covers implementing the k-means clustering algorithm.

**K-means Clustering with R: Call Detail Record Analysis** - Jun 6, 2017.

A Call Detail Record (CDR) is the information captured by telecom companies during a customer’s call, SMS, and Internet activity. This information provides greater insight into the customer’s needs when combined with customer demographics.

**Must-Know: How to determine the most useful number of clusters?** - May 9, 2017.

Without knowing the ground truth of a dataset, how do we know what the optimal number of clusters is? We will have a look at 2 particularly popular methods for attempting to answer this question: the elbow method and the silhouette method.

**Toward Increased k-means Clustering Efficiency with the Naive Sharding Centroid Initialization Method** - Mar 13, 2017.

What if a simple, deterministic approach which did not rely on randomization could be used for centroid initialization? Naive sharding is such a method, and its time-saving and efficient results, though preliminary, are promising.

**Beginner’s Guide to Customer Segmentation** - Mar 9, 2017.

At the core of customer segmentation is being able to identify different types of customers and then figure out ways to find more of those individuals so you can... you guessed it, get more customers!

**K-Means & Other Clustering Algorithms: A Quick Intro with Python** - Mar 8, 2017.

In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.

**7 More Steps to Mastering Machine Learning With Python** - Mar 1, 2017.

This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.

**KDnuggets™ News 17:n06, Feb 15: So What is Big Data? 52 Useful Machine Learning APIs; Data Science finds Perfect Valentines Dates** - Feb 15, 2017.

Also Making Python Speak SQL with pandasql; 52 Useful Machine Learning & Prediction APIs, updated; New Poll: Do you support Trump Immigration Ban?

**Automatically Segmenting Data With Clustering** - Feb 9, 2017.

In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.

**Top KDnuggets tweets, Feb 01-07: Learning to Learn by Gradient Descent by Gradient Descent** - Feb 8, 2017.

Also #DeepLearning Research Review: Natural Language Processing; K-Means, Other Clustering Algorithms: A Quick Intro with #Python; Why #DeepLearning Needs Assembler Hackers.

**Quickly tackle unstructured text data** - Feb 8, 2017.

Learn about the new advanced text exploration capabilities that let you quickly extract insights from text-based data.

**Introduction to K-means Clustering: A Tutorial** - Dec 9, 2016.

A beginner's introduction to the widely used K-means clustering algorithm, using a delivery fleet data example in Python.

**Introduction to Machine Learning for Developers** - Nov 28, 2016.

Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.

**5 Steps for Advanced Data Analysis using Visualization** - Oct 28, 2016.

In most scientific research, the large amount of experimental data means that statistical analysis is typically done by technical experts in computing and statistics. Unfortunately, these experts are not experts in the underlying research, which may cause gaps in the analysis. Giving the researchers themselves easy-to-use tools and methods to handle and analyse data can enrich the research outcome.

**Clustering Key Terms, Explained** - Oct 18, 2016.

Getting started with Data Science or need a refresher? Clustering is among the tools Data Scientists use most. Check out these 10 Clustering-related terms and their concise definitions.

**Comparing Clustering Techniques: A Concise Technical Overview** - Sep 26, 2016.

A wide array of clustering techniques is in use today. Given the widespread use of clustering in everyday data mining, this post provides a concise technical overview of 2 such exemplar techniques.

**The Great Algorithm Tutorial Roundup** - Sep 20, 2016.

This is a collection of tutorials relating to the results of the recent KDnuggets algorithms poll. If you are interested in learning or brushing up on the most used algorithms, as per our readers, look here for suggestions on doing so!

**Top Algorithms and Methods Used by Data Scientists** - Sep 12, 2016.

The latest KDnuggets poll identifies the top algorithms actually used by Data Scientists and finds some surprises, including which algorithms are the most academic and which the most industry-oriented.

**Doing the Data Science That Drives Predictive Personalization** - Sep 9, 2016.

Agile collaboration within data science teams is essential to the vision of customer analytics and personalization. Attend the IBM DataFirst Launch Event on Sep 27 in New York City to engage with open-source community leaders and practitioners.

**MDL Clustering: Unsupervised Attribute Ranking, Discretization, and Clustering** - Aug 26, 2016.

MDL Clustering is a free software suite for unsupervised attribute ranking, discretization, and clustering based on the Minimum Description Length principle and built on the Weka Data Mining platform.

**New Poll: Which methods/algorithms you used for a Data Science or Machine Learning application?** - Aug 26, 2016.

Which methods/approaches did you use in the past 12 months for an actual Data Science-related application? Please vote and we will analyze and publish the results.

**A Tutorial on the Expectation Maximization (EM) Algorithm** - Aug 25, 2016.

This is a short tutorial on the Expectation Maximization algorithm and how it can be used to estimate parameters for multivariate data.

**Machine Learning Key Terms, Explained** - May 25, 2016.

An overview of 12 important machine learning concepts, presented in a no frills, straightforward definition style.

**Top Talks and Tutorials From PyData London** - May 11, 2016.

Get some insight into the most recent Python data science talks and presentations with this eclectic mix of videos from PyData London 2016.

**A comparison between PCA and hierarchical clustering** - Feb 23, 2016.

Graphical representations of high-dimensional data sets are the backbone of exploratory data analysis. We examine 2 of the most commonly used methods: heatmaps combined with hierarchical clustering and principal component analysis (PCA).

**What questions can data science answer?** - Jan 1, 2016.

There are only five questions machine learning can answer: Is this A or B? Is this weird? How much/how many? How is it organized? What should I do next? We examine these questions in detail and what they imply for data science.

**6 crazy things Deep Learning and Topological Data Analysis can do with your data** - Nov 2, 2015.

Want to analyze a high-dimensional dataset but running out of options? Find out how Deep Learning combined with Topological Data Analysis can do exactly that and more.

**Data Mining/Data Science “Nobel Prize”: ACM SIGKDD 2015 Innovation Award to Hans-Peter Kriegel** - Jul 22, 2015.

Prof. Hans-Peter Kriegel wins ACM KDD Innovation Award for his influential research and scientific contributions to data mining in clustering, outlier detection and high-dimensional data analysis, including density-based approaches.

**KDnuggets™ News 15:n04, Feb 4: Top Big Data Influencers; A Common Mistake with Time Series; Ayasdi** - Feb 4, 2015.

Top Big Data Influencers and Brands; K-means clustering is not a free lunch; Avoiding a Common Mistake with Time Series; Ayasdi: Managing Data Complexity through Topology; Big Data Could Revolutionize Healthcare.

**BigML machine learning platform Winter 2015 Release, Feb 11** - Feb 2, 2015.

See the latest in BigML's continuously evolving machine learning platform with its emphasis on consumability, programmability, and scalability. Feb 11 webinar at 9 am PT and 5 pm PT.

**Data Science 102: K-means clustering is not a free lunch** - Jan 29, 2015.

K-means is a widely used method in cluster analysis, but what are its underlying assumptions and drawbacks? We examine what happens for non-spherical data and unevenly sized clusters.

**Top /r/MachineLearning posts, Jan 18-24: K-means clustering is not a free lunch; A Deep Dive into Recurrent Neural Nets** - Jan 26, 2015.

Textbook Easter Eggs, issues with k-means, recurrent neural networks, genetic algorithm challenges, and the implementation of machine learning pipelines are all in this week's top /r/MachineLearning posts.

**Supermarket customers segmentation using Self-Organizing Mapping** - Oct 23, 2014.

See how a leading European supermarket chain improved customer value and profitability and identified key customer groups by applying business intelligence and analytics techniques like self-organizing maps.

**KDnuggets Social Network in NodeXL, May 2014** - May 29, 2014.

We examine the KDnuggets Twitter Social Network, as generated by NodeXL, looking at clusters, top Twitter accounts, URLs, hashtags, words, and what it all means.

**More Data Mining with Weka** - Jan 30, 2014.

This online course teaches both principles and practical data mining techniques, lets students work on very big datasets, classify text, experiment with clustering, and much more.