KDnuggets » Forums

Clustering and scoring a new data

 
Minnie



Joined: 13 Mar 2006
Posts: 3

Posted: Tue Mar 14, 2006 1:33 pm    Post subject: Clustering and scoring a new data

I have done a lot of clustering, but I have never had to score a new data file using saved centroids. One thing I wonder: if the new data has all the variables used for the clustering, why not rerun the cluster analysis on the new data instead of using the saved centroids? In a predictive model, we build the model on cases whose outcome we already know and then estimate the probability of the outcome where it is unknown. In clustering, even in the initial classification, we are still only estimating group membership, aren't we?
I would appreciate any feedback.
editor
Site Admin


Joined: 04 Oct 2005
Posts: 120
Location: Boston, MA

Posted: Tue Mar 14, 2006 3:16 pm    Post subject: Clustering

Clustering is a very "imprecise" science, because it is hard to say when clusters are correct.

If you already have some meaningful clusters, then you may want to score the new data against the existing clusters, so that the cluster definitions stay the same. If the new data is comparable in size to the old data, or if the previous clusters have no meaning, then you can generate new clusters on the new data.
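
As a rough illustration of scoring new data against existing clusters, here is a minimal sketch that assumes the clusters came from a k-means-style method, so each cluster is summarized by a centroid and a new record goes to the nearest one by Euclidean distance. The function name, centroids, and records are hypothetical, and the new data is assumed to be on the same variables and the same scale as the original clustering.

```python
import math

def assign_to_nearest_centroid(rows, centroids):
    """Return, for each row, the index of the closest saved centroid."""
    labels = []
    for row in rows:
        # Euclidean distance from this row to every saved centroid.
        distances = [math.dist(row, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical centroids saved from an earlier clustering run
# (two variables, two clusters).
saved_centroids = [(1.0, 1.0), (8.0, 9.0)]

# Hypothetical new records measured on the same two variables.
new_rows = [(0.5, 1.5), (9.0, 8.0), (2.0, 0.0)]

new_labels = assign_to_nearest_centroid(new_rows, saved_centroids)
print(new_labels)
```

Scoring this way leaves the centroids untouched, whereas re-clustering the new data can move them and change what each cluster means.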

Gregory Piatetsky
Minnie



Joined: 13 Mar 2006
Posts: 3

Posted: Tue Mar 14, 2006 5:14 pm

Thanks for your reply! Has anyone done this: generated clusters with one set of variables and then scored new data using only some of the original variables? We are collecting survey data and will generate clusters based on the survey items and demographic variables. Then I am going to score the entire customer database. I won't have the survey items in the customer database, but I would like to assign every customer to a cluster based on the partial data. Does this sound like total nonsense, or is this something people do? Thanks!
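
One way this could be approached, sketched under assumptions that are not from the thread: keep the centroids from the full clustering, but compare customers to them only on the variables both files share. The variable names, centroid values, and customer records below are hypothetical, and all variables are assumed to be standardized the same way in both files.

```python
import math

def score_on_shared_variables(rows, centroids, shared_idx):
    """Assign each row to the nearest centroid, using only shared variables.

    rows       -- records containing only the shared variables, in shared_idx order
    centroids  -- centroids over the full variable set
    shared_idx -- positions of the shared variables within each centroid
    """
    # Drop the coordinates (e.g. survey items) that the rows do not have.
    reduced = [[c[i] for i in shared_idx] for c in centroids]
    labels = []
    for row in rows:
        distances = [math.dist(row, r) for r in reduced]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical centroids over (age, income, survey_item_1, survey_item_2).
full_centroids = [(-1.0, -0.5, 0.8, 0.2), (1.0, 0.7, -0.6, 0.1)]

# Only the demographic variables (positions 0 and 1) exist in the customer file.
demographic_idx = [0, 1]
customers = [(-0.8, -0.2), (1.2, 0.5), (0.1, 0.9)]

customer_labels = score_on_shared_variables(customers, full_centroids, demographic_idx)
print(customer_labels)
```

This is only reasonable if the shared variables actually separate the clusters. A simple check is to score the survey respondents both ways and see how often the partial-data assignment agrees with their full-data cluster. Another option is to train a classifier on the respondents that predicts cluster membership from the demographic variables alone.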
Copyright © 2012 KDnuggets.