KDnuggets Home » News » 2010 » Sep » News Briefs » Euclidean Clustering of Social Network Data  ( < Prev | 10:n23 | Next > )

Applying Euclidean Distance Clustering to Social Network Data


 
  


Tom Wolfer, Sep 28, 2010

Euclidean Distance clustering may be applied to Facebook, MySpace and other social network data. This analysis may reveal 'Geo-Groups': groups of users defined by the cities where their friends live. Users with a similar geographical network of friends may then be targeted with customized online and offline marketing campaigns.
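The following is a minimal illustration of the distance measure behind the technique. Each user is represented as a vector with one column per city. The city list and the three users below are hypothetical. Here a 1 means the user has at least one friend in that city.

```python
import math

# Hypothetical fixed column order: one column per city.
CITIES = ["Chicago", "New York", "Los Angeles", "Toronto"]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical users: 1 = at least one friend in that city, 0 = none.
user_a = [1, 1, 1, 0]  # friends in Chicago, New York, Los Angeles
user_b = [1, 0, 1, 0]  # friends in Chicago, Los Angeles
user_c = [1, 0, 0, 1]  # friends in Chicago, Toronto

print(euclidean(user_a, user_b))  # 1.0
print(euclidean(user_b, user_c))  # 1.4142135623730951
```

With boolean vectors, the squared Euclidean distance is simply the number of cities in which two users differ. Users whose friends live in the same cities therefore end up close together.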

For clustering to succeed, raw user account data must be transformed from row format (one row per friendship) to column format (one row per user, with one column per city), and each city column summarized in one of two ways:

  • in the form of a raw count of friends that a given user has in each city
OR

  • with a boolean variable to indicate whether or not a user has a friend in a particular city.
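The row-to-column transformation can be sketched as follows. This assumes the raw data is a hypothetical list of `(user_id, friend_city)` pairs with one pair per friendship. The sketch produces both summaries described above: raw counts per city and a boolean "has a friend there" flag.

```python
from collections import Counter, defaultdict

# Hypothetical raw "row" format: one (user_id, friend_city) row per friendship.
friendships = [
    ("u1", "Chicago"), ("u1", "Chicago"), ("u1", "New York"),
    ("u2", "Los Angeles"), ("u2", "Chicago"),
    ("u3", "Toronto"),
]

def pivot(rows):
    """Return the sorted city columns and each user's friend count per city."""
    cities = sorted({city for _, city in rows})
    counts = defaultdict(Counter)
    for user, city in rows:
        counts[user][city] += 1
    # One row per user, one column per city; Counter yields 0 for absent cities.
    table = {user: [c[city] for city in cities] for user, c in counts.items()}
    return cities, table

cities, count_table = pivot(friendships)
# Boolean variant: 1 if the user has at least one friend in the city.
bool_table = {u: [1 if n > 0 else 0 for n in row] for u, row in count_table.items()}

print(cities)             # ['Chicago', 'Los Angeles', 'New York', 'Toronto']
print(count_table["u1"])  # [2, 0, 1, 0]
print(bool_table["u1"])   # [1, 0, 1, 0]
```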
K-Means Clustering

K-Means assigns each user to the cluster whose centre is nearest by Euclidean distance. Let's assume that we have summarized data for 1,000 MySpace users, and that it has been rolled up to indicate only whether each user has a friend in each city. After running a clustering exercise, the following three Geo-Groups are consistently found in the Training, Test and Validation sets:


Geo-Group 1 Friends Network (200): Chicago, New York, Los Angeles
Geo-Group 2 Friends Network (700): Chicago, Los Angeles
Geo-Group 3 Friends Network (100): Toronto, Chicago
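The clustering step can be sketched with a plain (Lloyd's) k-means over the boolean city vectors. The article does not specify the software, the value of k, or the initialization. This sketch assumes k = 3, uses hand-picked starting centroids so the result is reproducible, and runs on a tiny hypothetical dataset.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest centroid (Euclidean
    distance), then move each centroid to the mean of its assigned points."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical boolean data, columns: [Chicago, New York, Los Angeles, Toronto]
points = [
    [1, 1, 1, 0], [1, 1, 1, 0],
    [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0],
    [1, 0, 0, 1], [1, 0, 0, 1],
]
# k = 3, seeded with one hand-picked user per expected group.
labels, centroids = kmeans(points, [points[0], points[2], points[5]])
print(labels)  # [0, 0, 1, 1, 1, 2, 2]
```

On real data, one would normally use randomized restarts or a library implementation such as scikit-learn's `KMeans`. The check that the same groups reappear in the training, test and validation sets is not shown here.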

These three clusters are then segmented according to where each of the 1,000 MySpace account holders lives, revealing the following Geo-Group sub-segments:


Geo-Group 1 User Place of Residence (200):
   a. Chicago (100)
   b. Los Angeles (50)
   c. Boston (50)

Geo-Group 2 User Place of Residence (700):
   a. Toronto (600)
   b. Los Angeles (100)

Geo-Group 3 User Place of Residence (100):
   a. Seattle (25)
   b. Los Angeles (75)
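The sub-segmentation amounts to cross-tabulating each user's cluster label against his or her home city. This is a minimal sketch that assumes the labels and residences are hypothetical parallel lists, one entry per user in the same order.

```python
from collections import Counter, defaultdict

def segment_by_residence(labels, residences):
    """Count users per (Geo-Group, home city) pair."""
    segments = defaultdict(Counter)
    for label, home in zip(labels, residences):
        segments[label][home] += 1
    return segments

# Hypothetical inputs: cluster label and home city for each user.
labels = [0, 0, 1, 1, 1, 2, 2]
residences = ["Chicago", "Boston", "Toronto", "Toronto", "Los Angeles",
              "Seattle", "Los Angeles"]

for group, homes in sorted(segment_by_residence(labels, residences).items()):
    print(f"Geo-Group {group + 1}:", dict(homes.most_common()))
```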

Each of these sub-segments might be targeted with different online or offline marketing campaigns. For example, a Canadian airline might offer Geo-Group 3b select discounts on flights from Los Angeles to Toronto or Chicago.

The above clustering example does not, however, consider the strength of a MySpace user's relationship with each of his or her friends. The following data may be used to apply a weighting system that reflects how strong each of those relationships actually is:

  • who requests and who accepts a friend invite
  • comments exchanged, photos shared and other interactions
A user's relationship is considered stronger if he or she sends, rather than accepts, the friend invite. Relationship strength may also be measured by the amount of time the two users spend sending messages, sharing photos, posting comments, or engaging in other activities with each other.
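The article gives no formula for relationship strength, so the following is only one possible sketch. It combines a base score for the tie, a bonus when the user sent the invite, and credit for interaction time. `INITIATOR_BONUS` and `MINUTES_SCALE` are hypothetical tuning constants. The per-city sum of strengths can stand in for the raw friend count.

```python
from dataclasses import dataclass

@dataclass
class Friendship:
    friend_city: str
    user_initiated: bool        # True if this user sent the friend invite
    interaction_minutes: float  # time spent messaging, sharing photos, commenting

# Hypothetical tuning constants; the article does not specify values.
INITIATOR_BONUS = 0.5
MINUTES_SCALE = 60.0  # minutes of interaction that add 1.0 to the score

def relationship_strength(f: Friendship) -> float:
    """1.0 for the tie itself, a bonus if the user sent the invite,
    plus credit proportional to interaction time."""
    score = 1.0
    if f.user_initiated:
        score += INITIATOR_BONUS
    score += f.interaction_minutes / MINUTES_SCALE
    return score

def weighted_city_scores(friendships):
    """Sum relationship strengths per city (replaces the raw friend count)."""
    scores = {}
    for f in friendships:
        scores[f.friend_city] = scores.get(f.friend_city, 0.0) + relationship_strength(f)
    return scores

ties = [Friendship("Chicago", True, 30.0),
        Friendship("Chicago", False, 0.0),
        Friendship("Toronto", False, 120.0)]
print(weighted_city_scores(ties))  # {'Chicago': 3.0, 'Toronto': 3.0}
```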

Calculating these weights requires that a variable for each type of social network activity be created for each of the 1,000 MySpace users in the example above. Let's assume that one MySpace user has the following values for two activity variables:

%_Comment_Activity (.20) - overall share of time spent exchanging comments with friends
%_Photo_Activity (.80) - overall share of time spent sharing photos with friends
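Variables like these can be derived from time spent per activity. This sketch assumes the per-activity minutes are already available. The minute values (12 and 48) are hypothetical, chosen to reproduce the .20/.80 split.

```python
def activity_shares(minutes_by_activity):
    """Convert time spent per activity into fractions of total activity time."""
    total = sum(minutes_by_activity.values())
    if total == 0:
        return {name: 0.0 for name in minutes_by_activity}
    return {name: m / total for name, m in minutes_by_activity.items()}

shares = activity_shares({"comment": 12, "photo": 48})
print(shares)  # {'comment': 0.2, 'photo': 0.8}
```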

For this MySpace user, a photo exchange with a friend reflects a stronger relationship than a comment alone. The values of .20 and .80 may be applied as weights to the raw-count or boolean variables that summarize the user's comment and photo activity with friends in each city. This adds another layer to the results of the initial clustering exercise: MySpace users can now be grouped into sub-segments that reflect stronger or weaker 'Geo-Group' networks of relationships.
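The article does not say exactly how the weights are combined. One reading is a weighted sum per city, sketched below. The per-city comment and photo counts are hypothetical. Boolean indicators could be substituted for the counts in the same way.

```python
# Hypothetical per-city activity summaries for one user
# (raw counts of comment and photo exchanges with friends in each city).
comments_by_city = {"Chicago": 10, "Toronto": 2, "Los Angeles": 0}
photos_by_city = {"Chicago": 1, "Toronto": 5, "Los Angeles": 3}

# This user's activity shares, used as weights.
WEIGHTS = {"comment": 0.20, "photo": 0.80}

def weighted_city_profile(comments, photos, weights):
    """Weighted sum of activity variables per city."""
    cities = sorted(set(comments) | set(photos))
    return {
        city: weights["comment"] * comments.get(city, 0)
              + weights["photo"] * photos.get(city, 0)
        for city in cities
    }

profile = weighted_city_profile(comments_by_city, photos_by_city, WEIGHTS)
print(max(profile, key=profile.get))  # Toronto
```

In this hypothetical example, Chicago has the most raw interactions (11 versus 7). Toronto still ranks first once photo activity is weighted more heavily, and this kind of shift can move a user into a different sub-segment.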

Where an individual's friends live can be a powerful source of information for marketing purposes. Applying Euclidean Distance clustering to social network data is another example of how data mining may be used to solve an important business problem.

Comments:


Tom Wolfer
Facebook has begun clustering based on a member's circle of friends:
www.theglobeandmail.com/news/technology/facebook-gives-users-more-control-of-data/article1745697/

I thought that this article might be a good follow-up to my post.

Thanks,
Tom

Tom Wolfer
Hi Khader, thanks for distributing my blog post via Twitter. Do you have any questions about my methodology? Have you ever worked with Social Network data?

Tom Wolfer
Hello Tomas, and thank you for your question.

One must be granted permission by a social network site in order to conduct this form of analysis. Alternatively, one may set up one's own social network site about a topic. For example, I write on another social networking site, AnalyticBridge (www.analyticbridge.com), which would capture the same type of geographic friends information.

Tomas Keller
Hi Tom, thanks for an interesting article.

It's indeed very interesting to analyze data from social networks and to find hidden connections!

How do you get access to the data?


Rgrds
Tomas Keller

