Tom Wolfer, Sep 28, 2010
Euclidean Distance clustering may be applied to Facebook, MySpace and other social network data. This analysis may reveal 'Geo-Groups' of users according to where their network of friends resides. Users with a similar geographical network of friends may then be targeted via customized online and offline marketing campaigns.
For clustering to succeed, raw user account data must be transformed from row to column format and summarized in one of two ways:
- in the form of a raw count of friends that a given user has in each city
- with a boolean variable to indicate whether or not a user has a friend in a particular city.
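The row-to-column transformation described above can be sketched as follows. This is a minimal illustration, not the author's actual process: the input layout (one row per user and friend's city), the sample rows, and the names friend_rows and pivot_by_city are assumptions made for the example. It builds both the raw-count and the boolean summary, then measures the Euclidean distance between two users.

```python
from collections import Counter
from math import dist

# Hypothetical raw data: one row per (user, friend's city) pair.
friend_rows = [
    ("user_a", "Chicago"),
    ("user_a", "Chicago"),
    ("user_a", "New York"),
    ("user_b", "Chicago"),
    ("user_b", "Los Angeles"),
]

def pivot_by_city(rows):
    """Return (cities, counts, flags) with one column per city for each user."""
    cities = sorted({city for _, city in rows})
    per_user = {}
    for user, city in rows:
        per_user.setdefault(user, Counter())[city] += 1
    # Raw count of friends per city (a Counter returns 0 for missing cities).
    counts = {u: [c[city] for city in cities] for u, c in per_user.items()}
    # Boolean variant: 1 if the user has at least one friend in the city.
    flags = {u: [int(n > 0) for n in vec] for u, vec in counts.items()}
    return cities, counts, flags

cities, counts, flags = pivot_by_city(friend_rows)

# Euclidean distance between the two users' friend-count vectors.
distance_ab = dist(counts["user_a"], counts["user_b"])
```

Either summary can be passed to a clustering routine. The boolean form ignores how many friends a user has in a city, so it groups users by which cities appear in their network rather than by how concentrated the network is.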
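As an example, suppose that clustering 1,000 MySpace accountholders on these variables produces three Geo-Groups, each defined by the cities where its members' friends live (group size in parentheses):
Geo-Group 1 Friends Network (200): Chicago, New York, Los Angeles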
Geo-Group 2 Friends Network (700): Chicago, Los Angeles
Geo-Group 3 Friends Network (100): Toronto, Chicago
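Groups like the three listed above could come from any clustering method built on Euclidean distance. The post does not name a specific algorithm, so the sketch below assumes plain k-means with hand-picked starting centroids. The six sample users and the column order (Chicago, Los Angeles, New York, Toronto) are hypothetical.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Plain k-means with Euclidean distance; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(list(old))  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Boolean friend-city vectors: [Chicago, Los Angeles, New York, Toronto].
points = [
    [1, 1, 1, 0], [1, 1, 1, 0],                # friends in Chicago, New York, Los Angeles
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],  # friends in Chicago, Los Angeles
    [1, 0, 0, 1],                              # friends in Toronto, Chicago
]
labels, centroids = k_means(points, [points[0], points[2], points[5]])
```

In practice the starting centroids are usually chosen at random and the number of clusters is tuned, so results on real data will be less tidy than in this toy case.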
These three clusters are then segmented according to where each of the 1,000 MySpace accountholders lives, revealing the following Geo-Group sub-segments:
Geo-Group 1 User Place of Residence (200):
a. Chicago (100)
b. Los Angeles (50)
c. Boston (50)
Geo-Group 2 User Place of Residence (700):
a. Toronto (600)
b. Los Angeles (100)
Geo-Group 3 User Place of Residence (100):
a. Seattle (25)
b. Los Angeles (75)
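The sub-segmentation above amounts to a cross-tabulation of cluster label against place of residence. A minimal sketch, assuming each user record carries a Geo-Group number and a home city; the per-user records here are simply expanded from the totals in the example, for illustration only.

```python
from collections import Counter

# Sub-segment sizes from the example above, keyed by (Geo-Group, residence).
segment_sizes = {
    (1, "Chicago"): 100, (1, "Los Angeles"): 50, (1, "Boston"): 50,
    (2, "Toronto"): 600, (2, "Los Angeles"): 100,
    (3, "Seattle"): 25, (3, "Los Angeles"): 75,
}

# Hypothetical per-user records: one (Geo-Group, residence) pair per user.
users = [key for key, n in segment_sizes.items() for _ in range(n)]

# Cross-tabulate: users per (Geo-Group, residence) and per Geo-Group.
sub_segments = Counter(users)
group_totals = Counter(group for group, _ in users)
```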
Each of these sub-segments might be targeted with different online or offline marketing campaigns. For example, a Canadian airline might offer Geo-Group 3b select discounts on flights from Los Angeles to Toronto or Chicago.
The above clustering example has not, however, considered the strength of a MySpace user's relationship with each of his or her friends. The following data may be used to build a weighting system that reflects the true strength of each relationship:
- who requests and who accepts a friend invite
- comments exchanged, photos shared and other interactions
Calculating these weights requires that a variable for each type of social network activity be created for each of the 1,000 MySpace users in our above example. Let's assume that one MySpace user has the following values for two activity variables:
%_Comment_Activity (.20) - overall share of the user's time spent exchanging comments with friends
%_Photo_Activity (.80) - overall share of the user's time spent sharing photos with friends
For this MySpace user, a photo exchange with a friend reflects a stronger relationship than a comment alone. The values of .20 and .80 may be applied as weights against the raw count or boolean variable values that summarize activities between friends in each city. Doing so adds one more layer of complexity to the results of our initial clustering exercise. MySpace users can now be grouped into sub-segments that reflect stronger or weaker 'Geo-Group' networks of relationships.
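One way to apply these weights is sketched below. The .20 and .80 values come from the example; combining them as a weighted sum per city is an assumption, as are the activity counts and the names activity_by_city and weighted_city_scores.

```python
# Activity weights from the example above.
WEIGHTS = {"comment": 0.20, "photo": 0.80}

# Hypothetical activity counts between one user and friends in each city.
activity_by_city = {
    "Chicago": {"comment": 10, "photo": 5},
    "Los Angeles": {"comment": 4, "photo": 0},
}

def weighted_city_scores(activity, weights):
    """Weighted sum of activity counts per city (assumed scoring rule)."""
    return {
        city: sum(weights[kind] * n for kind, n in kinds.items())
        for city, kinds in activity.items()
    }

scores = weighted_city_scores(activity_by_city, WEIGHTS)
```

The resulting per-city scores could replace the raw counts as clustering inputs, so that two users with friends in the same cities are separated when one interacts with those friends far more heavily than the other.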
Where an individual's friends live can be a powerful source of information for marketing purposes. Applying Euclidean Distance clustering to social network data is another example of how data mining may be used to solve an important business problem.
Facebook has begun clustering based on a member's circle of friends:
I thought that this article might be a good follow-up to my post.
Hi Khader, thanks for distributing my blog post via Twitter. Do you have any questions about my methodology? Have you ever worked with Social Network data?
Hello Tomas, and, thank you for your question.
One must be granted permission by a social network site in order to conduct this form of analysis. Alternatively, one may set up one's own social network site about a topic. For example, I write on another social networking site called AnalyticBridge (www.analyticbridge.com) that would capture the same type of geographic friends information.
Hi Tom, thanks for an interesting article.
It's indeed very interesting to analyze data from social networks and to find hidden connections!
How do you get access to the data?