Rapid-I Blog, Nov 27, 2012, by Giuseppe Taibi.
hack/reduce brands itself as Boston's Big Data hacking space. It is backed by a who's who of Boston tech powerhouses, ranging from Harvard and MIT to Google and Microsoft, to the State of Massachusetts and top-tier VCs. hack/reduce is located in the historic Kendall Boiler and Tank building, which gives its name to the vibrant Kendall Square technology district, brimming with startup excitement.
True to its mission of "helping Boston create the talent and the technologies that will shape our future in a big data-driven economy," hack/reduce organized its first hackathon on Nov. 17. We at Rapid-I love Big Data so this was a terrific opportunity to mingle with the Boston Big Data community. Rapid-I's popular open source visual environment for data analysis RapidMiner can easily work on Big Data via
Our team worked on a 25GB database of dating profiles provided by Mate1.com. Other available datasets included carbon dioxide measurements, an Amazon.com product database, stock market prices, Wikipedia and more (the full list of datasets is available on the hackathon wiki). We were interested in performing cluster analysis to explore the similarities among user profiles. The Mate1 user profile attributes included age, gender, eye color, smoking habits, dating preferences, astrological signs, physical fitness, political views and many others.
For this task, we applied a K-Means clustering operator to the dataset, then used RapidMiner to create a scatter matrix plot to explore how the profile attributes were related to each other. We found that most members filled out only the minimum number of fields on the profile. In almost every comparison we noticed that many people chose not to specify a value for an attribute. People definitely tend to enter the minimum information necessary to create a profile and start browsing other people's profiles. Also, for whatever reason, people with the same eye color also identify with the same body type. One of the frustrations was that the data set was normalized, so we did not really know the exact meaning of a given attribute value. Towards the end we started to reverse engineer this by creating our own profile on the Mate1.com website, but then we ran out of time.
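For readers who have not used K-Means before, here is a minimal sketch of the idea in plain Python. It is not the RapidMiner operator we used at the hackathon, and the profile tuples and their numeric codes are made up for illustration, since the real Mate1 encoding was unknown to us.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on equal-length numeric tuples.

    Returns (labels, centroids). A sketch for illustration only,
    not the RapidMiner K-Means operator.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k starting centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical profiles: (age, eye_color_code, body_type_code).
profiles = [(22, 1, 1), (24, 1, 1), (23, 1, 2), (51, 3, 4), (49, 3, 4), (50, 3, 3)]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Note that age has a much wider range than the coded attributes in this toy data, so it dominates the distance. On real data you would typically rescale the attributes first and decide how to treat the many unspecified values.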
We also conducted an analysis to verify the "Half Your Age Plus 7 Rule," which refers to the age difference between partners that is considered socially acceptable. More specifically, we mined the dating database to answer the question "What is the oldest / youngest person that you are willing to date?" In a very entertaining presentation, one team member exposed the harsh fact that for Gender "2" the rule generally holds true, while for Gender "3" there is a big difference, with members in their 20s and 30s willing to date partners much older than the rule allows. The database provided did not specify a text label for the gender, only a number, so feel free to guess which is which.
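The rule itself is simple arithmetic, so here is a small sketch of how a profile's stated preferences could be checked against it. The function and parameter names are hypothetical and do not correspond to the Mate1 column names; the upper bound is the usual inverse reading of the rule, which is an assumption on our part.

```python
def youngest_by_rule(age):
    """Lower bound: half your age plus seven."""
    return age / 2 + 7


def oldest_by_rule(age):
    """Upper bound: the oldest partner whose own lower bound still includes you."""
    return (age - 7) * 2


def within_rule(age, min_pref, max_pref):
    """True if a member's stated youngest/oldest preferences stay inside the rule."""
    return min_pref >= youngest_by_rule(age) and max_pref <= oldest_by_rule(age)


# Hypothetical member: 25 years old, willing to date partners aged 22 to 60.
print(youngest_by_rule(25), oldest_by_rule(25), within_rule(25, 22, 60))
```

A member in their 20s who lists a maximum partner age well above `oldest_by_rule(age)` is the kind of case that stood out for Gender "3".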