U. of Utah: Data mining made faster



New method eases analysis of 'multidimensional' information

SALT LAKE CITY, July 22, 2010 - To many big companies, you aren't just a customer, but are described by multiple "dimensions" of information within a computer database. Now, a University of Utah computer scientist has devised a new method for simpler, faster "data mining," or extracting and analyzing massive amounts of such data.

"Whether you like it or not, Google, Facebook, Walmart and the government are building profiles of you, and these consist of hundreds of attributes describing you" - your online searches, purchases, shared videos and recommendations to your Facebook friends, says Suresh Venkatasubramanian, an assistant professor of computer science.

"If you line them up for each person, you have a line of hundreds of numbers that paint a picture of a person: who they are, what their interests are, who their friends are and so forth," he says. "These strings of hundreds of attributes are called high-dimensional data because each attribute is called one dimension. Data mining is about digging up interesting information from this high-dimensional data."

A group of data-mining methods named "multidimensional scaling," or MDS, was first used in the 1930s by psychologists and has been used ever since to make data analysis simpler by reducing the "dimensionality" of the data. Venkatasubramanian says it is "probably one of the most important tools in data mining and is used by countless researchers everywhere."
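As a rough illustration of what reducing "dimensionality" means, the sketch below uses classical (Torgerson) MDS, a long-established textbook variant and not the Utah team's new method. The function name `classical_mds`, the randomly generated `profiles` array and the choice of two output dimensions are all assumptions made for the example.

```python
import numpy as np


def classical_mds(points, k=2):
    """Embed the rows of `points` into k dimensions with classical MDS.

    Textbook procedure only; this is not the method described in the article.
    """
    # Squared Euclidean distance between every pair of rows.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)

    # Double-centre the squared distances to obtain a Gram (inner-product) matrix.
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering

    # Keep the k largest eigenpairs; clip tiny negative eigenvalues from rounding.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:k]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Hypothetical data: 50 "people", each described by 300 numeric attributes.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(50, 300))

embedding = classical_mds(profiles, k=2)
print(embedding.shape)  # one 2-D point per person
```

Each person's 300 numbers are replaced by two coordinates chosen so that distances between people are preserved as well as two dimensions allow, which is what makes the result easy to plot and inspect.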

Now, Venkatasubramanian and colleagues have devised a new method of multidimensional scaling that is faster and simpler, can be applied universally to numerous problems, and can handle more data, basically by "squashing things [data] down to size."

He is scheduled to present the new method, described in the paper "Universal Multi-Dimensional Scaling" by Arvind Agarwal, Jeff Phillips and Suresh Venkatasubramanian, on Wednesday, July 28 in Washington at the premier meeting in his field, the KDD-2010 Conference on Knowledge Discovery and Data Mining sponsored by the Association for Computing Machinery.

"This problem of dimensionality reduction and data visualization is fundamental in many disciplines in natural and social sciences," says Venkatasubramanian. "So we believe our method will be useful in doing better data analysis in all of these areas."

"What our approach does is unify into one common framework a number of different methods for doing this dimensionality reduction" to simplify high-dimensional data, he says. "We have a computer program that unifies many different methods people have developed over the past 60 or 70 years. One thing that makes it really good for today's data - in addition to being a one-stop shopping procedure - is it also handles much larger data sets than prior methods were able to handle."

He adds: "Prior methods on modern computers struggle with data from more than 5,000 people. Our method smoothly handles well above 50,000 people."

Venkatasubramanian conducted the research with University of Utah computer science doctoral student Arvind Agarwal and postdoctoral fellow Jeff Phillips. It was funded by the National Science Foundation.

For more information on the University of Utah School of Computing and College of Engineering, see: www.cs.utah.edu and http://www.coe.utah.edu
