Many industry breakthroughs, including the internet, grew out of basic research of the kind sponsored by NSF and DARPA.
In that light, it is interesting to see $15 million awarded to new big data research projects. The projects aim to develop new tools and methods for extracting and using knowledge from collections of large data sets, with the goal of accelerating progress in science and engineering research and innovation.
The awards included:
BIGDATA: Mid-Scale: DCM: Collaborative Research: Eliminating the Data Ingestion Bottleneck in Big-Data Applications
Rutgers University, Martin Farach-Colton
Stony Brook University, Michael Bender
Big-data practice suggests that there is a tradeoff between the speed of data ingestion, the ability to answer queries quickly (e.g., via indexing), and the freshness of data. This tradeoff has manifestations in the design of all types of storage systems. In this project the principal investigators show that this is not a fundamental tradeoff, but rather a tradeoff imposed by the choice of data structure. They depart from the use of traditional indexing methodologies to build storage systems that maintain indexing 200 times faster in databases with billions of entries.
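The article does not name the data structure the investigators use, so the following is only a generic illustration of how the choice of structure affects ingestion cost. It is a minimal Python sketch of a leveled, write-optimized index (the classic "logarithmic method"), in which inserts are absorbed by merging sorted runs rather than by updating one large sorted structure in place. The class name `LeveledIndex` and its methods are hypothetical and are not taken from the project.

```python
import bisect
from heapq import merge


class LeveledIndex:
    """Hypothetical write-optimized index (illustration only).

    Level i is either empty or holds a sorted run of exactly 2**i keys.
    """

    def __init__(self):
        self.levels = []  # levels[i] is [] or a sorted list of 2**i keys

    def insert(self, key):
        # Works like adding 1 to a binary counter: a full level is merged
        # with the incoming run and the result is carried to the next level.
        carry = [key]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            if not self.levels[i]:
                self.levels[i] = carry
                return
            carry = list(merge(self.levels[i], carry))
            self.levels[i] = []
            i += 1

    def contains(self, key):
        # One binary search per non-empty level.
        for run in self.levels:
            pos = bisect.bisect_left(run, key)
            if pos < len(run) and run[pos] == key:
                return True
        return False

    def __len__(self):
        return sum(len(run) for run in self.levels)


idx = LeveledIndex()
for k in [5, 3, 9, 1, 7]:
    idx.insert(k)
print(idx.contains(9), idx.contains(4))
```

In this sketch each key takes part in at most one merge per level, so the amortized cost of an insert grows with the logarithm of the index size, while a lookup pays for one binary search per level. A single sorted array makes the opposite trade: one binary search per lookup, but an insert may shift a large part of the array.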
BIGDATA: Mid-Scale: ESCE: DCM: Collaborative Research: DataBridge - A Sociometric System for Long-Tail Science Data Collections
University of North Carolina at Chapel Hill, Arcot Rajasekar
Harvard University, Gary King
North Carolina Agricultural and Technical State University, Justin Zhan
The sheer volume and diversity of data present a new set of challenges in locating all of the data relevant to a particular line of scientific research. Taking full advantage of the unique data in the "long tail of science" requires new tools specifically created to assist scientists in their search for relevant data sets. DataBridge supports advances in science and engineering by directly enabling and improving discovery of relevant scientific data across large, distributed and diverse collections using sociometric networks. The system will also provide an easy means of publishing data through DataBridge, and will give data producers an incentive to do so by enhancing collaboration and data-oriented networking.
BIGDATA: Mid-Scale: DA: Distribution-based Machine Learning for High-dimensional Datasets
Carnegie Mellon University, Aarti Singh
The project aims to develop new statistical and algorithmic approaches to natural generalizations of a class of standard machine learning problems. The resulting novel machine learning approaches are expected to benefit other scientific fields in which data points can be naturally modeled by sets of distributions, such as physics, psychology, economics, epidemiology, medicine and social network analysis.
BIGDATA: Mid-Scale: DA: Collaborative Research: Big Tensor Mining: Theory, Scalable Algorithms and Applications
Carnegie Mellon University, Christos Faloutsos
University of Minnesota, Twin Cities, Nikolaos Sidiropoulos
The objective of this project is to develop theory and algorithms to tackle the complexity of language processing, and to develop methods that approximate how the human brain works in processing language. The research also promises better algorithms for search engines, new approaches to understanding brain activity, and better recommendation systems for retailers.
BIGDATA: Mid-Scale: ESCE: Collaborative Research: Discovery and Social Analytics for Large-Scale Scientific Literature
Rutgers University, Paul Kantor
Cornell University, Thorsten Joachims
Princeton University, David Blei
This project will focus on the problem of bringing massive amounts of data down to the human scale by investigating the individual and social patterns that relate to how text repositories are actually accessed and used. It will improve the accuracy and relevance of complex scientific literature searches.