Enigma startup for Public Big Data

This startup focuses on Big Data in the public domain, from governments, NGOs, and the media, and is building infrastructure to connect it together and make it searchable and accessible through a web platform and API.

VentureBeat, May 1, 2013, by Rebecca Grant

Enigma launched out of beta today to shed light on this hidden world. This "big data" startup focuses on data in the public domain, such as those published by governments, NGOs, and the media. According to founder Marc DaCosta, this data is "totally in the dark" and "scattered across a dizzying array of data silos." Enigma is building infrastructure to connect all of it together and make it searchable and accessible through a web platform and API.

"Currently, the world of public data is much like the world that existed on the Internet before search engines became available in the 1990s," said DaCosta. "Because there is no infrastructure to search and discover public data, huge sources of real important insight and knowledge about how companies, people, and places interact in the world is hidden from view. By surfacing this data in a usable and intuitive way, Enigma empowers a factual, data-driven view of the world that currently is not possible."

The company describes itself as "Google for public data." Built through a combination of automated web crawlers and direct outreach to government agencies, Enigma's database contains billions of public records across more than 100,000 datasets. Pulling them all together breaks down the barriers that exist between various local, state, federal, and institutional search portals. On top of this information is an "entity graph" that searches through the data to discover relevant results. Furthermore, once the information is broken out of the silos, users can filter, reshape, and connect various datasets to find correlations.

DaCosta said that while there are plenty of notable players in public data, such as Factual, Socrata, Bloomberg, Thomson Reuters, and LexisNexis, Enigma is distinguished by its "holistic approach" to data acquisition and its interface, which supports organic discovery of data.
