DARPA SBIR: Defense Against National Vulnerabilities in Public Data
Could a modestly funded group deliver nation-state type effects using only public data? This DARPA SBIR topic calls for investigating the US national security threat posed by public data and developing tools to characterize and assess the nature, persistence, and quality of that data. Opens: Aug 26, 2013. Closes: Sep 25, 2013.
Opens: August 26, 2013 – Closes: September 25, 2013
TECHNOLOGY AREAS: Information Systems, Human Systems
Investigate the national security threat posed by public data available either for purchase or through open sources. Based on principles of data science, develop tools to characterize and assess the nature, persistence, and quality of the data. Develop tools for the rapid anonymization and de-anonymization of data sources. Develop framework and tools to measure the national security impact of public data and to defend against the malicious use of public data against national interests.
The vulnerabilities to individuals from a data compromise are well known and documented now as “identity theft.” News outlets and research journals regularly document the loss of personally identifiable information by corporations and governments around the world. Current trends in social media and commerce, with voluntary disclosure of personal information, create other potential vulnerabilities for individuals participating heavily in the digital world. The Netflix Prize, launched in 2006 and awarded in 2009, aimed to create better customer pick prediction algorithms for the movie service [1]. An unintended consequence of the contest was the discovery that it was possible to de-anonymize subscribers in the contest data set with very little additional data. This de-anonymization led to a federal lawsuit and the cancellation of the sequel challenge [2]. The purpose of this topic is to understand the national level vulnerabilities that may be exploited through the use of public data available in the open or for purchase.
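The de-anonymization risk described above can be illustrated with a minimal sketch. It is not the method used against the Netflix data; it only assumes that a record becomes easier to re-identify when its combination of seemingly harmless attributes (quasi-identifiers) is unique within a data set. The function name `uniqueness_rate`, the field names, and the toy records are all hypothetical.

```python
from collections import Counter


def uniqueness_rate(records, quasi_identifiers):
    """Return the fraction of records whose quasi-identifier combination is unique.

    A unique combination means a single outside fact matching those fields
    could single out the record, even if names have been removed.
    """
    if not records:
        return 0.0
    # Build one key per record from the chosen fields.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical "anonymized" records: no names, only coarse attributes.
records = [
    {"zip": "22203", "birth_year": 1980, "sex": "F"},
    {"zip": "22203", "birth_year": 1980, "sex": "M"},
    {"zip": "22203", "birth_year": 1975, "sex": "F"},
    {"zip": "22204", "birth_year": 1980, "sex": "F"},
    {"zip": "22204", "birth_year": 1980, "sex": "F"},
]

rate_zip = uniqueness_rate(records, ["zip"])
rate_all = uniqueness_rate(records, ["zip", "birth_year", "sex"])
print(f"unique on zip only: {rate_zip:.0%}")
print(f"unique on zip + birth_year + sex: {rate_all:.0%}")
```

In this toy set no record is unique on ZIP code alone, but three of the five become unique once birth year and sex are added, which is the sense in which "very little additional data" can undo anonymization.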
Could a modestly funded group deliver nation-state type effects using only public data? The threat of active data spills and breaches of corporate and government information systems is being addressed by many private, commercial, and government organizations. The purpose of this research is to investigate data sources that are readily available for any individual to purchase, mine, and exploit. The marketing community uses large-scale data aggregators, big data analytics, and social science techniques to deliver highly targeted advertising campaigns. Does the availability of data for purchase or for free, advanced marketing techniques (e.g., collaborative filtering, computational advertising), and low-cost big data analytic capabilities (e.g., Amazon EC2) provide a determined adversary with the tools necessary to inflict nation-state level damage? To what extent could a non-state actor collect, process, and analyze a portfolio of purchased and open source data to reconstruct an organizational profile, fiscal vulnerabilities, location of physical assets, work force pattern-of-life, and other information [3], in order to construct a deliberate attack on a specific capability?
The goal of this topic is to develop tools to characterize and assess the nature, persistence, and quality of data. The tools should be based on principled scientific methods for sampling and relevant statistical methods for assessment. Also of interest are tools to characterize the quality of data for automated processing and analysis (i.e., a measure of how much manpower would be required to use a specific source).
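As one possible reading of "principled scientific methods for sampling and relevant statistical methods for assessment," the sketch below estimates a single quality attribute, field completeness, from a random sample and attaches a Wilson score confidence interval. The choice of completeness as the metric, the function names, and the synthetic records are assumptions made for illustration; the solicitation does not prescribe any particular measure.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def estimate_completeness(records, field, sample_size, seed=0):
    """Estimate the share of records with a non-empty `field` from a random sample."""
    if not records:
        raise ValueError("records must not be empty")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    filled = sum(1 for r in sample if r.get(field) not in (None, ""))
    low, high = wilson_interval(filled, len(sample))
    return filled / len(sample), low, high


# Hypothetical source: 1,000 records, every fifth one missing its address.
records = [
    {"id": i, "address": None if i % 5 == 0 else f"{i} Main St"}
    for i in range(1000)
]

p_hat, low, high = estimate_completeness(records, "address", sample_size=200)
print(f"estimated completeness: {p_hat:.2f} (interval {low:.2f} to {high:.2f})")
```

Sampling keeps the cost of assessing a large source low, and the interval width gives a rough indication of how much the estimate can be trusted before more effort is spent on the source.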
Additionally, the goal of this topic is to characterize the threat through the creation of tools, techniques, and methodologies to measure the vulnerabilities in a given set of public data. As an example, reconstructing the profile of an organization from many data pieces using low computational-complexity methods might indicate vulnerability. Also of interest to this topic is the development of sensors, tools, and techniques necessary to defend against the malicious use of data for purchase. Throughout the performance of this research (Phases I, II, and III), there will be no indefinite collection or storage of data sources containing personal identifying information (PII). Develop a proof-of-concept system that can automatically sample data from numerous sources, characterize the data, and provide automatic feedback on the measurable risk inherent in various collections of data. Develop methodology for risk assessment and mitigation through reallocation of resources.
Investigate the landscape of public data both open and purchasable across several domains (e.g., GIS, webpages, consumer data, social media, etc.), through statistical data characterization and assessment. Develop a set of risk factors for vulnerability including complexity of the computation for compromise, and design a prototype tool set necessary to automatically measure the risk inherent in the data. Develop a plan for detailed implementation of methods in PHASE II and III, including a data privacy plan.
For more information, including PHASE II and III details, visit www.zyn.com/sbir/sbres/sbir/dod/darpa/darpasb133-002.htm.
1) Bell, Robert, Yehuda Koren, and Chris Volinsky. 2009. “The BellKor 2008 Solution to the Netflix Prize.” AT&T Labs, Yahoo! Research, Florham Park, NJ.
2) Netflix Prize Update, March 12, 2010,
3) Gorman, Sean. 2004. “Project Mayhem: Physical Vulnerability Exploitation Targeting the US Financial Sector.” George Mason University School of Public Policy.
KEYWORDS: public data, national vulnerability, data science, anonymization