ASE International Conference on Big Data Science 2014: Day 1 Highlights

Highlights from presentations by Data Science leaders from Pivotal, IBM Research, George Washington University, and IARPA at the ASE International Conference on Big Data Science 2014, held at Stanford University.

The Second ASE International Conference on Big Data Science was a great opportunity for students, data scientists, engineers, data analysts, and marketing professionals to learn more about the applications of Big Data. Session topics included “Enabling Science from Big Image Data,” “Engineering Cyber Security and Resilience,” “Cloud Forensics,” and “Exploiting Big Data in Commerce and Finance.”

Held at the Tresidder Memorial Union at Stanford University, the ASE International Conference on Big Data Science ran from Tuesday, May 27, through Saturday, May 31, 2014.


Here are highlights from Day 1 (Wednesday, May 28, 2014):

Dr. Milind Bhandarkar, Chief Scientist at Pivotal, delivered a talk titled “The Future of Data Intensive Applications.” He noted that although “Big Data” is a much-hyped term in business analytics today, the core concept of collaborative environments conducting experiments over large shared data repositories has existed for decades. He recommended that the audience read The Wall Street Journal article “Why Software Is Eating The World” by Marc Andreessen. He illustrated the big gap with the following figures for an average enterprise:
  • 70% of data is generated by customers
  • 80% of data is stored
  • 3% is prepared for analysis
  • 0.5% is analyzed
  • <0.5% is operationalized
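Read literally, these figures imply a steep funnel. A minimal Python sketch (treating each percentage as a share of all enterprise data, which is an assumption, since the talk did not state the base for each stage) shows how little stored data is ever analyzed:

```python
# Figures quoted in the talk, read here as shares of all enterprise data.
# This base is an assumption; the talk did not specify it for each stage.
stages = {
    "generated by customers": 70.0,   # percent
    "stored": 80.0,
    "prepared for analysis": 3.0,
    "analyzed": 0.5,
    "operationalized": 0.5,           # stated as "<0.5%"
}

# Fraction of stored data that is ever analyzed.
analyzed_share = stages["analyzed"] / stages["stored"]
print(f"Of data stored, only {analyzed_share:.2%} is ever analyzed")
```

However one reads the base, the drop from data stored to data analyzed, and then to models operationalized, is the “big gap” the talk centered on.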

For example, in healthcare about 40,000 diverse studies were performed in the last five years, yet very few of the resulting models became operational. Modernizing IT infrastructure is essential to getting models into production. The building blocks of a modern data architecture are applications, analytics, data, and speed. He discussed the Data Fabric architecture briefly and remarked that “Infrastructure-As-A-Service is the Hardware.” He also talked about Cloud Foundry, an application environment, and the idea of the application as the unit of deployment. Turning to Hadoop, he said it has tremendously changed the economics of data storage and analysis. He advised the audience to start preparing for the convergence of High Performance Computing, Big Data, and databases, with new hardware platforms such as Mellanox, RoCE, and ARM now available in the market.

Dr. Michelle Zhou, Senior Research Manager at IBM Research-Almaden, talked about “System U: Computational Discovery of Personality Traits from Social Media to Deliver Hyper-Personalized Experience.” She spoke about individualization at scale, sharing that psycholinguistic studies have shown the words people use reflect their personality. Hundreds of millions of people leave text footprints in public. System U uses psycholinguistic analytics to automatically derive a person’s personality traits from their digital footprints. These traits uniquely characterize an individual’s psychological, cognitive, and affective style, and can then be used to make hyper-personalized recommendations and to influence or intervene in the individual’s actions.
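The underlying psycholinguistic idea can be illustrated with a toy word-category counter in the spirit of dictionary-based tools like LIWC. This is a hypothetical sketch, not IBM’s actual System U pipeline, and the tiny lexicons below are made up for illustration:

```python
import re
from collections import Counter

# Hypothetical, tiny trait lexicons for illustration only; real systems
# use large, validated word-category dictionaries.
LEXICONS = {
    "achievement": {"win", "goal", "succeed", "improve", "best"},
    "social":      {"friend", "team", "we", "together", "share"},
}

def trait_scores(text: str) -> dict:
    """Score a text by the fraction of its words in each trait lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values()) or 1  # avoid division by zero
    return {
        trait: sum(counts[w] for w in lexicon) / total
        for trait, lexicon in LEXICONS.items()
    }

sample = "We improve together as a team. Friends share the win!"
print(trait_scores(sample))
```

A production system would add stemming, far richer dictionaries, and a validated mapping from word-category frequencies to trait scores; the sketch only shows the basic counting step.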

She gave an overview of System U and described how it automatically derives several types of personality traits from one’s tweets, including basic human values (one’s beliefs and motives) and fundamental needs (e.g., ideals vs. practicality). She also presented a set of validation studies assessing how accurate the System U-derived traits are compared to “ground truth,” and how these derived traits influence recommendations and people’s behavior in the real world. Using live demos and concrete examples, ranging from precision marketing to individualized customer care, she demonstrated applications of System U and discussed research directions in this space.

Dr. Carl Landwehr, Lead Research Scientist at George Washington University, gave an interesting talk titled “A Building Code for Building Code.” He introduced the metaphor of a building code as a way to talk about building better software. Although cyberspace has a physical reality of computers, communication channels, sensors, and actuators, it is really shaped mostly by the programs that control those things.

Today, systems of programs control most of our critical infrastructure. Workers in cyber security have adopted many rich metaphors: Trojan horse, virus, worm, firewall, and more. Difficulties arise when a metaphor blinds us to the underlying reality. He critically examined several common cyber security metaphors and proposed the adoption of a new (or at least underutilized) one, a building code for critical infrastructure software, as a means of putting forty years of system development experience into practice.

Dr. Jill D. Crisman, Program Manager at the Intelligence Advanced Research Projects Activity (IARPA), delivered a speech on “Big Data in the Finder and Aladdin Video Programs.” She mentioned that the Incisive Analysis Office at IARPA sponsors programs that help analysts make sense of massive data. Her talk focused on two such programs, Finder and Aladdin Video.

Finder is developing technologies that can locate where in the world a query image or video was taken based on the query’s content alone. The Automated Low-Level Analysis and Description of Diverse Intelligence Video (ALADDIN) program is developing technologies that can quickly search massive video collections for a user’s events of interest. The talk provided a brief overview of the goals and objectives of these programs, examined current results, and illustrated the size of the data involved.

Highlights from Day 2.