ASE International Conference on Big Data Science 2014: Day 3 Highlights

Highlights from the presentations by data science leaders from UC Davis, UT Dallas, Northrop Grumman Corp and NIST on Day 3 of the ASE Conference on Big Data Science 2014, held at Stanford University.

The Second ASE International Conference on Big Data Science was a great opportunity for students, data scientists, engineers, data analysts, and marketing professionals to learn more about the applications of Big Data. Session topics included “Enabling Science from Big Image Data,” “Engineering Cyber Security and Resilience,” “Cloud Forensics,” and “Exploiting Big Data in Commerce and Finance.”

Held at the Tresidder Memorial Union at Stanford University, the ASE International Conference on Big Data Science took place from Tuesday, May 27 to Saturday, May 31, 2014.

Highlights from workshops.

Highlights from Day 1.

Highlights from Day 2.

Here are highlights from Day 3 (Friday, May 30, 2014):

Dr. S. Felix Wu, Professor, University of California-Davis, started the day with an interesting talk: “On Content, Discussions, Opinions, and Deliberative Participation over Social Media Systems”. He emphasized that social media is changing many different aspects of our lives. By participating in online discussions, people exchange opinions on various topics or content, shape their stances, and gradually build their own characteristics.

He also presented and discussed a framework for identifying online user characteristics and understanding the formation of user deliberation and bias in online newsgroups. Under SINCERE (Social Interactive Networks: Conversation Entropy Ranking Engine), his students have designed a dynamic user like graph model to recognize user deliberation and bias automatically in online newsgroups. They also evaluated the identification results against linguistic features and implemented this model under SINCERE as a real-time service. By applying this model to large online newsgroups, he studied the influence of early discussion context on the formation of user characteristics.

He concluded that the formation of user deliberation and bias is a product of situations, not simply dispositions: confronting disagreement in unfamiliar circumstances promotes more consideration of different opinions, while recurring conflict in familiar circumstances evokes close-minded behavior and bias. Based on this observation, he used a supervised learning model to predict user deliberation and bias at an early life-stage. His results show that knowing only the first three months of users’ interaction data yields an F1 score of around 70% in predicting user deliberation and bias in online newsgroups.
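For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of how it is computed; the label lists are hypothetical illustration data, not results from the talk, and the function name `f1_score` is chosen here for convenience.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Return the F1 score (harmonic mean of precision and recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = "deliberative user", 0 = "biased user".
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]

if __name__ == "__main__":
    print(f"F1 = {f1_score(true_labels, predicted_labels):.2f}")
```

Because F1 combines precision and recall, a score near 70% says that the classifier is reasonably good both at finding the users in the target class and at avoiding false alarms.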

Dr. Bhavani Thuraisingham, Professor and Executive Director of the Cyber Security Research and Education Institute, The University of Texas at Dallas, talked about “Cloud-Centric Assured Information Sharing”. She described her research and development efforts in assured cloud computing for the Air Force Office of Scientific Research. She, along with her team, has developed a secure cloud computing framework as well as multiple secure cloud query processing systems. Their framework uses Hadoop to store and retrieve large numbers of Resource Description Framework (RDF) triples (each consisting of a subject, a predicate, and an object) by exploiting the cloud computing paradigm. They have also developed a scheme to store RDF data in the Hadoop Distributed File System (HDFS).

They implemented XACML-based policy management and integrated it with their query processing strategies. For secure query processing with relational data they used the Hive framework. More recently they have developed strategies for secure storage and query processing in a hybrid cloud. In particular, they have developed algorithms for query processing in which the user’s local computing capability is exploited alongside public cloud services to deliver an efficient and secure data management solution. They have also developed techniques for secure virtualization using the Xen hypervisor to host their cloud data managers, as well as an RDF-based policy engine hosted on their cloud computing framework.
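To make the RDF triple structure mentioned in this talk concrete, here is a small sketch. The talk summary does not describe the team's actual storage scheme, so the predicate-based grouping below is only an assumption chosen for illustration. In-memory dictionaries stand in for files in a distributed file system, and all names (`Triple`, `partition_by_predicate`, `match`, the `ex:` identifiers) are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def partition_by_predicate(triples):
    """Group triples by predicate (assumed layout, one bucket per predicate)."""
    partitions = defaultdict(list)
    for t in triples:
        # The predicate becomes the bucket key, so only (subject, object) is stored.
        partitions[t.predicate].append((t.subject, t.obj))
    return dict(partitions)


def match(partitions, predicate, subject: Optional[str] = None,
          obj: Optional[str] = None):
    """Return (subject, object) pairs for a predicate, optionally filtered."""
    return [
        (s, o)
        for s, o in partitions.get(predicate, [])
        if (subject is None or s == subject) and (obj is None or o == obj)
    ]


# Hypothetical sample data.
triples = [
    Triple("ex:alice", "ex:worksFor", "ex:utd"),
    Triple("ex:bob", "ex:worksFor", "ex:utd"),
    Triple("ex:alice", "ex:knows", "ex:bob"),
]
partitions = partition_by_predicate(triples)

if __name__ == "__main__":
    # Who works for ex:utd? Only the ex:worksFor bucket is scanned.
    print(match(partitions, "ex:worksFor", obj="ex:utd"))
```

Grouping by predicate means a query that names a predicate only has to read one bucket, which is one plausible reason to lay out triples this way on a large cluster.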

Dr. Ketty Gann, Senior Research Engineer, Northrop Grumman Corporation, delivered a talk on “Twitter Analytics for Insider Trading Fraud Detection System”. She explained that Twitter analytics has been developed to process Twitter data at a macro level for use in an insider trading detection system, in order to establish normal trading patterns between daily stock price changes and public sentiment.

Two machine learning models, a Support Vector Machine (SVM) and a Decision Tree, were built from annotated historical Twitter data and the Stanford Sentiment140 tweet corpus, respectively. Her research covers polarized sentiment (positive and negative), a comparison of the SVM and Decision Tree models, the Sentiment Key Performance Index (SKPI), the Daily Sentiment Index (DSI), and mood analysis. The results indicated that the Twitter SKPI and DSI are useful indexes for predicting future stock price movement in regular stock trading.

Dr. Ram D. Sriram, Chief, Software and Systems Division, National Institute of Standards and Technology, delivered a talk titled “Big Data and Semantic Web Meet Applied Ontology”. He said “since the beginnings of the Semantic Web, ontologies have played key roles in the design and deployment of new semantic technologies.” Yet over the years, the level of collaboration between the Semantic Web and Applied Ontology communities has been much less than expected. Within Big Data applications, ontologies appear to have had little impact. On the one hand, the Semantic Web, Linked Data, and Big Data communities can bring a wide array of real problems (such as performance and scalability challenges and the variety problem in Big Data) and technologies (automated reasoning tools) that can make use of ontologies.

On the other hand, the Applied Ontology community can bring a large body of common reusable content (ontologies) and ontological analysis techniques. Identifying and overcoming ontology engineering bottlenecks is critical for all communities. The primary goal of the Ontology Summit 2014, the 9th in a series, was to provide a platform and opportunity for building bridges between the Semantic Web, Linked Data, Big Data, and Applied Ontology communities. The Summit activities brought together insights and methods from these different communities, synthesized new insights, and disseminated knowledge across field boundaries.

Highlights from Day 4.