ASE International Conference on Big Data Science 2014: Day 4 Highlights

Highlights from the presentations by Data Science leaders from UC Berkeley, Clark Atlanta University, Florida Institute of Technology, Robert Bosch LLC and HP on day 4 of the ASE Conference on Big Data Science 2014, Stanford.

The Second ASE International Conference on Big Data Science was a great opportunity for students, data scientists, engineers, data analysts, and marketing professionals to learn more about the applications of Big Data. Session topics included “Enabling Science from Big Image Data,” “Engineering Cyber Security and Resilience,” “Cloud Forensics,” and “Exploiting Big Data in Commerce and Finance.”

Held at the Tresidder Memorial Union at Stanford University, the ASE International Conference on Big Data Science took place from Tuesday, May 27 to Saturday, May 31, 2014.

Here are highlights from Day 4 (Saturday, May 31, 2014):

Dr. Michael Mahoney, ICSI and UC Berkeley, kicked off the last day of the conference with a talk on “Randomized matrix algorithms and large-scale scientific data analysis”. He began by noting that randomization has proved to be a valuable resource for developing better algorithms, and that matrix problems are ubiquitous in large-scale scientific data analysis applications. Depending on the situation, “better” might mean faster in worst-case theory; faster in high-quality numerical implementations, e.g., in RAM or in parallel and distributed environments; or more useful for downstream domain scientists. He discussed algorithms (in RAM) for least-squares and low-rank approximation, described the underlying theory, and highlighted the gap between randomized matrix algorithms of theoretical origin and their practical applications, emphasizing the critical need to bridge it. He mentioned that although a lot of recent progress has been made on the theory, implementation, and application of randomized matrix algorithms, there is still a great need for a model that bridges the gap between large-scale scientific data analysis and more general large-scale data analysis.
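To make the idea of a randomized least-squares algorithm concrete, here is a minimal sketch of one well-known approach, often called "sketch-and-solve". It is not taken from the talk: the Gaussian sketching matrix, the function name `sketched_least_squares`, the problem sizes, and the synthetic data are all assumptions chosen for illustration. The tall problem is compressed with a random matrix and the much smaller compressed problem is solved instead.

```python
import numpy as np


def sketched_least_squares(A, b, sketch_rows, rng):
    """Approximately solve min_x ||A x - b||_2 using a Gaussian sketch.

    Hypothetical helper for illustration; not the speaker's implementation.
    """
    n = A.shape[0]
    # Random Gaussian sketching matrix S (sketch_rows x n), scaled so that
    # E[S.T @ S] is the identity.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    # Solve the smaller problem min_x ||S A x - S b||_2.
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x


# Synthetic tall-and-thin problem (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
n, d = 5000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

# Exact solution for comparison.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, sketch_rows=500, rng=rng)

# Compare residuals: the sketched solution can never beat the exact one,
# but it should be close when sketch_rows is comfortably larger than d.
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
ratio = res_sketch / res_exact
print(f"residual ratio (sketched / exact): {ratio:.4f}")
```

A dense Gaussian sketch is the simplest variant to write down but not the fastest to apply; structured or sparse sketches are commonly used when the goal is speed. This ties to the different meanings of "better" mentioned above.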

Zeynab Bahrami Bidoni and Roy George, Department of Computer and Information Systems, Clark Atlanta University, talked about “Discovering Community Structure in Dynamic Social Networks using the Correlation Density Rank”. Bidoni mentioned that a “community” in the context of a social network is defined as a subgraph with a higher internal density and a lower crossing density than other subgraphs.

Community detection is an important research issue in social network analysis (SNA), where the objective is to recognize related sets of members such that intra-community associations are denser than inter-community associations. She introduced a novel and efficient distance-based ranking algorithm, called the “Correlation Density Rank” (CDR), which is used to derive the community tree from the social network and to develop a tree learning algorithm that constructs an evolving community tree. She also presented an evolution graph of the organizational structure, through which new insights into the dynamic network may be obtained. Experiments on both synthetic and real datasets demonstrated the feasibility and applicability of the framework.
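The density-based notion of a community described above can be illustrated with a small sketch. This is not the CDR algorithm, whose details are not given here; it only computes an internal density and a crossing density for a candidate group of nodes. The normalizations (fraction of possible pairs), the function names, and the toy graph are assumptions made for illustration, and the graph is assumed to be undirected with each edge listed once.

```python
def internal_density(edges, group):
    """Fraction of possible within-group node pairs that are actual edges."""
    group = set(group)
    possible = len(group) * (len(group) - 1) // 2
    inside = sum(1 for u, v in edges if u in group and v in group)
    return inside / possible if possible else 0.0


def crossing_density(edges, group, nodes):
    """Fraction of possible group-to-outside node pairs that are actual edges."""
    group = set(group)
    outside = set(nodes) - group
    possible = len(group) * len(outside)
    # An edge crosses the boundary when exactly one endpoint is in the group.
    crossing = sum(1 for u, v in edges if (u in group) != (v in group))
    return crossing / possible if possible else 0.0


# Toy network (hypothetical): two triangles joined by a single edge (2, 3).
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

candidate = {0, 1, 2}
inner = internal_density(edges, candidate)
cross = crossing_density(edges, candidate, nodes)
print(f"internal density: {inner:.3f}, crossing density: {cross:.3f}")
```

For the candidate group `{0, 1, 2}`, all three possible internal pairs are connected while only one of the nine possible pairs to the rest of the network is, so the group fits the "dense inside, sparse across" description of a community.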

Diego Pacheco, Florida Institute of Technology, delivered a speech on “Using Interactions in the Quantification of Media Bias”. He began by noting that media outlets portray themselves as “neutral” or “nonpartisan”, but bias does exist, so how can it be quantified? Interpreting the coverage of media outlets can lead to an evaluation of media bias. However, such an evaluation is not as reliable as the ground truth, because media coverage is composed of “true reports”, “false reports”, and “lack of reports”. Media outlets are packed with what is called “spin”, a type of propaganda used to sway public opinion in favor of or against an organization or public figure.

One clear example of favoritism happens in politics. The Pew Research Center has shown that different media outlets attract audiences with different political ideologies, which in turn can put pressure on outlets to tell those audiences what they want to hear, leading to spinning of the news: a typical vicious cycle. He proposed a mechanism to quantify media bias based on the analysis of relationships between people or organizations in the real world.

He proposed a metric called “coverage” that indicates how much a media outlet can be trusted, and then showed how coverage can be applied to the cases of party and individual favoritism. He also applied the proposed approach to the US Senate, using collaborations between senators on bill co-sponsorships as the ground truth. The assumption was simple: senators who work more should get more media coverage. The results indicate that most media outlets favor the Democrats and only one favors the Republicans.

Rumi Ghosh, Robert Bosch LLC, and Bernardo Huberman, HP, gave a talk titled “Information Relaxation is Ultra-diffusive”. They investigated how the overall response to a piece of information (a story or an article) evolves and relaxes as a function of time in social networks like Reddit, Digg and YouTube.

They found that the temporal evolution of popularity can be described by a universal function whose parameters depend upon the system under consideration. Whether it is the inter-arrival time between two consecutive votes on a story on Reddit or between comments on a video shared on YouTube, there is always a hierarchy of time scales in information propagation. This hierarchy of time scales led them to believe that the dynamical response of users to information is ultra-diffusive in nature. They showed that an ultra-diffusion-based stochastic process can be used to rationalize the observed temporal evolution.