Strata 2014 Santa Clara: Highlights of Day 3 (Feb 13)
Strata 2014 was a great conference, and here are key insights from some of the best sessions on day 3: Data Journalism, Analytics over Real-time Streaming Data, Facebook Graph Analysis with One Trillion Edges, Socializing Search by LinkedIn.
One of the biggest challenges at big conferences such as Strata 2014 is that there is so much happening quickly and simultaneously that it is almost impossible to catch all the action.
We help you by summarizing the key insights from some of the best and most popular sessions at the conference. These concise, takeaway-oriented summaries are designed both for people who attended the conference but would like to revisit the key sessions for a deeper understanding and for people who could not attend.
See also: Strata 2014 Santa Clara: Highlights from Day 2 (Feb 12)
Drew started by emphasizing that most real-life problems are interdisciplinary rather than confined to one specific area. Organized crime and corruption form a global business of about 2-3 trillion US dollars. Countries with very high levels of such activity include Russia, Montenegro, Kosovo, Equatorial Guinea, North Korea, and Uzbekistan. When organized crime's cooperation with corrupt government officials in anti-democratic and weak states grows stronger, it threatens local and regional security.
Meanwhile, the proceeds of crime are eagerly sought by Western banks, hedge funds and markets. He displayed details of a Tormex bank account as an example: over half a billion US dollars were poured into this Latvian bank account of the phantom company in less than two years.
He urged the data science community to come forward and help stop this large-scale illegal activity using a "hack and track" method. He concluded by saying that efficient investigative reporting is the result of cooperation between investigative journalists, programmers and others who want to use data to help create a cleaner, fairer and more just global society.
Anand opened the session by referring to the retina of the human eye, which communicates with the brain in real time at 10 million bits per second. He stressed that real-time streaming analytics is not just about low-latency queries over batch data.
He described business transformation as a domain of new possibilities and unexpected breakthroughs in operational efficiency. He classified business transformation use cases into three categories:
- Existential use cases
  - Fraud analytics in CC company
  - IT or other security systems
- Enhancement use cases
  - Financial Trading: Risky/fraudulent trades
  - Digital Advertising: Optimization
  - Predictive vs. reactive maintenance
- Transformational use cases
  - Retail: Detecting location and serving customer
  - Insurance: Drone flight images for claim adjudication
  - Agriculture: Satellite image analysis for optimal plot
  - Healthcare: Complex models for disease detection in real-time
After providing an overview of the current business landscape, Anand and Pranoy presented real-time streaming analytics as an offering from Impetus, with features such as high-speed data ingestion, elastic scaling, support for a variety of data parsing, pluggable persistence, real-time index and search, dynamic message routing and many more. They summarized their approach as the iterative cycle of:
Sense -> Analyze -> Act -> Sense
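To make the cycle concrete, here is a minimal Python sketch of a sense, analyze, act loop over a stream of numeric readings. It is not the Impetus product or its API: the function name `sense_analyze_act`, the sliding-window baseline, and the three-sigma rule are all assumptions chosen purely for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def sense_analyze_act(readings, window=5, threshold=3.0):
    """Illustrative loop: flag readings that deviate sharply from a rolling baseline.

    Hypothetical helper, not part of any product described in the talk.
    """
    recent = deque(maxlen=window)   # rolling baseline of normal readings
    actions = []                    # stands in for alerts, blocks, re-routing, etc.

    for position, value in enumerate(readings):       # Sense: one event at a time
        if len(recent) == window:
            baseline = mean(recent)
            spread = pstdev(recent)
            # Analyze: compare the new event against the recent baseline
            if spread > 0 and abs(value - baseline) > threshold * spread:
                actions.append((position, value))      # Act: record the response
                continue                               # keep outliers out of the baseline
        recent.append(value)                           # feed the next Sense step

    return actions


stream = [10, 11, 10, 12, 11, 10, 50, 11, 10]
alerts = sense_analyze_act(stream)
print(alerts)
```

The point of the sketch is that each event is analyzed as it arrives, against state kept in memory, rather than by querying data that has already been stored in batch.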
Avery explained his motivation for graph analysis by showing images, recommendations and a network graph. Graph analytics has applications beyond large web-scale organizations. Many computing problems can be efficiently expressed and processed as a graph, which can lead to useful insights that drive product and business decisions.
He briefly explained techniques such as balanced propagation, super-step splitting and avoiding out-of-core processing. He concluded by stating that future research and development efforts should focus on evaluating alternative computing models, performance, lowering the barrier to entry, and applications.
by Sriram Sankar and Daniel Tunkelang, LinkedIn
Daniel started with an overview of how LinkedIn is addressing search quality issues by leveraging the economic graph. Social context means that the relevance of search results is highly personalized. He explained how machine-learned ranking takes this social context into account using a model made of a tree with logistic regression leaves. Focusing on its customer base, LinkedIn is moving towards entity-oriented search, i.e. when a term is searched, the results displayed should cover all entity types such as personal profiles, company profiles, employees of a company, job openings, etc. He mentioned that query understanding acts as a relevance filter, with phases such as segmentation, decoding and query rewriting that result in a new query. He also announced that LinkedIn would soon have an entity-driven search assistance feature.
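The "tree with logistic regression leaves" idea can be sketched as follows: a decision tree routes each candidate result to a leaf, and each leaf scores it with its own logistic regression. This is only an illustration of the general model shape, not LinkedIn's implementation; the feature names (`connection_degree`, `text_match`, `shared_connections`), the weights, and the single split are all made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Leaf:
    """A leaf holding its own logistic regression (hypothetical weights)."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def score(self, features):
        z = self.bias + sum(w * features[name] for name, w in self.weights.items())
        return sigmoid(z)


class Split:
    """An internal node that routes a candidate by one feature threshold."""

    def __init__(self, feature, threshold, left, right):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right

    def score(self, features):
        branch = self.left if features[self.feature] <= self.threshold else self.right
        return branch.score(features)


# Assumed example: close connections are scored with social features,
# distant ones mostly by how well the text matches the query.
model = Split(
    feature="connection_degree",
    threshold=1,
    left=Leaf({"text_match": 2.0, "shared_connections": 0.5}, bias=-1.0),
    right=Leaf({"text_match": 3.0}, bias=-2.0),
)

candidates = {
    "a": {"connection_degree": 1, "text_match": 0.5, "shared_connections": 2},
    "b": {"connection_degree": 3, "text_match": 0.9, "shared_connections": 0},
    "c": {"connection_degree": 3, "text_match": 0.2, "shared_connections": 0},
}

# Rank candidates by descending relevance score.
ranked = sorted(
    ((name, model.score(features)) for name, features in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

The appeal of this shape is that the tree captures coarse, non-linear segments (here, how close the searcher is to the result), while each leaf keeps a simple, interpretable linear model within its segment.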