Interview: Amit Sheth, Kno.e.sis on Deriving Actionable Insights from Social Data
We discuss Twitris—a tool for collective social intelligence, challenges in using social data to get actionable insights during emergency situations, managing Data Variety, and entrepreneurship.
Amit P. Sheth is an educator, researcher, and entrepreneur. He is the LexisNexis Eminent Scholar and founder/executive director of the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University. Kno.e.sis conducts research in social, sensor, and semantic data and Web 3.0, with real-world applications and multidisciplinary solutions for translational research, healthcare and life sciences, cognitive science, and other fields.
He is among the well-cited authors in Computer Science, the World Wide Web, and databases. His research has led to several commercial products, many real-world applications, and three successful startups. One of these was Taalee/Voquette/Semagix, which was likely the first company (founded in 1999) to develop Semantic Web-enabled search, analysis, and applications.
First part of the interview.
Here is the second part of my interview with him:
Anmol Rajpurohit: Q5. Your research team recently built Twitris—a system for collective social intelligence. What are the key capabilities of Twitris? What is your favorite Twitris use case story?
Amit Sheth: Social media monitoring and analysis is a very active, noisy area for both research and commercial activities. Twitris probably has the most advanced technical capability to analyze social media, along with relevant open (linked, structured) data, along a multitude of dimensions, including spatio-temporal-thematic, people-content-network, and sentiment-emotion-subjectivity. It performs real-time, highly scalable semantic processing (it is hosted on our cloud with 864 cores, 17TB of main memory, and 435TB of storage) that uses extensive background knowledge (domain-specific models and knowledge) for challenging disambiguation problems (Turkey as in bird or country?), personalization, and contextualization.
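The "Turkey" question above is a word-sense disambiguation problem. As a rough illustration only, the sketch below picks a sense by counting overlaps between a post's words and a small, hypothetical set of cue words per sense. This is a swapped-in bag-of-words heuristic, not Twitris's actual method, which the answer describes as drawing on far richer domain models and background knowledge; the names `SENSES` and `disambiguate_turkey` are illustrative.

```python
import re
from typing import Optional

# Hypothetical background knowledge: cue words associated with each sense of "Turkey".
# A real system would draw these from domain models or knowledge bases, not a hand-made list.
SENSES = {
    "country": {"istanbul", "ankara", "election", "border", "lira", "parliament"},
    "bird": {"thanksgiving", "roast", "stuffing", "gravy", "oven", "recipe"},
}


def tokenize(text: str) -> set:
    """Lowercase the text and return the set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate_turkey(text: str) -> Optional[str]:
    """Return 'country', 'bird', 'unknown', or None if 'turkey' is not mentioned."""
    tokens = tokenize(text)
    if "turkey" not in tokens:
        return None
    # Score each sense by how many of its cue words appear in the post.
    scores = {sense: len(tokens & cues) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no cue words, or a tie between senses, the context is not enough to decide.
    if scores[best] == 0 or list(scores.values()).count(scores[best]) > 1:
        return "unknown"
    return best


print(disambiguate_turkey("Roasting a turkey for Thanksgiving"))  # → bird
print(disambiguate_turkey("Protests in Istanbul as Turkey votes in an election"))  # → country
```

Posts with no cue words fall back to "unknown"; that is exactly where richer background knowledge (for example, recognizing place names or people) would have to take over.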
However, above all these features, it is the ability to quickly use components of Twitris to provide actionable information in situations such as major disasters or crises that makes it truly unique. It has been used during a variety of disasters (see coverage in some of the top media outlets). My favorite is its real-world use during the Jammu and Kashmir floods, where it helped save lives and reduce suffering. Hemant Purohit, one of my PhD students, along with a couple of his Twitris team members, quickly isolated a Twitris component to develop a tool that allowed a number of international digital volunteers to filter a stream of specific rescue requests, as well as information about evacuated zones and rescued people. The tool also enabled the volunteers to redirect rescue calls to the Indian Army’s coordination office, leading to actual rescues (sample coverage; tool snapshots here).
AR: Q6. What are the most underrated challenges of performing real-time analytics on social data (from Facebook, Twitter, etc.) to provide actionable insights for emergency response?
AS: Your choice of the term “actionable insight” is significant, as monitoring or analysis alone is much less valuable than the ability to take action and affect outcomes in crises and other situations that demand real-time processing.
Among the challenges are disambiguation, sparsity of data with actionable intentions, language usage unique to a situation (region, topic, demographic, etc.), veracity of the information (source credibility, situational update, etc.), and the need to use local information that is relevant to an emergency (e.g., existing/emerging regional and community organizations, local/regional public health and government institutions with the capacity to participate in response, local geographic and transportation network knowledge).
Consider disambiguation first: when we analyze tweets on cannabis and synthetic cannabinoids, which routinely exceed a million tweets a day, and use the term “spice” to filter for a synthetic drug, we must avoid results containing or referring to a “pumpkin spice latte.” As for data sparsity and actionable information, when we analyzed two million tweets related to an Oklahoma tornado, looking for tweets seeking help and tweets offering help, only 1.2% and 0.02% of the tweets, respectively, belonged to these categories.
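Both issues can be made concrete with a small sketch. The first function is a hypothetical keyword filter that strips a short, hand-picked list of culinary phrases before looking for "spice"; the exclusion list is an assumption for illustration, not the filter actually used in the cannabis study. The second function simply converts the reported percentages into approximate tweet counts for the two-million-tweet Oklahoma tornado dataset.

```python
import re

# Hypothetical exclusion phrases: culinary uses of "spice" that should not match the drug.
EXCLUDED_PHRASES = ("pumpkin spice", "spice rack", "spice blend")


def mentions_synthetic_spice(text: str) -> bool:
    """Keyword filter for 'spice' that first strips known culinary phrases."""
    lowered = text.lower()
    for phrase in EXCLUDED_PHRASES:
        lowered = lowered.replace(phrase, " ")
    # Only a standalone "spice" left after the exclusions counts as a match.
    return re.search(r"\bspice\b", lowered) is not None


def category_count(total_tweets: int, share_percent: float) -> int:
    """Approximate number of tweets in a category, given its share in percent."""
    return round(total_tweets * share_percent / 100)


print(mentions_synthetic_spice("Nothing beats a pumpkin spice latte"))  # → False
print(mentions_synthetic_spice("Smoked some spice last night"))  # → True

TOTAL = 2_000_000
print(category_count(TOTAL, 1.2))   # → 24000 (tweets seeking help)
print(category_count(TOTAL, 0.02))  # → 400 (tweets offering help)
```

In other words, roughly 24,000 and 400 of the two million tweets fell into the two actionable categories, so a useful filter has to surface a few hundred relevant posts from a stream thousands of times larger. A phrase blacklist like this one is brittle; it only handles the exclusions someone thought to list.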
AR: Q7. One of the biggest challenges of Big Data is Variety. What are your thoughts on the current state of semantic integration of multimodal and multisensory data? Where do you see it headed in the future?
AS: This is an important and exciting topic of research which has not received much attention.
Humans can effectively and seamlessly consume multimodal, multisensory data from diverse sources about an event, but most information-processing techniques focus on a single modality. Researchers specialize in text processing, image processing, audio processing, video processing, and so on, but we need to be able to process related content concurrently, whatever the modality or media.
Semantics is key to dealing with variety. Earlier, in systems such as VisualHarness, and more recently in our Semantic Sensor Web project, we have addressed issues such as heterogeneous sensor data fusion and real-time analysis. Adding our recent work on semantic perception, we are now working on what I term perceptual computing. I will soon discuss my thoughts on semantic, cognitive, and perceptual computing in an article.
AR: Q8. Besides being an educator and researcher, you are also an entrepreneur. What is the toughest part of being an entrepreneur? What is the most rewarding part?
AS: Being an entrepreneur involves people, business, and technical skills. For me, technical, marketing, and financial matters were less of a challenge, but as a CEO, you are also the company’s chief spokesperson and chief salesman. Sales was the toughest part for me, primarily because I did not like the process of going back to a potential customer. That said, there were quite a few rewarding outcomes. Taalee (which later merged to become Voquette, then Semagix, was then acquired to become Fortent, and is now Actimize) grew to 30+ employees and spent $7 million on local payroll in Athens, GA, which had no previous high-tech startup. All of its employees were retained during the merger, whereas most of the other company’s employees were let go.
Even today, the Know Your Customer technology that we developed is deployed at some of the world’s largest banks. Taalee was awarded the first patent on semantic search, browsing, personalization, and advertising, and it also developed the first commercial products and applications on these topics. The more recent use of very similar background-knowledge-enabled semantic search, which I reviewed recently, makes me particularly proud, as it validates what we pioneered nearly 15 years ago.
Third part of the interview.