Interview: Amit Sheth, Kno.e.sis on Deriving Value from Big Data through Smart Data

We discuss the definition of Smart Data, how to derive Smart Data from Big Data, maturity assessment for Smart Data pursuit, computing for human experience and Kno.e.sis.

Amit P. Sheth is an educator, researcher, and entrepreneur. He is the LexisNexis Eminent Scholar and founder/executive director of the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University. Kno.e.sis conducts research in social/sensor/semantic data and Web 3.0, with real-world applications and multidisciplinary solutions for translational research, healthcare and life sciences, cognitive science, and other areas.

He is among the well-cited authors in computer science, the World Wide Web, and databases. His research has led to several commercial products, many real-world applications, and three successful startups. One of these was Taalee/Voquette/Semagix, which was likely the first company (founded in 1999) to develop Semantic Web enabled search, analysis, and applications.

Here is my interview with him:

Anmol Rajpurohit: Q1. How do you define "Smart Data"? How would you describe the process of deriving Smart Data from Big Data?

Amit Sheth: In my first use of the term “Smart Data” in 2004, I described it as “realizing productivity, efficiency, and effectiveness gains by using semantics to transform raw data into Smart Data.” More recently, in my 2013 retake on Smart Data, given the interest in Big Data, I describe it as what makes sense out of Big Data. That is,

Smart Data provides value by harnessing the challenges posed by the volume, velocity, variety, and veracity of Big Data, and in turn providing actionable information and improving decision making. It is about extracting value by improving human involvement in data creation, processing, and consumption, resulting in an enhanced human experience.

Better and timely decisions are enabled by contextualized and personalized transformation of (raw, multimodal, big) data into situational awareness and actionable information using (background) knowledge.

AR: Q2. Where are we currently in our pursuit of Smart Data? What are some major milestones that we have achieved recently?

AS: We are relatively early in our pursuit of Smart Data. There is too much attention on Big Data, a term that focuses on the problems or challenges of the Vs (volume, variety, velocity, and veracity), rather than on Smart Data, a term that focuses on value and implies the solutions we (as humans, and applications serving humans) seek for all the data-related challenges. Transforming data into actions through contextual and personalized processing of data, applying both top-brain (e.g., planning) and bottom-brain (e.g., classification, interpretation, perception) processes, is something we have started to appreciate. Further, all of this needs to be done with very diverse (multimodal, multisensory, a variety of resolutions, uncertainty, etc.) data, as human brains are capable of processing, but with a much larger volume and velocity of data.

One of the milestones we have achieved relates to what we call semantic perception, which in essence converts diverse data, within the context of domain-specific knowledge, into higher levels of abstraction relevant to human actions and decision making (a short video describing the idea; a talk and slides discussing Smart Data in more detail using the example of one specific dHealth/mHealth application on asthma control from the kHealth project at Kno.e.sis).
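To make the idea of semantic perception a little more concrete, here is a minimal Python sketch of its two steps as described above: raw sensor readings are lifted into qualitative abstractions using background knowledge, and the abstractions are then interpreted as a situation relevant to a person. Everything named here (KNOWLEDGE, SITUATION_RULES, Observation, perceive, situation) and every threshold is a hypothetical placeholder chosen for illustration; this is not the Kno.e.sis or kHealth implementation, and the values are not clinical guidance.

```python
from dataclasses import dataclass

# Hypothetical background knowledge: for each observed property, ordered
# (upper_bound, label) pairs. Bounds are illustrative placeholders only.
KNOWLEDGE = {
    "pollen_grains_per_m3": [(50.0, "low"), (200.0, "moderate"), (float("inf"), "high")],
    "humidity_percent": [(30.0, "dry"), (60.0, "comfortable"), (float("inf"), "humid")],
}

# Hypothetical higher-level rule: a combination of abstractions that is
# interpreted as a situation relevant to the patient.
SITUATION_RULES = {
    ("high", "humid"): "elevated trigger exposure",
}


@dataclass
class Observation:
    prop: str     # observed property, must be a key of KNOWLEDGE
    value: float  # raw sensor reading


def abstract(obs: Observation) -> str:
    """Map one raw reading to the first qualitative label whose bound it does not exceed."""
    for upper, label in KNOWLEDGE[obs.prop]:
        if obs.value <= upper:
            return label
    raise ValueError(f"no label covers {obs.value} for {obs.prop}")


def perceive(observations: list[Observation]) -> dict[str, str]:
    """Lift a batch of raw observations to a property -> abstraction mapping."""
    return {o.prop: abstract(o) for o in observations}


def situation(abstractions: dict[str, str]) -> str:
    """Interpret the abstractions against the (hypothetical) situation rules."""
    key = (abstractions.get("pollen_grains_per_m3"), abstractions.get("humidity_percent"))
    return SITUATION_RULES.get(key, "no notable trigger exposure")


readings = [Observation("pollen_grains_per_m3", 320.0), Observation("humidity_percent", 72.0)]
labels = perceive(readings)
print(labels)             # {'pollen_grains_per_m3': 'high', 'humidity_percent': 'humid'}
print(situation(labels))  # elevated trigger exposure
```

The point of the split is that the person never sees "320 grains per cubic meter"; they see a labeled situation they can act on. A real system would draw its knowledge from domain ontologies and richer inference rather than a hard-coded table.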

At the same time, there are many complementary and exciting developments in areas such as human-centered AI, including brain-inspired computing, which are relevant to the technologies I expect to enable Smart Data.

AR: Q3. What are the advantages of viewing Big Data collectively across Physical, Cyber and Social (PCS) ecosystems? How does that lead us to "Computing for Human Experience"?

AS: For an increasing number of applications that can lead to improvement in the quality of human life and experiences, we need to utilize data from the physical world, such as observations recorded by devices/sensors/Internet of Things; the cyber world (all the data/facts and knowledge residing on the Web); and the social sphere (all relevant conversations, opinions, and experiences shared over the social web). For example, in the case of asthma, we use data from the physical world obtained from sensors detecting nitric oxide, carbon monoxide, indoor temperature, dust, humidity, wheezing, outdoor pollen and smog, etc. From the cyber world we utilize a conceptual model for how physicians classify levels of asthma control, historical information, and information collected from health organizations (e.g., the number of asthma-related admissions and deviations from the norm), and from the social world we utilize data such as regional and local tweets on asthma and related symptoms.
Some of the data arrives continuously (a few observations every second or minute), while other data arrives less frequently (such as a daily report from health facilities, or wheezing information provided periodically by a patient). Notice that different patients will be susceptible to different conditions, and typically none of the data in isolation would make any sense to a patient or even a doctor. What a human (patient, doctor, epidemiologist, or public health worker) really needs is actionable information, and it is different in each case. For example, a patient would like to know what he/she can do to avoid an asthma episode: when is the risk high enough to take a preventive action (such as using an inhaler), and how can this be done while also avoiding overuse? We capture these in terms of abstractions such as a risk score and a vulnerability score; these scores are examples of Smart Data, and they result in patients seeking medical attention or carrying out physician-prescribed procedures. This form of Smart Data and actionable information is an artifact of computing that leads to an improvement in the quality of life in this case and, more generally, in the overall human experience.
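The step from physical, cyber, and social signals to a personalized risk score and an actionable message can be sketched as follows. This is a minimal illustration under stated assumptions, not the kHealth model: the class and function names (Signals, PatientProfile, risk_score, recommend), the linear weighting, the 0.6 action threshold, and the four-use reliever limit are all hypothetical placeholders, and inputs are assumed to be pre-normalized to 0..1. It only shows the shape of the idea: the same environmental data yields different scores for patients with different sensitivities, and the output is an action rather than a raw reading.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """PCS inputs, each assumed to be pre-normalized to the range 0..1."""
    pollen: float              # physical: outdoor pollen sensor
    pollution: float           # physical: smog / air-quality sensor
    admissions_anomaly: float  # cyber: deviation of regional asthma admissions from the norm
    symptom_chatter: float     # social: share of local tweets mentioning asthma symptoms


@dataclass
class PatientProfile:
    """Personalization: how strongly this patient reacts to each trigger (hypothetical)."""
    pollen_sensitivity: float
    pollution_sensitivity: float


def risk_score(s: Signals, p: PatientProfile) -> float:
    """Illustrative weighted combination of PCS signals, clamped to [0, 1]."""
    physical = p.pollen_sensitivity * s.pollen + p.pollution_sensitivity * s.pollution
    contextual = 0.5 * s.admissions_anomaly + 0.5 * s.symptom_chatter
    return max(0.0, min(1.0, 0.7 * physical + 0.3 * contextual))


def recommend(score: float, recent_reliever_uses: int,
              action_threshold: float = 0.6, max_uses: int = 4) -> str:
    """Turn the score into a (hypothetical) actionable message."""
    if recent_reliever_uses >= max_uses:
        # Frequent recent reliever use is treated as a reason to escalate, not to act again.
        return "seek medical attention"
    if score >= action_threshold:
        return "consider preventive action"
    return "continue monitoring"


profile = PatientProfile(pollen_sensitivity=0.8, pollution_sensitivity=0.4)
today = Signals(pollen=0.9, pollution=0.5, admissions_anomaly=0.6, symptom_chatter=0.4)
score = risk_score(today, profile)
print(round(score, 3))                           # 0.794
print(recommend(score, recent_reliever_uses=1))  # consider preventive action
```

The overuse check runs before the threshold check so that a high score never pushes someone toward yet another self-administered dose; in a real deployment both the weights and the limits would come from physicians, not from code defaults.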

AR: Q4. How and when did you get the inspiration to start the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University? From the perspective of academic research as well as market-ready innovative solutions, where would you like to see Kno.e.sis in next 5 years?

AS: When I was at the University of Georgia (UGA), I had a very successful lab named the Large Scale Distributed Information Systems lab. While at UGA, I also did two start-ups and engaged in several multidisciplinary research projects. As a LexisNexis Ohio Eminent Scholar with an excellent commitment in terms of endowment, space, and overall charge, I knew that I could do something bigger and achieve a larger real-world impact through technology development and transfer, commercialization, as well as social and human development. I chose to take a highly multidisciplinary route.

Besides computer scientists (covering distributed systems/cloud computing, semantic/social/sensor Webs, AI/KR/ML/NLP, data mining, and IR), we have extensive involvement with researchers and practitioners (both as faculty members and as collaborators) in biomedicine, bioinformatics, clinical research and practice, cognitive science, and humanitarian work. What we do is perform high-quality computer science and interdisciplinary research while addressing consequential issues with potentially high impact, such as:
  • Can we predict health changes/deterioration (e.g. asthma attack) for intervention before an episode occurs in order to prevent it altogether?
  • Can we make computers recognize implicit information present in medical records in order to provide a comprehensive data set to the models that attempt to understand patient health status?
  • Can we automatically extract highly sparse actionable intent of emerging resource needs and availability, as well as match them during dynamic disaster response, which aids situational awareness & coordination?
  • Can we mine societal beliefs, lack of services and laws from social media conversations on gender-based violence to assist regional policy makers?
  • Can we empower city authorities to make policy decisions by predicting impending city issues such as traffic, crime, and pollution?

One objective of our research remains constant: continue to achieve exceptional outcomes for our students.

Second part of the interview.