
Interview: Kavita Ganesan, FindiLike on Building Decision Support Systems based on User Opinions

We discuss the founding story of FindiLike, Opinion-driven Decision Support Systems (ODSS), challenges in analyzing user opinions, the future of sentiment analysis, favorite books and more.

Dr. Kavita Ganesan is the founder of FindiLike, a company that provides technologies for summarizing opinions, crawling entity-specific reviews, and enabling opinion-driven search. She has over 10 years of experience in research and development of intelligent information systems, particularly in the domain of text information management and analysis.

She received her Ph.D. from the University of Illinois at Urbana-Champaign, where she completed a dissertation on opinion-driven decision support systems, proposing a suite of novel and highly general algorithms for online review crawling, abstractive and concise opinion summarization, and opinion-based entity ranking. She is passionate about putting research into practice and therefore focuses on developing techniques that are general and scalable.

Here is my interview with her:

Anmol Rajpurohit: Q1. What inspired you to launch FindiLike? When was the first time that you thought about it? How did your recent PhD contribute to it?

Kavita Ganesan: From the time I joined my PhD program, I have been very passionate about developing novel algorithms to solve interesting real-world problems. I have always wanted research to be "usable" rather than just exist on paper. This is what led me to launch FindiLike: to turn research into usable technology that solves real-world problems.

AR: Q2. How would you define an Opinion-driven Decision Support System (ODSS)? What kinds of research problems does ODSS encompass?

KG: An Opinion-driven Decision Support System is basically a platform consisting of tools and technologies that help users and businesses leverage opinions more efficiently for all sorts of decision-making tasks. For a user, this can be deciding which product to purchase based on all available opinions. For a business, it can be deciding which problems in its own product to fix based on the opinions of its users. Building such a platform involves a multitude of interesting research problems, ranging from data mining to human-computer interaction. Examples of research problems:

Opinion Summarization

One of the easiest ways to analyze the abundance of unstructured opinions is to summarize them. There is a whole range of methods for summarizing opinions, each with its pros and cons. Details on the different methods for structured summarization (e.g., opinions expressed through rating scales) can be found in the survey paper by Kim et al. (2011). Then there are unstructured summaries, which are textual summaries that try to capture the key opinions in text. In recent years, researchers have been looking into the abstractive micro-summarization (micropinion) format rather than sentence extraction methods, where you try to generate concise, abstractive and readable summaries of key opinions. The reason is that full sentences can become verbose and may not be suitable for hand-held devices. The example below shows what a micropinion summary looks like when run on reviews of Acura 2007. It was generated using a variant algorithm based on several research projects: Micropinion-generation and Opinosis.

Figure: Micropinion summary generated on Acura 2007 reviews
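To give a rough feel for what "concise phrases on key opinions" means, here is a toy sketch. It is not the Micropinion-generation or Opinosis algorithm; it only surfaces short phrases that recur across many reviews, which is a much simpler frequency-based stand-in. The stopword list, the function names and the sample reviews are all made up for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "is", "are", "was", "and", "or", "but",
             "it", "this", "of", "to", "in", "for", "i", "has", "have", "very"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())

def frequent_phrases(reviews, n=2, top_k=3, min_reviews=2):
    """Return the top_k n-grams that appear in at least min_reviews reviews.

    Each phrase is counted once per review, so one very long review
    cannot dominate the summary.
    """
    support = Counter()
    for review in reviews:
        tokens = tokenize(review)
        ngrams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for gram in ngrams:
            # Skip phrases that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            support[gram] += 1
    ranked = sorted(
        ((gram, count) for gram, count in support.items() if count >= min_reviews),
        key=lambda item: (-item[1], item[0]),
    )
    return [(" ".join(gram), count) for gram, count in ranked[:top_k]]

# Made-up reviews, for illustration only.
sample_reviews = [
    "Smooth ride and comfortable seats",
    "Very comfortable seats, smooth ride overall",
    "Smooth ride but poor gas mileage",
    "Poor gas mileage for the price",
]
summary = frequent_phrases(sample_reviews)
```

A real micropinion summarizer also has to check that the generated phrases are readable and not redundant with each other, which this sketch does not attempt.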

More examples: Opinion-Driven Search, Opinion Acquisition (OpinoFetch)

AR: Q3. What are the biggest challenges in mining the opinions scattered all across the web (including social media) and making sense out of it?

KG: Based on my experience, the biggest challenge with all these scattered opinions is noise and duplicates. Since opinions can be highly redundant, we have the benefit of volume to surface the important opinions for analysis. Along with this, however, we often have "noise" and duplicates that can be highly distracting and can throw algorithms and crawlers off-track. For example, because TripAdvisor allows other sites to use its review APIs, a crawler may regard the reviews from the TripAdvisor site and the site that "borrowed" TripAdvisor reviews as two separate sources of reviews when technically they are the same reviews. In addition, if we consider social media content, not all content contains opinions. Some posts are links to articles or videos. Some just state what people are currently doing, and some posts have a mix of opinions and links. This irrelevant content is the "noise", and it is very important to offset it or be robust to it. If we start building applications and analysis around the noise and the duplicates, we end up with false analysis or become frustrated with all the distracting content that surfaces instead of the desired content.
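The duplicate problem described above (the same review appearing on two sites) can be illustrated with a minimal sketch. It assumes word-level shingles compared with Jaccard similarity, a common near-duplicate detection technique; it is not a description of how FindiLike's crawlers work. The function names, the threshold of 0.8 and the sample reviews are hypothetical.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles of a text, ignoring case and punctuation."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def deduplicate(reviews, threshold=0.8):
    """Keep a review only if it is not a near-duplicate of one already kept."""
    kept = []  # list of (text, shingle set) pairs
    for text in reviews:
        candidate = shingles(text)
        if all(jaccard(candidate, seen) < threshold for _, seen in kept):
            kept.append((text, candidate))
    return [text for text, _ in kept]

# Made-up reviews: the second is the first one "borrowed" by another site.
crawled = [
    "Great hotel, friendly staff and clean rooms near the beach.",
    "Great hotel,  friendly staff and clean rooms near the beach!",
    "The location was noisy and the breakfast was cold.",
]
unique_reviews = deduplicate(crawled)
```

This pairwise comparison is quadratic in the number of reviews, so it only suits small collections; at crawl scale one would typically look at approximate methods such as MinHash with locality-sensitive hashing.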

AR: Q4. How would you differentiate FindiLike from the other opinion mining and sentiment analysis tools?

KG: Unlike typical sentiment analysis tools that tag text as containing positive or negative sentiment, or full-scale market-research-style sentiment analysis applications, the goal of FindiLike is to provide the prerequisite tools needed for any type of opinion-driven analysis to happen. For example, we provide review feeds to companies for their own sentiment analysis tasks. We also provide API tools to summarize reviews, tweets and opinions in general so that users and businesses can understand what people are actually saying within such content. We also have a framework that facilitates opinion-driven search, where the user provides specific opinion requirements and the framework ranks the entities of interest based on how well these requirements are matched. For example, when looking for a laptop, a user may want one that is said to be "lightweight" and to have a "bright screen". FindiLike uses extensions of state-of-the-art research methods to achieve this general goal.
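The idea of ranking entities by how well their reviews match a user's opinion requirements can be sketched as follows. This is a deliberately naive illustration using plain substring matching, not FindiLike's actual ranking method; the function name, the scoring formula and the sample data are assumptions made for this example.

```python
def rank_entities(reviews_by_entity, requirements):
    """Rank entities by how often their reviews mention each requirement.

    For every requirement phrase we compute the fraction of an entity's
    reviews that contain it, then average those fractions over all
    requirements. Higher scores rank first; ties are broken by name.
    """
    scores = {}
    for entity, reviews in reviews_by_entity.items():
        texts = [review.lower() for review in reviews]
        if not texts or not requirements:
            scores[entity] = 0.0
            continue
        fractions = [
            sum(req.lower() in text for text in texts) / len(texts)
            for req in requirements
        ]
        scores[entity] = sum(fractions) / len(fractions)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Made-up laptop reviews, for illustration only.
laptops = {
    "laptop_a": [
        "Very lightweight and the bright screen is great",
        "Lightweight but battery is weak",
    ],
    "laptop_b": [
        "Heavy, though it has a bright screen",
        "Solid keyboard",
    ],
}
ranking = rank_entities(laptops, ["lightweight", "bright screen"])
```

Exact phrase matching misses paraphrases such as "light as a feather" and ignores negation ("not lightweight"), which is part of why this is treated as a research problem rather than a simple keyword lookup.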

AR: Q5. What do you personally think about the future of Sentiment Analysis? Your predictions?

KG: Sentiment analysis will go beyond just the "positive" or "negative" that it is currently thought to be. What is really going to matter eventually is what people have actually said, which provides far more information and insight than whether something was positive or negative.

For example, "iPhone 5s design: positive" is not as informative as "the iPhone 5s is sleek, fits easily in the pocket and has a beautiful interface".

Also, in the industry, the current focus of sentiment analysis is primarily restricted to opinions within social media content. However, opinions are far more ubiquitous. You have an abundance of opinions in the form of user reviews which actually contain a lot of details, then you have opinions within user comments (e.g. comments on articles, videos, etc.), you also have opinions within forums. So the value would soon come from all these other sources and not just social media content.

AR: Q6. What is the best advice you have got in your career?

KG: Be brave, take risks! You never know where a new adventure would take you. It may take you to a better place than you envisioned or to a place that you have always dreamt about.

AR: Q7. What are your favorite books or blogs on Data Science?

KG: Well, my favorite book on IR is nlp.stanford.edu/IR-book/. It starts from the very basics and covers a lot of ground, from inverted indexing to crawling. It is usually my go-to book for basics in IR and text mining. I also enjoy lectures by Andrew Ng on machine learning - he does a great job of explaining obscure concepts. I have also been following Hal Daume's blog (nlpers.blogspot.com/). He explains specific Machine Learning + NLP related topics in an intuitive manner, which is great when you need a high-level understanding before you dive into the details or just want some ideas for your own work.