Signi-Trend App: Detecting Significant Trends in Text

Signi-Trend is a visual explorer for a new heavy-hitters-style trend detection algorithm. Details will be published at KDD 2014.

By Erich Schubert, May 20, 2014.

Signi-Trend is a visual explorer for our new trend detection method, applied to four data sets.

The algorithm is in the heavy-hitters style, but we additionally track the variance of each term. This lets us evaluate how significant a trend is, instead of merely reporting raw counts.
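To make the idea of "tracking variance" concrete, here is a minimal sketch. It is an illustration only, not the published method: it assumes an exponentially weighted moving average and variance per term, and scores each new observation as a z-score. The class name `TrendStat` and the parameters `alpha` (learning rate) and `beta` (a damping term for rare words) are hypothetical choices made for this example.

```python
import math


class TrendStat:
    """Running mean/variance of one term's frequency (illustrative only)."""

    def __init__(self, alpha=0.1, beta=1.0):
        self.alpha = alpha  # assumed learning rate of the moving average
        self.beta = beta    # assumed damping term so rare terms do not explode
        self.mean = 0.0
        self.var = 0.0

    def score(self, x):
        """How many (damped) standard deviations x lies above the history."""
        return (x - self.mean) / (math.sqrt(self.var) + self.beta)

    def update(self, x):
        """Standard incremental update of an exponentially weighted mean/variance."""
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)


# A term with a steady frequency versus one that always fluctuates.
stable, volatile = TrendStat(), TrendStat()
for step in range(200):
    stable.update(5.0)
    volatile.update(10.0 if step % 2 == 0 else 0.0)

# The same raw count of 15 is far more surprising for the stable term.
print(stable.score(15.0), volatile.score(15.0))
```

Both terms average about 5 occurrences, so a raw count alone cannot tell them apart. The variance is what makes the jump to 15 stand out for the steady term and look ordinary for the noisy one.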

Our two main contributions are:
  • how to evaluate trend significance
  • how to bring this approach to Twitter scale by hashing
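As a rough sketch of how hashing can keep memory bounded, the example below stores statistics in a fixed-size table instead of one entry per term. Everything here is an assumption made for illustration, loosely in the spirit of count-min sketches: the class name `HashedTrendTable`, the use of several salted hashes per term, taking the maximum frequency on collisions, and taking the minimum score across a term's buckets. The published method may differ in all of these.

```python
import hashlib
import math


class HashedTrendTable:
    """Fixed-size table of running mean/variance, shared by all terms (illustrative)."""

    def __init__(self, size=1 << 16, num_hashes=3, alpha=0.1, beta=1.0):
        self.size = size
        self.num_hashes = num_hashes
        self.alpha = alpha  # assumed learning rate
        self.beta = beta    # assumed damping term
        self.mean = [0.0] * size
        self.var = [0.0] * size

    def _buckets(self, term):
        """Map a term to num_hashes bucket indices using salted BLAKE2b."""
        data = term.encode("utf-8")
        return [
            int.from_bytes(
                hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest(), "big"
            ) % self.size
            for i in range(self.num_hashes)
        ]

    def score(self, term, freq):
        """Most conservative z-score over the term's buckets."""
        return min(
            (freq - self.mean[b]) / (math.sqrt(self.var[b]) + self.beta)
            for b in self._buckets(term)
        )

    def end_epoch(self, freqs):
        """Fold one time window of term frequencies into the table."""
        observed = [0.0] * self.size
        for term, freq in freqs.items():
            for b in self._buckets(term):
                # On a collision, the bucket follows the most frequent term.
                observed[b] = max(observed[b], freq)
        for b, x in enumerate(observed):
            delta = x - self.mean[b]
            self.mean[b] += self.alpha * delta
            self.var[b] = (1.0 - self.alpha) * (self.var[b] + self.alpha * delta * delta)


table = HashedTrendTable(size=1024)
for _ in range(100):
    table.end_epoch({"weather": 20.0, "coffee": 5.0})

# Score the next window before folding it in.
print(table.score("coffee", 5.0), table.score("coffee", 50.0))
```

The table never grows, no matter how many distinct terms appear in the stream. Taking the minimum over several buckets means a collision with a frequent term can only make a score more conservative, never inflate it.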

Details on the method will be published at KDD 2014, August 24-27.

The web site is only a data explorer; it does not run the actual method live. We plan to make a live version available eventually. It will benefit from future work and research on, for example, spam filtering and online classification of trends into categories such as teen idols.

Here is the post I sent to the "+Data Data Data" community at Google+:

At KDD 2014 (August 24-27), we will present a heavy-hitter style algorithm to detect significant trends in a textual data stream. Scalability is excellent thanks to hashing, and by using statistics we can easily identify the trends that exhibit significant growth, instead of simply reporting the top-10 results as traditional heavy-hitter algorithms do. Results have been very exciting - we built a web app and put it online so you can explore the data sets yourself. For details on the method, you'll have to come to KDD, though!

As this is my first work in text mining, I'm interested in your input. We tried to get an overview (a reviewer noted "good survey"), but I'm sure we are still missing some nice related approaches.

We do have plans to make a live version available eventually. It will be interesting to attach this to a large web crawler and watch for live trends in the crawled information, and to add online near-duplicate detection (for better spam filtering), online stopword learning, and of course sentiment analysis and classification... exciting times!

This is joint work with +Michael Weiler and Prof. Hans-Peter Kriegel at LMU München.

Dr. Erich Schubert
Researcher, Lehr- und Forschungseinheit Datenbanksysteme
Institut für Informatik
Ludwig-Maximilians-Universität München