Prismatic Interest Graph [API]: Organize and Recommend Content

Prismatic Interest Graph API provides a set of tools for automatically analyzing unstructured text and annotating it with a variety of tags that are useful for organizing and recommending content.

By Dave Golland (Prismatic).

In the age of big data, many companies have accumulated massive collections of natural language text. These companies are responsible for organizing and recommending content to their readers, but they struggle to keep up with the rate at which it’s produced. The problem is twofold: people don’t have time to manually read through all the content, and unstructured text doesn’t directly lend itself to automatic analysis.

At Prismatic, we crawl the web and index all the freshest, high-quality content in order to generate recommendations that people want to read. We have put a lot of thought into how to provide our users with the most relevant recommendations. To this end, we’ve engineered the Interest Graph -- a collection of tools for understanding our users’ interests, content, and the connections between them. The Interest Graph can automatically augment unstructured text with a variety of meaningful annotations that are easily interpretable by both people and machines, enabling high-quality recommendations of products and content. We have published a subset of the Interest Graph API, so now everyone has access to our text analysis tools.

The first tool we’ve released from the Interest Graph can tag a piece of content with topics. Topics are single-phrase summaries of the thematic content of a piece of text; examples include Functional Programming, Celebrity Gossip, or Flowers. By surveying a variety of sources, we have produced a comprehensive list of topics that align with people’s interests.
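To make the tagger’s output concrete, here is a minimal sketch of consuming a topic-tagging response. The JSON shape, field names, and score threshold below are illustrative assumptions, not the documented API format:

```python
import json

# Hypothetical response shape for the topic tagger; the real field names
# and score semantics may differ, so treat this purely as a sketch.
sample_response = json.loads("""
{"topics": [{"topic": "Functional Programming", "score": 0.92},
            {"topic": "Clojure", "score": 0.85},
            {"topic": "Flowers", "score": 0.04}]}
""")

# Keep only confidently assigned tags (the 0.5 threshold is illustrative).
tags = [t["topic"] for t in sample_response["topics"] if t["score"] > 0.5]
print(tags)  # ['Functional Programming', 'Clojure']
```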

Topic tags lend themselves to a variety of applications for organizing and recommending content based on semantic substance. They provide an organization scheme that readers can understand and use to find content that interests them. Topics also provide a computationally friendly structure for measuring similarity -- two pieces of content are similar if they have overlapping topics. You can use topics to automatically detect related advertisements, products, or articles.
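That overlap notion can be sketched as a Jaccard-style measure over topic sets. The articles and tags below are invented for illustration:

```python
def topic_similarity(topics_a, topics_b):
    """Jaccard overlap of two topic sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(topics_a), set(topics_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

article_1 = ["Functional Programming", "Clojure"]
article_2 = ["Functional Programming", "Haskell"]
print(topic_similarity(article_1, article_2))  # 1 shared topic of 3 total
```

The same score works for ranking related advertisements or products: tag each item once, then compare tag sets at recommendation time.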

To supplement the topic tagger, we’ve also released the Interest Graph’s similar topics tool, which finds the topics most related to a given query topic. Expanding a tag set with similar topics broadens the reach of the tags themselves. For example, articles tagged with different topics might appear unrelated; by first expanding each article’s tags to include similar topics, we can detect connections between articles that would otherwise appear dissimilar.
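A minimal sketch of that expansion step, assuming the similar-topics lookup returns a set of related topics. The lookup table below is hard-coded stand-in data, not real output from the tool:

```python
# Stand-in for the similar-topics tool: maps a topic to related topics.
SIMILAR_TOPICS = {
    "Clojure": {"Functional Programming", "Lisp"},
    "Haskell": {"Functional Programming", "Type Systems"},
}

def expand(topics):
    """Return the topic set plus every topic similar to one of its members."""
    expanded = set(topics)
    for topic in topics:
        expanded |= SIMILAR_TOPICS.get(topic, set())
    return expanded

a, b = {"Clojure"}, {"Haskell"}
print(a & b)                  # no direct overlap
print(expand(a) & expand(b))  # overlap appears via Functional Programming
```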

Building a model of user preferences is yet another powerful application of topics. At Prismatic, we model our users’ interests by analyzing how they interact with content. To get a sense of how content interactions reflect a user’s interests, take a look at the figure to see some of the topics we automatically extracted from the links shared on a few celebrities’ Twitter accounts.

[Figure: topics automatically extracted from links shared on the Twitter accounts of Barack Obama, Arnold Schwarzenegger, and Neil deGrasse Tyson]
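The modeling idea above can be sketched as a simple interest profile: count the topics attached to the content a user interacts with. The interaction data here is invented for illustration, not drawn from the figure:

```python
from collections import Counter

# Each entry is the topic list of one article the user interacted with.
interactions = [
    ["Space", "Astrophysics"],
    ["Astrophysics", "Science Education"],
    ["Space", "NASA"],
]

# A user's interest profile: topics weighted by how often they appear
# across that user's interactions.
profile = Counter(topic for topics in interactions for topic in topics)
print(profile.most_common(3))
```

In practice the counts would be weighted by interaction type (a share signals stronger interest than a click) and decayed over time, but the core structure is the same.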

Although tagging text with topics can provide valuable insights, topics capture only a single dimension of content: what it is about. Text is complex and rich; there are many other useful aspects along which it can be classified. For example, these aspects can be structural (listicle, recipe), functional (product review, event description), or suppressive (spam, NSFW). The range of possible aspects is broad and diverse -- the Interest Graph infrastructure can scale to accommodate them all.

With the initial release of the Interest Graph, you can now register for an API token to start tagging, organizing, and recommending your content. Stay tuned for future releases of other tools in the Interest Graph for automatically analyzing other aspects of a piece of content.

Bio: Dave Golland is a Research Engineer at Prismatic. He got his PhD in Natural Language Processing from UC Berkeley.