arXiv Paper Spotlight: Sampled Image Tagging and Retrieval Methods on User Generated Content

Image tagging with user generated content in the wild, without the use of curated image datasets? Read more about this paper and its promising research.

What is the feasibility of image tagging with user generated content in the wild?


A recent paper by Karl Ni (Lab41, In-Q-Tel), Kyle Zaragoza (Lab41, In-Q-Tel), Charles Foster (Stanford University), Carmen Carrano (Lawrence Livermore National Laboratory), Barry Chen (Lawrence Livermore National Laboratory), Yonas Tesfaye (Lab41, In-Q-Tel), and Alex Gude (Lab41, In-Q-Tel), titled "Sampled Image Tagging and Retrieval Methods on User Generated Content," attempts to answer this question.

The research starts from the premise that carefully curated image datasets are not ideal for automated image tagging and retrieval: their small training label sets limit the number of keywords that can be used.

Extending curated datasets requires supervision, and any algorithm developed on them would need to tolerate inevitable labeling errors. Conversely, open source imagery datasets built from user-generated content (UGC), drawn from sources such as Google Photos or Flickr, have an almost unlimited number and variety of unique tags that cover much of the vocabulary of the English language.

Having advocated the use of user-generated content, the paper acknowledges that the approach is problematic for a variety of reasons, and singles out one of them:

[T]he focus of our work is on the scale of the labels, which take the form of noisy metadata tags. The challenge then becomes negotiating matrix operations on any deep learning architecture that requires a final layer that is proportional to the number of words.
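
To make the scaling concern concrete, here is a rough back-of-the-envelope sketch. The function name, the feature dimension of 4096, the vocabulary sizes, and the sample size of 64 are all illustrative assumptions rather than figures from the paper. It compares the work needed to score every tag in the vocabulary with the work needed to score only a small sample of tags.

```python
def output_layer_cost(feature_dim, vocab_size, num_sampled=None):
    """Return (weights, multiply-adds per image) for a final tag-scoring layer.

    Hypothetical helper for illustration only. With num_sampled=None every
    tag in the vocabulary is scored; otherwise only num_sampled tags are.
    """
    weights = feature_dim * vocab_size
    scored = vocab_size if num_sampled is None else min(num_sampled, vocab_size)
    return weights, feature_dim * scored


# Illustrative sizes only, not taken from the paper.
for vocab in (1_000, 100_000, 1_000_000):
    weights, full = output_layer_cost(4096, vocab)
    _, sampled = output_layer_cost(4096, vocab, num_sampled=64)
    print(f"{vocab:>9,} tags: {weights:>13,} weights, "
          f"{full:>13,} full vs {sampled:,} sampled multiply-adds")
```

The weight count still grows with the vocabulary, but the per-image cost of a sampled update stays fixed, which is the general motivation for sampling-based training.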

Figure: Skip image optimization

Building on word embeddings to leverage unstructured text, the research applies tried-and-true deep neural network techniques at scale, with a focus on robustness to the problems that user-generated content raises. From the abstract:

Prior work on word embeddings successfully leveraged unstructured text with large vocabularies, and our proposed method seeks to apply similar cost functions to open source imagery. Specifically, we train a deep learning image tagging and retrieval system on large scale, user generated content (UGC) using sampling methods and joint optimization of word embeddings. By using the Yahoo! FlickR Creative Commons (YFCC100M) dataset, such an approach builds robustness to common unstructured data issues that include but are not limited to irrelevant tags, misspellings, multiple languages, polysemy, and tag imbalance.
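
As a loose illustration of what "sampling methods and joint optimization of word embeddings" can look like, the sketch below takes gradient steps on a word2vec-style negative-sampling loss. It is not the authors' implementation: the function names, the toy vocabulary, the learning rate, and the use of a free vector in place of a CNN's image features are all assumptions made for the example.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sampled_step(image_vec, word_vecs, pos_tag, neg_tags, lr=0.1):
    """One SGD step on a negative-sampling loss (hypothetical helper).

    Only the observed tag and a few sampled negative tags are scored, so the
    cost does not depend on vocabulary size. Both the image vector and the
    touched word vectors are updated in place (the "joint" part).
    Returns the loss measured before the update.
    """
    loss = 0.0
    grad_image = [0.0] * len(image_vec)
    for tag, label in [(pos_tag, 1.0)] + [(t, 0.0) for t in neg_tags]:
        w = word_vecs[tag]
        p = sigmoid(dot(image_vec, w))
        loss -= math.log(p) if label else math.log(1.0 - p)
        g = p - label  # derivative of the loss with respect to the score
        for i, x in enumerate(image_vec):
            grad_image[i] += g * w[i]  # accumulate using the old word vector
            w[i] -= lr * g * x         # then update the word vector
    for i, g in enumerate(grad_image):
        image_vec[i] -= lr * g
    return loss


# Toy setup: one image whose only observed tag is "beach".
random.seed(0)
vocab = ["beach", "sunset", "dog", "car", "tokyo"]
dim = 8
word_vecs = {t: [random.uniform(-0.1, 0.1) for _ in range(dim)] for t in vocab}
image_vec = [random.uniform(-0.1, 0.1) for _ in range(dim)]

for _ in range(200):
    negatives = random.sample([t for t in vocab if t != "beach"], 2)
    loss = sampled_step(image_vec, word_vecs, "beach", negatives)

# Tagging is then a ranking of the vocabulary by similarity to the image.
ranked = sorted(vocab, key=lambda t: dot(image_vec, word_vecs[t]), reverse=True)
print(ranked[0], round(loss, 3))
```

In a real system the image vector would come from a deep network and the negatives would be drawn from a very large tag vocabulary; the toy version only shows how the image and the word vectors move together.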

Figure: Jointly optimized vector space embeddings

Experiments consisted of training, validating, and testing on proper splits of a single corpus. Generalization across content and tag variety was assessed with cross-corpus evaluation, which entails training on one dataset and testing on a completely different one.
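
The difference between the two evaluation modes can be sketched in a few lines. Everything here is hypothetical: the toy tagger stands in for a model trained on one corpus, the image identifiers and tags are made up, and precision at k is used only as a familiar example metric, not necessarily the one reported in the paper.

```python
def precision_at_k(ranked_tags, true_tags, k):
    """Fraction of the top-k predicted tags that appear in the ground truth."""
    return sum(1 for tag in ranked_tags[:k] if tag in true_tags) / k


def evaluate(tagger, corpus, k=2):
    """Mean precision@k of a tagger over (image, true_tags) pairs."""
    scores = [precision_at_k(tagger(image), tags, k) for image, tags in corpus]
    return sum(scores) / len(scores)


# Hypothetical stand-in for a model trained on corpus A.
PREDICTIONS = {
    "a_test_1": ["beach", "sea", "sky"],
    "b_1": ["dog", "grass", "park"],
}


def toy_tagger(image):
    return PREDICTIONS[image]


# Held-out split of the training corpus versus a different corpus
# with its own tagging habits.
corpus_a_test = [("a_test_1", {"beach", "sea"})]
corpus_b = [("b_1", {"dog", "puppy"})]

print("same-corpus: ", evaluate(toy_tagger, corpus_a_test))
print("cross-corpus:", evaluate(toy_tagger, corpus_b))
```

The same model and metric are used in both calls; only the source of the test data changes, which is what makes cross-corpus results a measure of generalization.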

And what are the findings? According to the paper:

As a result, the final proposed algorithm will not only yield comparable results to state of the art in conventional image tagging, but will enable new capability to train algorithms on large, scale unstructured text in the YFCC100M dataset and outperform cited work in zero-shot capability.

Interesting research and results. The abstract for this paper can be found here, while the paper is located here.