How Shutterstock used Deep Learning to change the language of search

How Shutterstock created computer-vision and deep learning technology that understands its 70 million-plus images and removes the need for customers to type in descriptions or depend on unreliable keywording. The technology relies on pixel data as its language of choice.

By Nathan Hurst, Shutterstock.

Whether we think about it or not, the words we type into search engines are a language of their own. We think of English as a single language, but if you drill down, you can argue that we routinely use -- and respond to -- different forms of English in different contexts.

When I speak to my one-and-a-half-year-old niece, for instance, I use a slightly different English from the one I rely on when I write articles or speak to friends. Similarly, the language we plug into search engines is much simpler than what we would use in person to describe what we seek. For this reason, at Shutterstock we’ve begun to refer to a language we call ‘queryese’ and, to meet these needs, have built machine-language translation for the language of search.

It’s not as simple as guiding people to tell us precisely what they want. In many cases, Shutterstock’s customers like to surf through the site and sift through the collection looking for inspiration. If they don’t know exactly what they’re looking for, the machine is going to have just as much trouble anticipating it. Users often stumble across the image they want, and it frequently doesn’t match the words they initially searched for. This is particularly true when it comes to searching for abstract ideas, such as “love.”


Why is this flower melancholy?

‘Queryese’ tends to be a very basic language; think about what you typically put into a search engine and what you expect to get back out of it. With so little for it to go on, there’s bound to be a lack of nuance and some miscommunication between man and machine. We have collected and stored so much search and download data over Shutterstock’s 13 years that we knew there had to be a better way to solve this problem.

The first step for the in-house computer vision team was to train models to work with the sort of images that Shutterstock uses. These are quite different from typical personal photos -- models that performed well on, for example, the ILSVRC12 challenge set didn’t suit stock imagery. Because ‘isolated’ is such a common term in searches on our site, we had to build a separate classifier for that term alone.

We went through a continuous process of training models, measuring their quality and adapting to the weaknesses and outliers we identified. A single model might start to look promising after a week, but we made sure to run it for months before we were confident enough to release it.

We ran inference on every image in the collection, and we keep tracking the new images that are constantly being added. We process the pixels on GPUs and generate a fingerprint that captures the image structure. Managing the data sets required some engineering effort, although nothing especially large scale.
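A minimal sketch of that bookkeeping, assuming a store keyed by image ID. The names `toy_fingerprint` and `update_fingerprints` are hypothetical, and the grid-average fingerprint is only a stand-in: the real fingerprints come from a trained deep learning model running on GPUs, which this pure-Python example does not attempt to reproduce.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Image = List[List[float]]   # grayscale pixels, rows of values in [0, 1]
Fingerprint = List[float]


def toy_fingerprint(image: Image, grid: int = 2) -> Fingerprint:
    """Stand-in for the learned model: mean brightness of each grid cell.

    Assumes the image is at least `grid` pixels wide and tall.
    """
    height, width = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * height // grid, (gy + 1) * height // grid)
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            values = [image[y][x] for y in rows for x in cols]
            cells.append(sum(values) / len(values))
    return cells


def update_fingerprints(
    store: Dict[str, Fingerprint],
    images: Iterable[Tuple[str, Image]],
    model: Callable[[Image], Fingerprint] = toy_fingerprint,
) -> int:
    """Fingerprint only the images not yet in the store; return how many were added."""
    added = 0
    for image_id, image in images:
        if image_id not in store:
            store[image_id] = model(image)
            added += 1
    return added
```

Calling `update_fingerprints` again with a batch that mixes old and new image IDs only runs the model on the new ones, which is the property that matters when images are added continuously.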

We feed these fingerprints into a huge nearest neighbor search index to find the closest examples in Shutterstock’s collection. We implemented our search directly on CUDA/GPU to minimize latency. Nearest neighbor search is one of the remaining hard problems in computer science, though. There are no exact algorithms that perform well on average, so we spent time profiling and optimizing to get results with millisecond latency.
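To make the cost of exact search concrete, here is a minimal brute-force sketch in plain Python. It is not the CUDA implementation described above; the function names and the choice of Euclidean distance are assumptions made for illustration. Every query compares against every stored fingerprint, so the work grows linearly with the size of the collection, which is why profiling and optimization matter at the scale of tens of millions of images.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two fingerprints of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbor search by scanning the whole index.

    Returns (image_id, distance) pairs, closest first.
    """
    scored = (
        (squared_distance(query, fingerprint), image_id)
        for image_id, fingerprint in index.items()
    )
    # nsmallest keeps only the k best candidates while scanning.
    best = heapq.nsmallest(k, scored)
    return [(image_id, math.sqrt(dist)) for dist, image_id in best]


# Usage with three hypothetical fingerprints:
index = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
results = nearest_neighbors([0.9, 0.0], index, k=2)
```

Here `results` lists image `b` first and image `a` second, since the query sits closest to `b`.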

With all that nuance to language in mind, we recently introduced computer-vision technology that understands our 70 million-plus images. It takes away the need for customers to type in descriptions and relieves the back end of the burden of unreliable keywording. The technology relies on pixel data as its language of choice.

From “Shutterstock Unveils Better, Faster, Stronger Search and Discovery Technology”:

With reverse image search you can now use a photo or illustration to find other images with a similar look and feel. Simply drag an image into the search bar and you’ll get results based on pixel data instead of the standard keyword data. See a demo of reverse image search below.

It took us around a year to develop and tweak the technology before we were comfortable releasing it. We know it’s a long-term investment and ongoing process, and it will require more time and attention. But we needed to get it in front of customers to hear what they had to say.

“This was the best thing ever,” one customer told us after they used the new search feature for the first time. “Finally! This is exactly what I needed,” another shared. However, the one that resonated most with me was praise for “being so inventive and having this option available.”

We’ve learned a lot in the first month since its release and have started to work on making improvements.

Bio: Nathan Hurst is a Distinguished Engineer at Shutterstock, a leading, global technology company providing high-quality licensed imagery and music to businesses, marketing agencies and media organizations. Shutterstock has created the largest and most vibrant two-sided marketplace for creative professionals to license content - including images, videos and music, as well as innovative tools that power the creative process.