KDnuggets Home » News » 2015 » Oct » Opinions, Interviews, Reports » An Inside View of Language Technologies at Google ( 15:n36 )

An Inside View of Language Technologies at Google


Learn about language technologies at Google, including projects, technologies, and philosophy, from an interview with a Googler.



Natural language processing, or NLP, is the machine handling of written and spoken human communications. Methods draw on linguistics and statistics, coupled with machine learning, to model language in the service of automation.

OK, that was a dry definition. Fact is, NLP is at, or near, the core of just about every information-intensive process out there. NLP powers search, virtual assistants, recommendations, and modern biomedical research, intelligence and investigations, and consumer insights. (See my 2013 article, All About Natural Language Processing.)

No organization is more heavily invested in NLP -- or investing more heavily -- than Google. That's why a keynote on "Language Technologies at Google," presented by Google Research's Enrique Alfonseca, was a natural for the up-coming LT-Accelerate conference, which I organize. (LT-Accelerate takes place 23-24 November in Brussels. Join us!)

Enrique Alfonseca
Enrique Alfonseca of Google Research Zurich

I invited Enrique to respond to questions about his work. First, a short bio:

Enrique Alfonseca manages the Natural Language Understanding (NLU) team at Google Research Zurich, working on information extraction and applications of text summarization. Overall, the Google Research NLU team "guides, builds, and innovates methodologies around semantic analysis and representation, syntactic parsing and realization, morphology and lexicon development. Our work directly impacts Conversational Search in Google Now, the Knowledge Graph, and Google Translate, as well as other Machine Intelligence research."

Before joining the NLU team, Enrique held different positions in the ads quality and search quality teams working on ads relevance and web search ranking. He launched changes in ads quality (sponsored search) targeting and query expansion leading to significant ads revenue increases. He is also an instructor at the Swiss Federal Institute of Technology (ETH) at Zurich. Here, then, is.

An Inside View of Language Technologies at Google

Seth Grimes> Your work has included a diverse set of NLP topics. To start, what's your current research agenda?

Enrique Alfonseca> At the moment my team is working on question answering in Google Search, which allows me and my colleagues to innovate in various different areas where we have experience. In my case, I have worked over the years on information extraction, event extraction, text summarization and information retrieval, and all of these come together for question answering -- information retrieval to rank and find relevant passages on the web, information extraction to identify concrete, factual answers for queries, and text summarization to present it to the user in a concise way.

Seth> And topics that colleagues at Google Research in Zurich are working on?

Enrique> The teams at Zurich work in a very connected way to the teams at other Google offices and the products that we are collaborating with, so it is hard to define a boundary between "Google Research in Zurich" and the rest of the company. This said, there are very exciting efforts in which people in Zurich are involved, in various areas of language processing (text analysis, generation, dialogue, etc.), video processing, handwriting recognition and many others.

Do you do only "pure" research or has your agenda, to some extent, been influenced by Google's product roadmap?

A 2012 paper from Alfred Spector, Peter Norvig and Slav Petrov nicely summarizes our philosophy to research. On the one hand, we believe that research needs to happen and actually happens in the product teams. A large proportion of our Software Engineers have a master or a Ph.D. degree and previous experience working on research topics, and they bring this expertise into product development to areas as varied as search quality, ads quality, spam detection, and many others. At the same time, we have a number of longer-term projects working on answers to the problems that Google, as a company, should have solved in a few years from now. In most of these, we take complex challenges and subdivide in smaller problems that one can handle and make progress quickly, with the aim of having impact in Google products along the way, in a way that moves us closer to the longer-term goals.

To give an example, when we started working on event models from text, we did not have a concrete product in mind yet, although we expected that understanding the meaning of what is reported in news should have concrete applications. After some time working on it, we realised that it was useful to make sure that the information from the Knowledge Graph that is shown in web search was always up-to-date according to the latest news. While we do not have yet models for high-precision, wide-coverage deep understanding of news, the technologies built along the way have already proven to be useful for our users.