Facebook Open Sources Blender, the Largest-Ever Open Domain Chatbot

The new conversational agent exhibits human-like behavior in conversations about almost any topic.


Natural language understanding (NLU) has been one of the most active areas adopting state-of-the-art deep learning technologies. Today, we have dozens of mainstream NLU stacks that enable the implementation of decently sophisticated conversational agents with minimal effort. However, the vast majority of conversational models remain highly constrained to a single subject. The industry refers to these agents as closed-domain chatbots. Their opposite are conversational agents that can engage in conversations across a multitude of topics while simulating a human conversational style. We call these agents open-domain chatbots, and they are incredibly difficult to implement. Recently, the Facebook Artificial Intelligence Research (FAIR) team unveiled the research and open source code for Blender, the largest-ever open-domain chatbot.

The quest to build open-domain conversational agents that can mimic human-style dialogs is a key focus of NLU research for several reasons. Language is a fundamental element in the development of human intelligence from our infant days. Throughout that process, we acquire skills such as the ability to listen, to empathize, and to align different responses with a consistent point of view or set of values, all of which are essential pieces of human communication. While we still don’t understand the neuroscientific architecture behind those capabilities, we can agree that recreating them in NLU agents is required to achieve human-level communication. Not surprisingly, many of the companies pursuing research in open-domain chatbots are technology giants heavily invested in conversational interfaces. A few months ago, Google unveiled the research behind Meena, a conversational agent that can engage in dialogs across different topics. Despite those efforts, the implementation of open-domain chatbots remains incredibly challenging. In particular, three key challenges stand out for implementing open-domain chatbots with the current generation of NLU technologies:

  1. Large-Scale Pre-Training: Building open-domain chatbots today requires massively large pretrained models. This approach has been proven by recent language models such as Google’s BERT or Microsoft’s Turing-NLG.
  2. Blending Skills: Abilities such as empathy, unique personality or contextualized knowledge are essential for good conversations.
  3. Human Subjectivity: There is no effective way to quantify a human-like conversation, so we still rely on human judgment. Research has shown that subjective aspects such as the length of an answer can impact human judgments of quality.



Blender is an open source open-domain chatbot released as part of the ParlAI project. Blender is able to engage in a large variety of conversations across nearly any topic while displaying human-like characteristics such as empathy and personable levels of engagement. In order to achieve that, the Facebook team had to directly address some of the challenges outlined in the previous section.


Pretraining Scale

Blender is based on a Transformer architecture similar to projects like BERT or Turing-NLG. The current version of Blender uses a pretrained neural network of 9.4 billion parameters. A neural network that big cannot run on a single device. As a result, Blender uses a column-wise parallelism technique to split the model into smaller neural networks that can execute in parallel while maintaining high levels of efficiency.
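The core idea behind column-wise parallelism can be illustrated with a single linear layer: its weight matrix is split by columns, each shard computes a slice of the output, and the slices are concatenated. The following is a minimal NumPy sketch of that idea, not Blender's actual implementation, which shards a 9.4-billion-parameter Transformer across GPUs.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_shards = 8, 12, 3

x = rng.standard_normal(d_in)           # input activation
W = rng.standard_normal((d_in, d_out))  # full weight matrix

# Split W column-wise into n_shards pieces, one per "device".
shards = np.split(W, n_shards, axis=1)

# Each shard computes its slice of the output independently...
partial_outputs = [x @ shard for shard in shards]

# ...and the slices are concatenated to recover the full output.
y_parallel = np.concatenate(partial_outputs)
y_reference = x @ W

assert np.allclose(y_parallel, y_reference)
```

Because each shard only needs its own columns of the weight matrix, the memory footprint per device shrinks proportionally while the combined result stays mathematically identical.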


Blending Skills

In order to evaluate Blender for different human-like conversational skills, the Facebook team relies on a parallel research effort known as Blended Skill Talk (BST). BST is a new dataset and benchmark for evaluating abilities such as knowledge and empathy in conversational agents. Specifically, BST combines the ConvAI2 (persona), EmpatheticDialogues, and Wizard of Wikipedia datasets to evaluate the different blended skills.

The use of BST allowed Blender to learn different behaviors such as changing tones to appear empathetic to the other party or properly reacting to jokes.


Generation Strategies

As previously explained, aspects such as the length of an answer can have a strong impact on the quality of the conversation. To control that, Blender relies on carefully tuned generation hyperparameters, such as a minimum length constraint on the decoded response, which help balance the tradeoff between knowledge display and length.
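A minimum length constraint during decoding can be sketched very simply: the end-of-sequence token is masked out until the response reaches a minimum length. The toy model below is a hypothetical stand-in that always prefers to stop early; Blender applies this idea inside beam search rather than greedy decoding.

```python
EOS = "<eos>"

def toy_next_token_scores(prefix):
    # Stand-in for a real language model: it always prefers to stop early.
    return {EOS: 0.6, "blender": 0.25, "is": 0.15}

def greedy_decode(min_length, max_length=10):
    tokens = []
    while len(tokens) < max_length:
        scores = toy_next_token_scores(tokens)
        # Mask out EOS until the minimum length has been reached.
        if len(tokens) < min_length:
            scores = {t: s for t, s in scores.items() if t != EOS}
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        tokens.append(best)
    return tokens

print(greedy_decode(min_length=0))  # stops immediately: []
print(greedy_decode(min_length=3))  # forced to emit 3 tokens first
```

Without the constraint the model stops at once; with it, the decoder is forced to produce a longer, more substantive response before it is allowed to end.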


Blender Architecture

Blender is a combination of three Transformer architectures that optimize different aspects of an open-domain chatbot.

  1. Retriever: The Retriever transformer receives a dialog history as input and selects the next utterance. This is typically done by selecting the highest score across all possible responses in the training set.
  2. Generator: The Generator Transformer is a Seq2Seq model that generates new responses instead of selecting them from the training dataset. Blender leverages the Generator models included in the current version of ParlAI.
  3. Retrieve and Refine: This Transformer model attempts to refine the responses produced by traditional generative models. It is common knowledge that generative models often hallucinate responses. The Retrieve and Refine architecture tries to address this problem by introducing a retrieval step before the generation step and refining the retrieved response as much as possible. Blender uses two retrieval techniques known as dialogue retrieval and knowledge retrieval.
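The three components above can be sketched as a single pipeline: score candidate responses against the dialog context, retrieve the best one, and hand it to the generator as extra context so the reply can be grounded in it. The scoring and generation functions below are hypothetical stand-ins, not ParlAI APIs.

```python
def score(context, candidate):
    # Toy retrieval score: word overlap between context and candidate.
    return len(set(context.lower().split()) & set(candidate.lower().split()))

def retrieve(context, candidates):
    # Retriever step: pick the highest-scoring response from a fixed set.
    return max(candidates, key=lambda c: score(context, c))

def generate(context):
    # Stand-in for a Seq2Seq generator conditioned on the full context.
    return f"[generated reply given: {context}]"

def retrieve_and_refine(context, candidates):
    # Dialogue retrieval: append the retrieved utterance to the context
    # before generation, so the generator can copy from or refine it.
    retrieved = retrieve(context, candidates)
    return generate(context + " <retrieved> " + retrieved)

candidates = [
    "I love hiking in the mountains.",
    "Blender is trained on large dialogue datasets.",
]
print(retrieve_and_refine("Tell me about Blender training", candidates))
```

Grounding generation on a retrieved utterance is what lets the model lean on real responses from the training set instead of hallucinating facts on its own.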


Blender in Action

The current version of Blender includes different architectures with 90M, 2.7B, and 9.4B parameters respectively. Not surprisingly, the initial tests showed that the larger models achieve higher performance in fewer steps.


Facebook evaluated Blender using different benchmarks. Most notably, Blender was compared against Google’s Meena chatbot using pairwise human evaluations. Blender outperformed Meena in terms of both engagingness and humanness.


Additionally, Blender was also evaluated against human responses, and the results were comparable. In fact, up to 49% of evaluators preferred Blender’s responses over those of humans.


The conversations produced by Blender are incredibly impressive. The examples below give us a glimpse of the level of engagement and the vast knowledge and vocabulary used by the conversational agent.


Blender represents an important milestone in the implementation of open-domain conversational agents. Even though Blender can still be repetitive and make mistakes, its performance showed that we might be just one or two breakthroughs away from achieving human-like conversational capabilities in AI agents.

Original. Reposted with permission.