Microsoft Open Sources Jericho to Train Reinforcement Learning Using Linguistic Games

The new framework provides an OpenAI-like environment for language-based games.

Language is one of the hallmarks of human intelligence and one that plays a key role in our learning processes. By using language, we constantly formulate our understanding of a situation of a specific context. Many of the magical abilities of the human brain such as commonsense reasoning, deduction or inference are regularly expressed via language. It is hard to imagine any form of advanced artificial intelligence(AI) that wouldn’t rely on language to express its interactions with a given environment. In recent years, reinforcement learning have shown some promise helping AI agents learn their own languages for communication. To facilitate rapid development of language-based agents, Microsoft Research has open sourced Jericho, an learning environment that leverages language games to train reinforcement learning agents.

The idea of using language games to develop knowledge makes intuitive sense. Just like babies learn to develop language by regularly interactive with objects, we can build AI environment with the right incentives to develop communications in reinforcement learning agents. From the different game theory models in the language learning space, Jericho relies on a recent discipline known as interactive fiction(IF) games.


IF Games

In computer science, IF games are defined as are software environments in which players observe textual descriptions of the simulated world, issue text actions, and receive score as they progress through the story. From that perspective, IF games are fully text-based simulation environments where a player issues text commands to effect change in the environment and progress through the story. IF games like Zork I have achieved incredible levels of popularity. To illustrate how IF games can be applied to train AI agents, consider the game illustrated in the following figure: a reinforcement learning agent could learn to interact with an environment issuing language commands and receiving textual description of the new state.

Like many natural language processing(NLP) tasks, IF games require natural language understanding, but unlike most NLP tasks, IF games are sequential decision making problems in which actions change the subsequent world states of the game and choices made early in a game may have long term effects on the eventual endings. Furthermore, IF games come with their own set of natural language learning challenges:

  • Combinatorial Action Space: More reinforcement learning models have been designed to operate on either discrete or continuous spaces. However, IF games require the agent to operate in the combinatorial action space of natural language. For example, an agent generating a four-word sentence from a modest vocabulary of size 700, is effectively exploring a space of 7004 = 240 billion possible actions.
  • Commonsense Reasoning: Due to the lack of graphics, IF games rely on the player’s commonsense knowledge as a prior for how to interact with the game world. For example, a human player encountering a locked chest intuitively understands that the chest needs to be unlocked with some type of key, and once unlocked, the chest can be opened and will probably contain useful items.
  • Knowledge Representation: IF games span many distinct locations, each with unique descriptions, objects, and characters. Players move between locations by issuing navigational commands like go west. because connectivity between locations is not necessarily Euclidean, agents need to detect when a navigational action has succeeded or failed and whether the location reached was previously seen or new. Beyond location connectivity, it’s also helpful to keep track of the objects present at each location, with the understanding that objects can be nested inside of other objects, such as food in a refrigerator or a sword in a chest.

These challenges need to be addressed in order to make IF games a viable mechanism for training reinforcement learning agents.


Enter Microsoft Jericho

Jericho is a Python-based learning environment based on IF games. You can think about Jericho as the OpenAI Gym of language learning. Jericho is optimized for reinforcement learning models and provides capabilities such as game state persistence that could help enable capabilities such as memory in reinforcement learning agents. In order to make IF games more accessible and address some of the challenges mentioned in the previous section, Jericho includes the following features:

  • World-Object-Tree Representation: Because of the large number of locations, objects, and characters in many games and the possibility of puzzles requiring objects not present in the current location, agents need to develop ways to remember and reason about previous interactions. World-object-tree representations of the game state enumerate these elements.
  • Fixed Random Seed to Enforce Determinism: By making games deterministic, where subsequent states are a direct result of a specific action taken by an agent, Jericho enables the use of targeted exploration algorithms like Go-Explore, which systematically build and expand a library of the visited states.
  • Load/save Functionality: This feature enables restoration of previous game states, enabling the use of planning algorithms like Monte-Carlo tree search.
  • World-Change Detection and Valid-Action Identification: This feature provides feedback on the success or failure of an agent’s last action to effect a change in the game state. Furthermore, Jericho can perform a search to identify valid actions, those that lead to changes in the game state.

The current version of Jericho includes two learning agents, Template-DQN (TDQN) and deep reinforcement relevance network (DRRN). TDQN models are typically more effective in parser-based games that handles the combinatorial action space by generating verb-object actions from a pre-defined set verbs and objects. DRRN models are better applied on choice-based games that present a series of choices at every state of the game. Jericho provides a common encoder to have a consistent input representation for both models. While both agents utilize common input representation, they differ in the methods of action selection. DRRN uses Jericho’s valid-action identification to estimate a Q-value for each of the valid actions a. It then either acts greedily by selecting the action with the highest Q-value or explores by sampling from the distribution of valid actions.

In the previous diagram, we can see factors such as Onar, Oinv and Odesc as the elements to encode the textual input. After each command, Jericho agents use a common input representation that includes current textual observation Onar, inventory text Oinv, and current location description Odesc (as given by a look command). If we use the command “open window” in the popular text-based game Zork I, Jericho will generate the following representation.

  • Onar: With great effort, you open the window far enough to allow entry.
  • Oinv: You are empty-handed.
  • Odesc: You are behind the white house. A path leads into the forest to the east. In one corner of the house there is a small window which is slightly ajar.

Microsoft evaluated the TDQN and DRRN across a diverse set of 32 games, including Zork I. The results showed that the Jericho agents accumulated higher scores than the other agents, even when dealing with action spaces as large as 98 million. The success of these learning agents demonstrates Jericho is effective at reducing the difficulty of IF games and making them more accessible for RL agents to learn and improve language-based skills. However, none the agents even came close to human-level performance.

Jericho is an important step towards making language a central part of reinforcement learning models. Some of the initial experiments showed that humans cognitive skills like common sense and deduction remain a high barrier for AI agents. However, Jericho showed that language-based training could be the key to unlock those capabilities.

Original. Reposted with permission.