Salesforce Open Sources a Framework for Open Domain Question Answering Using Wikipedia

The framework uses a multi-hop QA method to answer complex questions by reasoning through Wikipedia’s datasets.

Question answering is a natural human cognitive mechanism that plays a key role in the acquisition of knowledge. We are constantly evaluating information to develop answers to specific questions. For years, artificial intelligence (AI) systems have tried to simulate that cognitive ability in the form of a discipline known as Open Domain Question Answering (ODQA). Recently, Salesforce Research open sourced a framework for ODQA based on the Wikipedia graph.

ODQA has been one of the most active fields in natural language processing (NLP) research in recent years. However, most ODQA systems operate within a highly constrained setting: they first select a few paragraphs for each query using a computationally efficient term-based retriever, and then read the top-ranked paragraphs to extract an answer. This approach is known as single-hop QA and, even though it is a difficult challenge in its own right, it does not quite resemble the human cognitive process. We regularly face questions that require examining many documents to construct a single answer. Handling such questions has become one of the new frontiers for ODQA systems.


Multi-Hop QA

Consider trying to answer the question “When was the football club founded in which Walter Otto Davis played at center-forward?” using two Wikipedia paragraphs, one about Walter Otto Davis and one about Millwall F.C. There are a few challenges that should be considered in the context of ODQA systems. For the example question, it is necessary to effectively “hop” to paragraph 2 (Millwall F.C.), which contains the answer (1885). However, widely used term-based retrieval may fail to find it, as the key entity, “Millwall Football Club,” is not explicitly stated in the question. To answer the target question, an ODQA system needs to make “multiple hops” across different documents. This type of system is typically referred to as multi-hop QA.

Multi-hop QA usually requires finding more than one evidence document, one of which often has little lexical overlap or semantic relationship with the original question. The key to solving multi-hop QA problems is to build a cognitive graph of documents that contain the evidence required to answer the target question.
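To make the lexical-overlap problem concrete, here is a minimal sketch that counts the content terms the example question shares with each of the two paragraphs. The paragraph texts are short paraphrases written for this illustration, not the actual Wikipedia or dataset text, and the stopword list and the helper names `content_terms` and `overlap` are assumptions made for this example rather than part of the Salesforce framework.

```python
import re

# Tiny hand-picked stopword list, an assumption for this illustration only.
STOPWORDS = {"a", "an", "as", "at", "for", "in", "is", "the",
             "was", "when", "which", "who"}

def content_terms(text):
    """Lowercase the text, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def overlap(question, paragraph):
    """Count distinct content terms shared by the question and a paragraph."""
    return len(content_terms(question) & content_terms(paragraph))

question = ("When was the football club founded in which "
            "Walter Otto Davis played at center-forward?")

# Paraphrased stand-ins for the two evidence paragraphs (hypothetical text).
paragraphs = {
    "Walter Davis (footballer)":
        "Walter Otto Davis was a Welsh footballer who played at "
        "centre forward for Millwall.",
    "Millwall F.C.":
        "Millwall Football Club is a professional football club based in "
        "South East London, founded as Millwall Rovers.",
}

scores = {title: overlap(question, text) for title, text in paragraphs.items()}
shared = {title: sorted(content_terms(question) & content_terms(text))
          for title, text in paragraphs.items()}

for title in paragraphs:
    print(title, scores[title], shared[title])
```

In this toy setup the first paragraph matches on the player's name, while the second matches only on generic words such as "football", "club" and "founded", which would also appear in the article of nearly any other club. The term that actually identifies the second paragraph, "millwall", does not occur in the question at all, which is why a purely term-based retriever can miss it.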


Wikipedia Reasoning Graph

Building a cognitive graph of documents seems like something we’ve already done. Wikipedia provides a rich, hierarchical dataset of linked documents that provide evidence for almost any question we can think of. Following that line of thinking, Salesforce’s framework leverages Wikipedia to learn to retrieve reasoning paths and build a cognitive graph to answer complex open-domain questions.

The construction of the reasoning graph mirrors how a person searches Wikipedia for the answer to a complex question. We may first look at a Wikipedia article that we can easily find based on partial information in the question. If we cannot find enough information there, we might click a hyperlink to another highly related article, and stop searching once we have collected enough evidence to answer the question. The reasoning graph is constructed using links in Wikipedia articles. Specifically, the framework uses the hyperlinks to construct the direct edges of the graph. It also considers symmetric within-document links, allowing a paragraph to hop to other paragraphs in the same article. The resulting Wikipedia graph is densely connected and covers a wide range of topics that provide useful evidence for open-domain questions. This graph is constructed offline and is reused throughout training and inference for any question.
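The graph construction described above can be sketched in a few lines. This is a simplified illustration, not the framework's code: the input format, the function name `build_paragraph_graph`, and the choice to point every hyperlink at the first paragraph of the target article are all assumptions made for this example.

```python
def build_paragraph_graph(articles):
    """Build a paragraph-level graph from a toy article collection.

    `articles` maps an article title to a list of paragraphs, where each
    paragraph is represented only by the list of article titles it links to.
    Nodes are (title, paragraph_index) pairs.
    """
    graph = {(title, i): set()
             for title, paragraphs in articles.items()
             for i in range(len(paragraphs))}

    for title, paragraphs in articles.items():
        for i, links in enumerate(paragraphs):
            source = (title, i)
            # Symmetric within-document links: a paragraph can hop to any
            # other paragraph of the same article.
            for j in range(len(paragraphs)):
                if j != i:
                    graph[source].add((title, j))
            # Hyperlinks: assumed here to lead to the first paragraph of
            # the linked article; links to unknown articles are ignored.
            for target in links:
                if target != title and articles.get(target):
                    graph[source].add((target, 0))
    return graph

# Toy input (hypothetical link structure).
articles = {
    "Walter Davis (footballer)": [["Millwall F.C."]],
    "Millwall F.C.": [[], []],
}
graph = build_paragraph_graph(articles)
```

Because the graph depends only on the article collection and not on any question, it can be built once and stored, which matches the offline construction the article describes.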

Using the reasoning graph as a starting point, the Salesforce framework relies on a recurrent neural network (RNN) to model the reasoning path for a given question. At each step, the model selects a paragraph from a set of candidate paragraphs given the current hidden state of the RNN. The initial hidden state is independent of any question or paragraph and is based on a parameterized vector. The model relies on BERT’s token representation to independently encode each candidate paragraph. The model then computes the probability that each candidate paragraph is selected. This selection procedure captures relationships between paragraphs in the reasoning path by conditioning on the selection history. The process terminates when [EOE], the end-of-evidence symbol, is selected, which allows the model to capture reasoning paths of arbitrary length for each question.
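The selection loop can be illustrated with a toy example in plain Python. Everything numeric here is made up: the two-dimensional vectors stand in for BERT encodings, the elementwise `rnn_step` stands in for a learned recurrent cell, and the sigmoid of a dot product between the hidden state and a candidate encoding is an assumed scoring form for this sketch. The names `retrieve_path`, `rnn_step` and the candidate titles are hypothetical.

```python
import math

EOE = "[EOE]"  # end-of-evidence symbol

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rnn_step(hidden, selected):
    """Toy elementwise recurrent update (a real model uses learned weights)."""
    return [math.tanh(0.5 * h + x) for h, x in zip(hidden, selected)]

def retrieve_path(encodings, initial_hidden, bias=0.0, max_steps=5):
    """Greedily select one candidate per step until [EOE] is chosen."""
    hidden = list(initial_hidden)
    remaining = dict(encodings)
    path = []
    for _ in range(max_steps):
        # Probability of selecting each candidate given the current state.
        probs = {name: sigmoid(dot(vec, hidden) + bias)
                 for name, vec in remaining.items()}
        choice = max(probs, key=probs.get)
        path.append(choice)
        if choice == EOE:
            break
        # Condition the next step on what has been selected so far.
        hidden = rnn_step(hidden, remaining.pop(choice))
    return path

# Made-up encodings standing in for BERT outputs.
encodings = {
    "Walter Davis (footballer)": [2.0, 0.0],
    "Millwall F.C.": [0.0, 2.0],
    "Unrelated paragraph": [-0.5, -2.0],
    EOE: [-1.0, -1.0],
}
path = retrieve_path(encodings, initial_hidden=[1.0, 0.0])
print(path)
```

With these hand-picked numbers the loop selects the two evidence paragraphs and then [EOE], without ever picking the unrelated paragraph. The point of the sketch is the control flow: score the candidates against the hidden state, pick one, update the state, and stop at [EOE].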

The Salesforce framework uses efficient search strategies such as beam search to reduce the computational complexity of the model, which has traditionally been a challenge for multi-hop QA approaches.
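As a rough illustration of how beam search keeps the number of explored paths small, the sketch below expands only the highest-scoring partial paths at each hop. The function `beam_search`, the toy graph, and the probability table are assumptions made for this example; in the real system the scores would come from the trained retriever rather than a lookup table.

```python
import math

EOE = "[EOE]"

def beam_search(start_candidates, neighbors, score, beam_size=2, max_hops=3):
    """Return finished paths as (path, log_prob), best first."""
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_hops):
        expansions = []
        for path, logp in beams:
            # First hop: initial candidates. Later hops: graph neighbors of
            # the last paragraph, plus the option to stop with [EOE].
            if path:
                candidates = list(neighbors.get(path[-1], [])) + [EOE]
            else:
                candidates = list(start_candidates)
            for cand in candidates:
                if cand in path:
                    continue
                expansions.append(
                    (path + (cand,), logp + math.log(score(path, cand))))
        # Keep only the `beam_size` best partial paths.
        expansions.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for path, logp in expansions[:beam_size]:
            (finished if path[-1] == EOE else beams).append((path, logp))
        if not beams:
            break
    return sorted(finished, key=lambda item: item[1], reverse=True)

# Toy graph and made-up selection probabilities (hypothetical values).
neighbors = {"A": ["C"], "B": ["C"], "C": []}
table = {
    (None, "A"): 0.9, (None, "B"): 0.6,
    ("A", "C"): 0.8, ("A", EOE): 0.1,
    ("B", "C"): 0.3, ("B", EOE): 0.2,
    ("C", EOE): 0.9,
}

def toy_score(path, cand):
    """Look up a probability by (last paragraph, candidate)."""
    last = path[-1] if path else None
    return table.get((last, cand), 0.05)

results = beam_search(["A", "B"], neighbors, toy_score, beam_size=2)
best_path, best_logp = results[0]
print(best_path, round(math.exp(best_logp), 3))
```

With a beam size of 2, only two partial paths survive each hop, so the work grows with the beam size and the number of hops rather than with the total number of possible paths through the graph.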

Salesforce evaluated its graph retriever framework on several Wikipedia-based datasets: HotpotQA, SQuAD Open and Natural Questions Open. The model performed strongly across all datasets and excelled on HotpotQA, where multi-hop retrieval is essential. Impressively, the complete model was trained using a single GPU with 11GB of memory. The following matrix shows the results.

Arguably the most notable achievement of the Salesforce framework is its ability to build reasoning paths in scenarios where other multi-hop models fail. For instance, the following figure shows a scenario in which the Salesforce graph model retrieves the correct reasoning path and answers correctly, while other models such as Re-rank fail. The top two paragraphs next to the graph are the introductory paragraphs of the two entities on the reasoning path, and the paragraph at the bottom shows the wrong paragraph selected by Re-rank. The “Millwall F.C.” paragraph has little lexical overlap with the question, and the bridge entity “Millwall” is not stated in the question. As a result, Re-rank chooses a wrong paragraph that has high lexical overlap with the question.

The Salesforce graph-based retrieval framework represents an interesting approach to ODQA systems. The model learns to sequentially retrieve evidence paragraphs to form reasoning paths and then re-ranks those paths, taking the final answer from the best-ranked one. The current implementation of the framework is available on GitHub and is accompanied by a research paper detailing the underlying techniques.

Original. Reposted with permission.