Exploring Tree of Thought Prompting: How AI Can Learn to Reason Through Search

A new approach represents problem-solving for large language models as a search over reasoning steps, allowing strategic exploration and planning beyond left-to-right decoding. This improves performance on challenges like math puzzles and creative writing, and enhances the interpretability and applicability of LLMs.

Image created by author with Midjourney


Key Points


  • A new paper proposes a "Tree of Thoughts" framework that allows more deliberate problem-solving
  • It represents the reasoning process as a search over a tree of possible "thoughts"
  • The LLM itself generates and evaluates these thoughts
  • Classic search algorithms guide the exploration




Recently, large language models (LLMs) like GPT-3 have shown impressive abilities in areas like mathematical reasoning and commonsense knowledge. However, their basic text generation method (left-to-right, token-by-token) can limit strategic planning and exploration. The Tree of Thoughts paper shows that treating reasoning as a search over intermediate steps significantly improves LLM problem-solving on challenges like math puzzles and creative writing.




A recent paper, Tree of Thoughts: Deliberate Problem Solving with Large Language Models, by Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan, proposes a new framework called "Tree of Thoughts" (ToT) to enhance the problem-solving abilities of LLMs like GPT-3 and GPT-4. Today, LLMs make token-level decisions strictly left to right when generating text, which can fall short in tasks requiring more strategic planning and exploration.

ToT represents the problem-solving process as search over a tree, where each node is a "thought" — a coherent chunk of text representing an intermediate reasoning step. This allows the LLM to explore multiple reasoning paths and evaluate the progress of different thoughts towards solving the problem. Specifically, the framework involves:

  1. Decomposing the problem into coherent thought steps based on the task structure.
  2. Using the LLM to generate multiple thought candidates at each step, either independently or sequentially conditioned on previous thoughts.
  3. Getting the LLM to evaluate the promise of different states (partial solutions) through value estimation prompts that assess progress so far.
  4. Using classic search algorithms like breadth-first search or depth-first search over the tree, using the LLM's value estimates to guide exploration and pruning.
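
The four steps above can be sketched as a small breadth-first search. This is a minimal illustration, not the authors' implementation: the names tot_bfs, propose_letter, and score_prefix are hypothetical, and the two toy functions stand in for what would be LLM calls (one that proposes next thoughts, one that rates a partial solution). The toy task is simply to spell a target word one letter at a time.

```python
from typing import Callable, List, Tuple

State = Tuple[str, ...]  # a partial solution: the thoughts chosen so far


def tot_bfs(
    root: State,
    propose: Callable[[State], List[str]],
    evaluate: Callable[[State], float],
    steps: int,
    beam_width: int,
) -> List[State]:
    """Breadth-first search that keeps the `beam_width` best states per step."""
    frontier = [root]
    for _ in range(steps):
        # Step 2: expand every kept state with its candidate next thoughts.
        candidates = [
            state + (thought,) for state in frontier for thought in propose(state)
        ]
        if not candidates:
            break
        # Steps 3 and 4: score each candidate state, then prune to the best few.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return frontier


# Toy stand-ins for the two LLM calls (assumptions for illustration only).
TARGET = "cat"


def propose_letter(state: State) -> List[str]:
    # A real system would prompt the LLM for candidate next thoughts.
    return ["a", "c", "t"]


def score_prefix(state: State) -> float:
    # A real system would prompt the LLM to rate the partial solution.
    # Here: count positions that already match the target word.
    return float(sum(1 for got, want in zip(state, TARGET) if got == want))


best = tot_bfs((), propose_letter, score_prefix, steps=3, beam_width=2)[0]
print("".join(best))  # prints "cat"
```

Keeping more than one state per step (beam_width=2 here) is what lets the search recover when the top-rated thought at one step turns out to be a dead end later.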

This deliberate search allows the LLM to look ahead, backtrack, and make more global choices when needed. The modular framework is model-agnostic and can flexibly adapt its components like thought size, generation, evaluation, and search to the problem structure.

The authors demonstrate ToT on three novel tasks: Game of 24, Creative Writing, and Mini Crosswords. In all cases, ToT significantly boosts the problem-solving performance of GPT-4 over standard prompting baselines. For example, in Game of 24 the success rate increased from 4% with chain-of-thought prompting to 74% with ToT.
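
To make the Game of 24 search tree concrete, here is a rough sketch of its structure: each "thought" combines two of the remaining numbers into one, so four numbers are reduced to a single value in three steps. Note the swap: this version explores the tree exhaustively with depth-first search and uses no LLM at all, whereas ToT has the model propose and rate a handful of intermediate equations and keeps only the promising ones. The function names (expand, solve, game_of_24) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Iterator, List, Optional, Tuple

Item = Tuple[Fraction, str]  # (exact value, expression that produced it)


def expand(items: List[Item]) -> Iterator[List[Item]]:
    """Yield every child state: combine two numbers, keep the rest."""
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        options = [
            (a + b, f"({ea} + {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
            (a * b, f"({ea} * {eb})"),
        ]
        if b != 0:
            options.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            options.append((b / a, f"({eb} / {ea})"))
        for option in options:
            yield rest + [option]


def solve(items: List[Item], target: Fraction = Fraction(24)) -> Optional[str]:
    """Depth-first search; returning None to the caller is the backtrack."""
    if len(items) == 1:
        value, expression = items[0]
        return expression if value == target else None
    for child in expand(items):
        found = solve(child, target)
        if found is not None:
            return found
    return None


def game_of_24(numbers: List[int]) -> Optional[str]:
    return solve([(Fraction(n), str(n)) for n in numbers])


print(game_of_24([4, 9, 10, 13]))  # some expression that evaluates to 24
print(game_of_24([1, 1, 1, 1]))    # None: no combination reaches 24
```

Exhaustive search works here only because the tree is tiny. The point of ToT is that an LLM's value estimates can stand in for brute force when the space of thoughts is far too large to enumerate.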

Overall, ToT offers a way to integrate symbolic planning and search methods from classical AI with modern LLMs. The interpretability of its language-based thoughts and deliberation also provides opportunities for better human alignment. The authors propose it as an exciting new direction to develop more general problem-solving capabilities in LLMs.


Research Q&A

How does the Tree of Thoughts approach compare to other methods that incorporate symbolic planning or search with neural models, such as NeuroLogic decoding or the LLM+P framework?

The ToT framework differs in that it uses the LLM itself to provide heuristic guidance during search, rather than relying on a separate classical planner (LLM+P) or hard-coded heuristics (NeuroLogic). The language-based thought representation is also more flexible than symbolic planning languages. However, ToT does not yet achieve the level of tight integration and two-way communication between the LLM and planner components that LLM+P demonstrates.

Could the Tree of Thoughts approach be applied to natural language tasks like conversational dialogue or story generation, rather than just constrained reasoning tasks?

While the current paper focuses on reasoning tasks, the general framework of representing possible continuations as thoughts that can be deliberated over seems applicable to less constrained generation problems. For dialogue, thoughts could be candidate utterances to say next, while for stories they could be plot points or character actions. The key challenges would be defining coherent thought steps and developing effective evaluation prompts.

What is innovative about this research?

The key innovation is framing language model inference as search over a tree of thoughts rather than just left-to-right token generation. This allows more deliberate planning, exploration of alternatives, and global lookahead/backtracking. Representing thoughts as coherent semantic units is also innovative compared to previous search methods.

What are the broader implications of this research?

This research could significantly enhance the problem-solving and reasoning capabilities of LLMs, allowing their use in more complex real-world applications like coding, data analysis, robotics, etc. It also makes model decisions more interpretable. The integration of classical search methods with neural models is an exciting direction.

What are some potential issues or oversights with this research as presented, if any?

The tasks explored are still relatively simple. It remains to be seen if the approach scales to more open-ended problems. The search process likely incurs higher compute costs than standard sampling. The heuristics for pruning suboptimal branches are currently imperfect.
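
The compute overhead can be estimated with a back-of-the-envelope count of model calls. The sketch below assumes a breadth-first search in which each kept state needs one proposal call, each candidate is rated a fixed number of times, and only the top few candidates survive each step. The function name and the settings in the example are illustrative assumptions, not figures from the paper.

```python
from typing import Tuple


def tot_bfs_call_count(
    steps: int,
    beam_width: int,
    candidates_per_state: int,
    evals_per_candidate: int = 1,
) -> Tuple[int, int]:
    """Return (generation_calls, evaluation_calls) for one search run."""
    frontier = 1  # the search starts from a single root state
    generation_calls = 0
    evaluation_calls = 0
    for _ in range(steps):
        # Assumption: one proposal call per kept state yields all its candidates.
        generation_calls += frontier
        candidates = frontier * candidates_per_state
        # Every candidate is rated `evals_per_candidate` times.
        evaluation_calls += candidates * evals_per_candidate
        # Only the best `beam_width` candidates are carried forward.
        frontier = min(beam_width, candidates)
    return generation_calls, evaluation_calls


gen, ev = tot_bfs_call_count(
    steps=3, beam_width=5, candidates_per_state=8, evals_per_candidate=3
)
print(gen, ev)  # prints "11 264"
```

Under these assumptions a single problem costs a few hundred model calls, compared with one call for a single chain-of-thought sample, and most of that cost comes from evaluation rather than generation.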

What are the logical next research steps from this research?

Important next steps are exploring ToT on more complex planning and decision-making tasks, integrating it with external knowledge retrieval, and studying whether variants can be learned more sample-efficiently via meta-learning or reinforcement learning rather than relying solely on a pre-trained LLM. How thought size, search budget, and performance interact also remains an open question.




  • The Tree of Thoughts paradigm demonstrates how classical search techniques can be integrated with modern neural network models.
  • Allowing LLMs to explore alternate reasoning paths makes their decision-making more interpretable.
  • This research direction could enhance LLMs' applicability to complex real-world planning and analysis tasks.
  • Key next steps are extending the approach to less constrained problems, improving the search efficiency, and studying how such skills can be learned.
  • Overall, the deliberate and semantic reasoning of Tree of Thoughts offers an exciting new capability for artificial agents.

Matthew Mayo (@mattmayo13) is a Data Scientist and the Editor-in-Chief of KDnuggets, the seminal online Data Science and Machine Learning resource. His interests lie in natural language processing, algorithm design and optimization, unsupervised learning, neural networks, and automated approaches to machine learning. Matthew holds a Master's degree in computer science and a graduate diploma in data mining. He can be reached at editor1 at kdnuggets[dot]com.