Topics covered
Welcome to the Awesome LLMs for Planning and Reasoning repository! This collection explores the rapidly evolving field of Large Language Models (LLMs) and their capabilities in planning and reasoning.
As LLMs continue to demonstrate remarkable success in Natural Language Understanding (NLU) and Natural Language Generation (NLG), researchers are increasingly interested in assessing their abilities beyond traditional NLP tasks. One of the most promising and challenging areas of study is understanding how well LLMs can perform tasks that require planning and reasoning. These capabilities are essential for leveraging LLMs in more complex, real-world scenarios, such as autonomous decision-making, problem-solving, and strategic thinking. However, recent research suggests that LLMs often struggle with reasoning tasks that are relatively simple for most humans, highlighting the limitations of these models in this critical area.
This repository is a curated list of research papers, code repositories, and benchmarks that focus on the intersection of LLMs with planning and reasoning tasks. Here, you'll find:
- Techniques: Innovative methods that enable LLMs to reason and plan effectively, such as Chain-of-Thought prompting and Tree of Thoughts.
- Reasoning Limitations: Critical investigations that explore the limitations and challenges LLMs face in planning and reasoning tasks.
- Benchmarks: Standardized tests and evaluations designed to measure the performance of LLMs in these complex tasks.
- Miscellaneous Papers: Papers related to the field of LLMs and reasoning, but not directly focused on planning tasks.
- Additional Resources: Supplementary materials such as slides, dissertations, and other resources that provide further insights into LLM planning and reasoning.
Whether you're a researcher, developer, or enthusiast, this repository serves as a comprehensive resource for staying updated on the latest advancements and understanding the current challenges in the domain of LLMs' planning and reasoning abilities. Dive in and explore the fascinating world where language models meet high-level cognitive tasks!
Techniques
Paper | Link | Code | Venue | Date | Other |
---|---|---|---|---|---|
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | arXiv | -- | NeurIPS 22 | 28 Jan 2022 | Video |
Self-Consistency Improves Chain of Thought Reasoning in Language Models | arXiv | -- | ICLR 23 | 7 Mar 2023 | Video |
ReAct: Synergizing Reasoning and Acting in Language Models | arXiv | GitHub | ICLR 23 | 10 Mar 2023 | Project Video |
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | arXiv | GitHub | ICCV 23 | 30 Mar 2023 | Project |
Least-To-Most Prompting Enables Complex Reasoning In Large Language Models | arXiv | -- | ICLR 23 | 16 Apr 2023 | |
Chain-of-Symbol Prompting Elicits Planning in Large Language Models | arXiv | GitHub | ICLR 24 | 17 May 2023 | |
PlaSma: Procedural Knowledge Models for Language based Planning and Re-Planning | arXiv | GitHub | ICLR 24 | 26 Jul 2023 | |
Better Zero-Shot Reasoning with Role-Play Prompting | arXiv | GitHub | NAACL 24 | 15 Aug 2023 | |
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | arXiv | GitHub | arXiv | 27 Sep 2023 | |
Reasoning with Language Model is Planning with World Model | arXiv | GitHub | EMNLP 23 | 23 Oct 2023 | |
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning | arXiv | GitHub | NeurIPS 23 | 30 Oct 2023 | Project |
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization | arXiv | GitHub | ICLR 24 | 7 Dec 2023 | |
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | arXiv | GitHub | NeurIPS 23 | 3 Dec 2023 | Video |
Learning adaptive planning representations with natural language guidance | arXiv | -- | arXiv | 13 Dec 2023 | |
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | arXiv | GitHub | ICLR 24 | 21 Dec 2023 | |
Large Language Models can Learn Rules | arXiv | -- | arXiv | 24 Apr 2024 | |
What’s the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models | arXiv | -- | arXiv | 22 May 2024 | |
Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models | arXiv | GitHub | arXiv | 6 Jun 2024 | |
Large Language Models Can Learn Temporal Reasoning | arXiv | GitHub | ACL 24 | 11 Jun 2024 | |
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking | arXiv | GitHub | arXiv | 24 Jun 2024 | |
Tree Search for Language Model Agents | arXiv | GitHub | arXiv | 1 Jul 2024 | Project |
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | arXiv | GitHub | ICLR 24 | 24 Jul 2024 | Project |
RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | arXiv | -- | arXiv | 6 Aug 2024 | |
Automating Thought of Search: A Journey Towards Soundness and Completeness | arXiv | -- | arXiv | 21 Aug 2024 | |
Reasoning Limitations
Paper | Link | Code | Venue | Date | Other |
---|---|---|---|---|---|
Understanding the Capabilities of Large Language Models for Automated Planning | arXiv | -- | arXiv | 25 May 2023 | |
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | arXiv | GitHub | arXiv | 8 Aug 2023 | |
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval | arXiv | GitHub | NeurIPS 23 | 2 Nov 2023 | |
On the Planning Abilities of Large Language Models: A Critical Investigation | arXiv | GitHub | NeurIPS 23 | 6 Nov 2023 | |
Large Language Models Cannot Self-Correct Reasoning Yet | arXiv | -- | ICLR 24 | 14 Mar 2024 | |
Dissociating language and thought in large language models | arXiv | -- | Trends in Cognitive Sciences | 23 Mar 2024 | |
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | arXiv | GitHub | NAACL 24 | 28 Mar 2024 | |
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | arXiv | -- | arXiv | 13 May 2024 | Video |
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models | arXiv | -- | arXiv | 22 May 2024 | |
Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models | arXiv | GitHub | EACL 24 | 24 May 2023 | |
Can Graph Learning Improve Task Planning? | arXiv | GitHub | arXiv | 29 May 2024 | |
Graph-enhanced Large Language Models in Asynchronous Plan Reasoning | arXiv | GitHub | ICML 24 | 3 Jun 2024 | |
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | arXiv | GitHub | ACL 24 | 6 Jun 2024 | |
Chain of Thoughtlessness? An Analysis of CoT in Planning | arXiv | -- | arXiv | 6 Jun 2024 | |
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning | arXiv | GitHub | ACL 24 | 7 Jun 2024 | |
Can Language Models Serve as Text-Based World Simulators? | arXiv | GitHub | ACL 24 | 10 Jun 2024 | |
LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks | arXiv | -- | ICML 24 | 12 Jun 2024 | Video |
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | arXiv | GitHub | arXiv | 13 Jul 2024 | |
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks | arXiv | -- | arXiv | 3 Aug 2024 | |
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | arXiv | -- | arXiv | 15 Aug 2024 | |
Benchmarks
Paper | Link | Code | Venue | Date | Other |
---|---|---|---|---|---|
Benchmarks for Automated Commonsense Reasoning: A Survey | arXiv | -- | arXiv | 22 Feb 2023 | |
BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology | arXiv | GitHub | EMNLP 24 | 16 Oct 2023 | |
AgentBench: Evaluating LLMs as Agents | arXiv | GitHub | ICLR 24 | 25 Oct 2023 | |
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change | arXiv | GitHub | NeurIPS 23 Track on Datasets and Benchmarks | 23 Nov 2023 | |
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | arXiv | GitHub | arXiv | 3 Apr 2024 | Project |
WebArena: A Realistic Web Environment for Building Autonomous Agents | arXiv | GitHub | NeurIPS 23 Workshop | 16 Apr 2024 | Project |
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning | arXiv | -- | arXiv | 3 Jun 2024 | HuggingFace |
Open Grounded Planning: Challenges and Benchmark Construction | arXiv | GitHub | ACL 24 | 5 Jun 2024 | |
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning | arXiv | -- | arXiv | 6 Jun 2024 | |
ResearchArena: Benchmarking LLMs’ Ability to Collect and Organize Information as Research Agents | arXiv | -- | arXiv | 13 Jun 2024 | |
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | arXiv | GitHub | arXiv | 22 Jun 2024 | Project |
TravelPlanner: A Benchmark for Real-World Planning with Language Agents | arXiv | GitHub | ICML 24 | 23 Jun 2024 | Project |
GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | arXiv | GitHub | arXiv | 3 Jul 2024 | |
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | arXiv | GitHub | ACL 24 | 26 Jul 2024 | Project Video |
Miscellaneous Papers
Paper | Link | Code | Venue | Date | Other |
---|---|---|---|---|---|
Lost in the Middle: How Language Models Use Long Contexts | arXiv | -- | TACL 23 | 20 Nov 2023 | |
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 | arXiv | -- | arXiv | 8 Dec 2023 | Project |
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | arXiv | -- | ACL 24 | 19 Feb 2024 | Project |
Better & Faster Large Language Models via Multi-token Prediction | arXiv | -- | arXiv | 30 Apr 2024 | Video HuggingFace |
Learning Iterative Reasoning through Energy Diffusion | arXiv | GitHub | ICML 24 | 17 Jun 2024 | Project |
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | arXiv | GitHub | arXiv | 20 Jun 2024 | |
What's the Magic Word? A Control Theory of LLM Prompting | arXiv | -- | arXiv | 3 Jul 2024 | |
AGENTGEN: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | arXiv | -- | arXiv | 1 Aug 2024 | |
Generative Verifiers: Reward Modeling as Next-Token Prediction | arXiv | -- | arXiv | 27 Aug 2024 | |
Additional Resources
Resource | Link |
---|---|
Yochan Tutorials on Large Language Models and Planning | link |
On The Capabilities and Risks of Large Language Models | link |
Large Language Models for Reasoning | link |
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | link |
Physics of Language Models | link |
If you want to say thank you and/or support the active development of Awesome LLMs for Planning and Reasoning, add a GitHub Star to the project.
Together, we can make Awesome LLMs for Planning and Reasoning better!
First off, thanks for taking the time to contribute! Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make will benefit everybody else and are greatly appreciated.
This repository was originally set up by Sambhav Khurana.
For a full list of all authors and contributors, see the contributors page.