Monte-Carlo Tree Search for Large Language Models

This repository is a fork of Dyna Gym that extends it to use Monte-Carlo tree search (MCTS) for decoding large language models (LLMs).

Installation

First, create a new Conda environment (optional):

conda create --name mcts-for-llm python=3.10
conda activate mcts-for-llm

We tested on Python 3.10; other versions may work as well.

Then clone this repository and, from its root directory, install the package in editable mode:

pip install -e .

Examples

Using GPT-2 and UCT for Language Alignment with Positive Sentiment Reward

Run the following command to generate text with the GPT-2 model, guided by UCT (Upper Confidence Bound applied to Trees) for language alignment. The reward is the positive sentiment of the generated text.

python examples/uct_language_alignment.py
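
For readers unfamiliar with UCT, the selection step can be sketched as follows. This is a minimal illustration of one common form of the UCB1 rule, not the code used in this repository: the `Node` class, the `ucb_score` and `select_child` functions, and the exploration constant `c` are all hypothetical names chosen for the example. Here each child stands for a candidate next token, and its accumulated reward stands in for something like a sentiment score.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; each child represents a candidate next token."""
    token: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def ucb_score(child: Node, parent_visits: int, c: float = 1.0) -> float:
    """One common form of UCB1: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_reward = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean_reward + bonus


def select_child(parent: Node, c: float = 1.0) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb_score(ch, parent.visits, c))


# Usage: "good" has the higher mean reward (0.9 vs 0.3), which outweighs
# the slightly larger exploration bonus of the less-visited "bad".
root = Node("", visits=10, children=[
    Node("good", visits=6, total_reward=5.4),
    Node("bad", visits=4, total_reward=1.2),
])
print(select_child(root).token)  # good
```

A larger `c` favors rarely visited children (exploration), while a smaller `c` favors children with a high average reward so far (exploitation).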

Classic Planning Domains (Non-LLM)

This repository also includes classic planning domains from the original Dyna Gym repository. These examples do not use LLMs but may be useful for debugging.

python examples/uct_nscartpole_v0.py
python examples/uct_nscartpole_v1.py
...

