This repository is a fork of Dyna Gym that extends it to use Monte Carlo tree search (MCTS) for decoding large language models (LLMs).
First, create a new Conda environment (optional):
conda create --name mcts-for-llm python=3.10
conda activate mcts-for-llm
We tested with Python 3.10; other versions may also work.
Then, clone this repository with git and, from its root directory, install the package in editable mode:
pip install -e .
Run the following command to generate text with the GPT-2 model, using UCT (Upper Confidence Bound applied to Trees) to guide decoding for language alignment. Positive sentiment of the generated text serves as the reward.
python examples/uct_language_alignment.py
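To illustrate how UCT-guided decoding works in principle, here is a minimal, self-contained sketch of the four MCTS phases (selection with the UCB1 score, expansion, random rollout, backpropagation) over token sequences. It is not this repository's implementation and does not call GPT-2: the tiny vocabulary, the "count positive tokens" reward, and the names `VOCAB`, `POSITIVE`, `reward`, `uct_search`, and `decode` are hypothetical stand-ins for the language model and the sentiment classifier.

```python
import math
import random

# Hypothetical stand-ins for GPT-2 and a sentiment classifier:
# a tiny vocabulary and a reward that counts "positive" tokens.
VOCAB = ["good", "bad", "great", "the", "movie"]
POSITIVE = {"good", "great"}
MAX_LEN = 4


def reward(tokens):
    """Placeholder reward: fraction of MAX_LEN slots holding a positive token."""
    return sum(t in POSITIVE for t in tokens) / MAX_LEN


class Node:
    def __init__(self, tokens, parent=None):
        self.tokens = tokens   # token sequence decoded so far
        self.parent = parent
        self.children = {}     # next token -> child Node
        self.visits = 0
        self.value_sum = 0.0

    def is_terminal(self):
        return len(self.tokens) >= MAX_LEN

    def is_fully_expanded(self):
        return len(self.children) == len(VOCAB)


def ucb_score(parent, child, c):
    """UCT/UCB1 score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def uct_search(prompt, iterations=500, c=1.41, seed=0):
    rng = random.Random(seed)
    root = Node(list(prompt))
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT score while nodes are fully expanded.
        while not node.is_terminal() and node.is_fully_expanded():
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ucb_score(parent, ch, c))
        # 2. Expansion: add one untried next token as a new child.
        if not node.is_terminal():
            untried = [t for t in VOCAB if t not in node.children]
            tok = rng.choice(untried)
            node.children[tok] = Node(node.tokens + [tok], parent=node)
            node = node.children[tok]
        # 3. Simulation: complete the sequence with random tokens.
        rollout = list(node.tokens)
        while len(rollout) < MAX_LEN:
            rollout.append(rng.choice(VOCAB))
        value = reward(rollout)
        # 4. Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return root


def decode(prompt, **kwargs):
    """Search once, then follow the most-visited children from the root."""
    node = uct_search(prompt, **kwargs)
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.tokens
```

For example, `print(decode(["the"], iterations=500))` prints a sequence that starts with the prompt token. Following the most-visited child, rather than the highest mean value, is a common choice because visit counts are less sensitive to a few lucky rollouts.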
This repository also includes some classic planning domains from the original Dyna Gym repository. These examples do not use LLMs but may be useful for debugging.
python examples/uct_nscartpole_v0.py
python examples/uct_nscartpole_v1.py
...