Official code for "Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples". Also check our [Project Page].
FoR formulates multi-step reasoning tasks as a flow:
- Design the reward $R(s_n)$ of terminal states for different tasks.
- Collect trajectories with the local search technique.
- Train the LLM policy $P_{F}$ with the trajectory balance loss.
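
To make the last step concrete, here is a minimal sketch of the trajectory balance objective in its general form for a single trajectory, $\left(\log Z + \sum_t \log P_F(s_{t+1} \mid s_t) - \log R(s_n) - \sum_t \log P_B(s_t \mid s_{t+1})\right)^2$. It is not the implementation used in this repository: the function name, argument names, and the default of a backward policy with probability 1 (which only holds when each state has a single parent) are assumptions made for illustration.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_reward, log_pb_steps=None):
    """Squared trajectory balance residual for one trajectory (illustrative sketch).

    log_z        -- log of the learned partition function estimate Z
    log_pf_steps -- per-step log-probabilities under the forward policy P_F
    log_reward   -- log R(s_n) of the terminal state
    log_pb_steps -- per-step log-probabilities under the backward policy P_B;
                    assumed to be log(1) = 0 for every step when omitted
    """
    if log_pb_steps is None:
        # Assumption: every state has exactly one parent, so P_B = 1.
        log_pb_steps = [0.0] * len(log_pf_steps)
    residual = log_z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2


# A balanced trajectory: Z * P_F(step 1) * P_F(step 2) = 2 * 0.5 * 0.5 = 0.5 = R
balanced = trajectory_balance_loss(
    log_z=math.log(2.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_reward=math.log(0.5),
)
```

The loss is zero exactly when the flow entering a trajectory, $Z \prod_t P_F$, matches the reward-weighted backward flow, $R(s_n) \prod_t P_B$. In practice the per-step log-probabilities would come from the LLM policy and the loss would be averaged over a batch of collected trajectories.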
1) Clone this GitHub repository
git clone https://github.com/Yu-Fangxu/FoR.git
2) Prepare the environment
We recommend conda for setting up a reproducible experiment environment. We include environment.yaml for creating a working environment:
bash install.sh
3) Choose one of the five tasks to run
cd BlocksWorld|Game24|prontoqa|1D-ARC|Rubik's_Cube
Check more detailed instructions in each branch.
@article{yu2024flow,
title={Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking},
author={Yu, Fangxu and Jiang, Lai and Kang, Haoqiang and Hao, Shibo and Qin, Lianhui},
journal={arXiv preprint arXiv:2406.05673},
year={2024}
}