muellerlab/synthesizing_interpretable_control_policies

 
 

Synthesizing interpretable control policies

This repo contains the code associated with the paper "Synthesizing Interpretable Control Policies through Large Language Model Guided Search".

Reference

If you use this code in an academic context, please cite the publication:

@article{bosio2024synthesizing,
  title={Synthesizing Interpretable Control Policies through Large Language Model Guided Search},
  author={Bosio, Carlo and Mueller, Mark W},
  journal={arXiv preprint arXiv:2410.05406},
  year={2024}
}

Usage

In our implementation, the LLM is StarCoder2, served through the Ollama API. We run the model locally on an RTX 3090 GPU. Alternatively, the OpenAI APIs can be used. Some model hyperparameters are defined in the Modelfile. Once Ollama is installed, instantiate the model by running:

ollama create starcoder2:control -f Modelfile
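
Once the model has been created, you can check that it responds before starting a search. The snippet below is a minimal sketch, not part of this repo: it assumes Ollama is listening on its default local endpoint (`http://localhost:11434`) and uses its `/api/generate` route. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="starcoder2:control"):
    """Build a non-streaming generation request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="starcoder2:control", timeout=120):
    """Send the prompt to the model and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

Calling `generate("def policy(obs):")` should return a code completion if the model was created successfully. A connection error usually means the Ollama server is not running.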

We run the algorithm in Docker. The implementation is taken from this repo, which is itself a fork of the DeepMind FunSearch repo.

You can run FunSearch in a container using Docker. There are several ways to let the container communicate with the LLM. These are the commands that we used:

docker build . -t funsearch

# Create a folder to share with the container
mkdir data

docker run --network host -it -v /home/cbosio/fun-design/data:/workspace/data funsearch

# Run the dm_control examples (the numeric argument is the number of episodes)
funsearch run examples/dm_control_swingup_spec.py 1 --sandbox_type ExternalProcessSandbox
funsearch run examples/dm_control_ballcup_spec.py 1 --sandbox_type ExternalProcessSandbox

Make sure to select the right regex in evaluator.py. You should see output similar to the following:

INFO:root:Writing logs to data/1704956206
INFO:absl:Best score of island 0 increased to 2048
INFO:absl:Best score of island 1 increased to 2048
INFO:absl:Best score of island 2 increased to 2048
INFO:absl:Best score of island 3 increased to 2048
INFO:absl:Best score of island 4 increased to 2048
INFO:absl:Best score of island 5 increased to 2048
INFO:absl:Best score of island 6 increased to 2048
INFO:absl:Best score of island 7 increased to 2048
INFO:absl:Best score of island 8 increased to 2048
INFO:absl:Best score of island 9 increased to 2048
INFO:absl:Best score of island 5 increased to 2053
INFO:absl:Best score of island 1 increased to 2049
INFO:absl:Best score of island 8 increased to 2684
^C^CINFO:root:Keyboard interrupt. Stopping.
INFO:absl:Saving backup to data/backups/program_db_priority_1704956206_0.pickle.
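
To follow progress across islands, the score lines in this output can be summarized with a short script. This is a minimal sketch that assumes the log lines keep the format shown above. The function name `best_scores` is hypothetical, and the pattern only handles plain integer or decimal scores.

```python
import re

# Matches lines such as "INFO:absl:Best score of island 5 increased to 2053".
SCORE_LINE = re.compile(
    r"Best score of island (\d+) increased to (-?\d+(?:\.\d+)?)"
)


def best_scores(log_lines):
    """Return a dict mapping each island index to its highest logged score."""
    best = {}
    for line in log_lines:
        match = SCORE_LINE.search(line)
        if match:
            island = int(match.group(1))
            score = float(match.group(2))
            best[island] = max(score, best.get(island, float("-inf")))
    return best


sample = [
    "INFO:absl:Best score of island 5 increased to 2048",
    "INFO:absl:Best score of island 5 increased to 2053",
    "INFO:absl:Best score of island 8 increased to 2684",
    "INFO:root:Keyboard interrupt. Stopping.",
]
print(best_scores(sample))  # {5: 2053.0, 8: 2684.0}
```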

Tests

To see the performance of the policies presented in the paper, run the scripts in the /dm_control_tests folder.
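
For orientation, evaluating a policy on a dm_control task generally amounts to rolling it out for an episode and summing the rewards. The sketch below is not one of the scripts in /dm_control_tests and does not reproduce any policy from the paper. It assumes the pendulum swing-up task from `dm_control.suite`, and assumes the `orientation` observation holds a (cosine, sine) pair of the pole angle. `example_policy`, its gains, `rollout`, and `make_pendulum_env` are hypothetical and untuned.

```python
def example_policy(cos_theta, sin_theta, angular_velocity):
    """Hypothetical hand-written policy with arbitrary gains (not from the paper)."""
    torque = 5.0 * sin_theta - 1.0 * angular_velocity
    # Clip the command to the assumed actuator range [-1, 1].
    return max(-1.0, min(1.0, torque))


def rollout(env, policy):
    """Run one episode on a dm_env-style environment and return the summed reward."""
    time_step = env.reset()
    total_reward = 0.0
    while not time_step.last():
        obs = time_step.observation
        # Assumption: orientation is ordered as (cos, sin) of the pole angle.
        cos_theta, sin_theta = obs["orientation"]
        action = policy(cos_theta, sin_theta, float(obs["velocity"][0]))
        time_step = env.step([action])
        total_reward += time_step.reward or 0.0
    return total_reward


def make_pendulum_env():
    """Load the pendulum swing-up task; needs dm_control, imported lazily."""
    from dm_control import suite

    return suite.load(domain_name="pendulum", task_name="swingup")
```

With dm_control installed, `rollout(make_pendulum_env(), example_policy)` returns the episode return for this placeholder policy. Swapping in a policy synthesized by the search only requires matching the function signature.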

For more implementation details, check this repo. The original research work is:

Romera-Paredes, B. et al. Mathematical discoveries from program search with large language models. Nature (2023)

Contact

Please contact [email protected] if you have questions.
