Synthesizing interpretable control policies

This repo contains the code associated with the paper "Synthesizing Interpretable Control Policies through Large Language Model Guided Search".

Reference

If you use this code in an academic context, please cite the publication:

@article{bosio2024synthesizing,
  title={Synthesizing Interpretable Control Policies through Large Language Model Guided Search},
  author={Bosio, Carlo and Mueller, Mark W},
  journal={arXiv preprint arXiv:2410.05406},
  year={2024}
}

Usage

In our implementation, the LLM is StarCoder2, accessed through the Ollama API. We run the model locally on an RTX 3090 GPU. Alternatively, the OpenAI API can be used. Some model hyperparameters are defined in the Modelfile. Once Ollama is installed, instantiate the model by running:

ollama create starcoder2:control -f Modelfile
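Once created, the model is served by Ollama's local HTTP API. The sketch below shows one way to query it from Python with only the standard library. It assumes Ollama's default endpoint `http://localhost:11434/api/generate` and the `starcoder2:control` model created above; the helper names `build_generate_request` and `generate` are hypothetical and are not the interface used by this repo's sampler.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "starcoder2:control") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "starcoder2:control", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (the "response" field)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    return body["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the starcoder2:control model.
    print(generate("def policy(obs):\n"))
```

Disabling streaming keeps the example simple: Ollama then returns a single JSON object instead of one object per generated chunk.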

We run the algorithm in Docker. The implementation is taken from this repo, which is itself a fork of the DeepMind FunSearch repo.

You can run FunSearch in a container using Docker. There are several ways to let the LLM interface with the container; these are the commands we used:

docker build . -t funsearch

# Create a folder to share with the container
mkdir data

# Mount the shared folder into the container (adjust the host path if needed)
docker run --network host -it -v "$(pwd)/data":/workspace/data funsearch

# Inside the container, run a dm_control task (the trailing number is the number of episodes)
funsearch run examples/dm_control_swingup_spec.py 1 --sandbox_type ExternalProcessSandbox
funsearch run examples/dm_control_ballcup_spec.py 1 --sandbox_type ExternalProcessSandbox
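Because the container is started with `--network host`, processes inside it can reach an Ollama server on the host at `localhost:11434` (this works on Linux hosts). Before launching a run, a quick sanity check is to ask Ollama's `/api/tags` endpoint which local models exist. The sketch below assumes the default Ollama port; `fetch_tags`, `model_names` and `has_model` are hypothetical helpers, not part of this repo.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # assumption: default Ollama port


def model_names(tags_response: dict) -> list[str]:
    """Extract model names from a /api/tags JSON response."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def has_model(tags_response: dict, name: str = "starcoder2:control") -> bool:
    """True if the given model tag is among the locally available models."""
    return name in model_names(tags_response)


def fetch_tags(timeout: float = 5.0) -> dict:
    """Query the Ollama server for its list of local models."""
    with urllib.request.urlopen(TAGS_URL, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        tags = fetch_tags()
    except OSError as err:  # URLError and socket timeouts are OSError subclasses
        raise SystemExit(f"Cannot reach Ollama at {TAGS_URL}: {err}")
    print("starcoder2:control available:", has_model(tags))
```

If the server is unreachable from inside the container, the networking setup (rather than FunSearch) is the first thing to check.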

Make sure to select the correct regex in evaluator.py. You should see output similar to:

INFO:root:Writing logs to data/1704956206
INFO:absl:Best score of island 0 increased to 2048
INFO:absl:Best score of island 1 increased to 2048
INFO:absl:Best score of island 2 increased to 2048
INFO:absl:Best score of island 3 increased to 2048
INFO:absl:Best score of island 4 increased to 2048
INFO:absl:Best score of island 5 increased to 2048
INFO:absl:Best score of island 6 increased to 2048
INFO:absl:Best score of island 7 increased to 2048
INFO:absl:Best score of island 8 increased to 2048
INFO:absl:Best score of island 9 increased to 2048
INFO:absl:Best score of island 5 increased to 2053
INFO:absl:Best score of island 1 increased to 2049
INFO:absl:Best score of island 8 increased to 2684
^C^CINFO:root:Keyboard interrupt. Stopping.
INFO:absl:Saving backup to data/backups/program_db_priority_1704956206_0.pickle.
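FunSearch evolves programs on several independent islands, and each `Best score of island N increased to S` line above reports an improvement on one island. To summarize a run from a saved log, a small parser like the following can track the best score reported per island. It only assumes the log format shown above; `best_scores` is a hypothetical helper, not part of the repo.

```python
import re

# Matches lines such as "INFO:absl:Best score of island 8 increased to 2684".
BEST_SCORE_RE = re.compile(r"Best score of island (\d+) increased to (-?\d+(?:\.\d+)?)")


def best_scores(log_lines):
    """Return {island_id: best reported score} from FunSearch log lines."""
    scores = {}
    for line in log_lines:
        match = BEST_SCORE_RE.search(line)
        if match:
            island, score = int(match.group(1)), float(match.group(2))
            # Keep the maximum in case lines arrive out of order.
            scores[island] = max(score, scores.get(island, score))
    return scores


if __name__ == "__main__":
    example = [
        "INFO:absl:Best score of island 1 increased to 2048",
        "INFO:absl:Best score of island 8 increased to 2684",
        "INFO:absl:Best score of island 1 increased to 2049",
    ]
    scores = best_scores(example)
    print(scores)                       # {1: 2049.0, 8: 2684.0}
    print(max(scores, key=scores.get))  # 8
```

Scores are parsed as floats because control rewards are generally not integers, even though the sample log shows integer values.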

Tests

To check the performance of the policies presented in the paper, run the scripts in the dm_control_tests folder.
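For reference, evaluating a policy in dm_control follows the standard `reset`/`step` loop. The sketch below averages the undiscounted episode return over several episodes. It assumes the `pendulum` `swingup` task from the dm_control suite, and `example_policy` is a hypothetical placeholder, not one of the policies from the paper or the code in dm_control_tests. The `rollout` helper only relies on the `reset()`, `step()` and `last()` interface, so it can be exercised without MuJoCo.

```python
def example_policy(observation):
    """Hypothetical placeholder: push in the direction of the angular velocity."""
    velocity = float(observation["velocity"][0])
    return [1.0 if velocity >= 0.0 else -1.0]


def rollout(env, policy, num_episodes=1):
    """Average undiscounted return of `policy` over `num_episodes` episodes."""
    returns = []
    for _ in range(num_episodes):
        time_step = env.reset()
        total = 0.0
        while not time_step.last():
            time_step = env.step(policy(time_step.observation))
            total += time_step.reward or 0.0  # reward is None only on the first step
        returns.append(total)
    return sum(returns) / len(returns)


def numpy_policy(observation):
    """Wrap the placeholder so dm_control receives a NumPy action array."""
    import numpy as np

    return np.asarray(example_policy(observation))


if __name__ == "__main__":
    from dm_control import suite  # requires dm_control and MuJoCo

    env = suite.load(domain_name="pendulum", task_name="swingup", task_kwargs={"random": 0})
    print("mean return over 5 episodes:", rollout(env, numpy_policy, num_episodes=5))
```

Passing `task_kwargs={"random": 0}` fixes the task's random seed, so repeated evaluations of the same policy are comparable.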

For more implementation details, check this repo. The original research work can be found in:

Romera-Paredes, B. et al. Mathematical discoveries from program search with large language models. Nature (2023)

Contact

Please contact [email protected] if you have questions.
