Zhuoyan Xu*, Zhenmei Shi*, Yingyu Liang
This repository contains the benchmark and associated code for the paper Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability.
It has been tested on Ubuntu Linux 20.04 with Python 3.11 and requires the following packages:
- PyTorch >= 1.12.1 (see the official PyTorch installation guide)
- Transformers >= 4.37
To evaluate the model on logical tasks, run:
bash run_logic.sh
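
Before running the script, it may help to confirm that the installed packages meet the minimum versions listed above (PyTorch >= 1.12.1, Transformers >= 4.37). The following is a minimal sketch and is not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and it assumes the packages were installed under their usual PyPI distribution names `torch` and `transformers`.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements list in this README.
# The keys are assumed to be the PyPI distribution names.
REQUIREMENTS = {"torch": (1, 12, 1), "transformers": (4, 37)}


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into (2, 1, 0).

    Local build tags (after '+') and non-numeric pieces such as 'dev0'
    are ignored; parsing stops at the first piece without leading digits.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version tuples after zero-padding them to equal length."""
    width = max(len(installed), len(minimum))
    padded_installed = installed + (0,) * (width - len(installed))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_installed >= padded_minimum


def check_environment(requirements=REQUIREMENTS):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(parse_version(installed), minimum):
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for problem in check_environment():
        print("WARNING:", problem)
```

The check reads package metadata rather than importing `torch` or `transformers`, so it runs quickly and reports a missing package instead of raising an import error.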