This repository contains the code and data for the NeurIPS 2024 paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [pdf].
We have released the poisoned training data used in the Web Shopping (included here) and Tool Learning (download from here) experiments.
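As a quick sanity check, the released data can usually be inspected with a few lines of Python. The file name and JSON layout below are only assumptions for illustration; consult the actual release for the real format.

```python
import json

# Hypothetical path: point this at the poisoned training file you downloaded.
# The real file name and structure depend on the released data.
DATA_PATH = "poisoned_webshop_train.json"  # assumption, not the actual file name

with open(DATA_PATH, "r", encoding="utf-8") as f:
    records = json.load(f)  # assumes the file is a JSON list of training examples

print(f"Loaded {len(records)} examples")
print("Keys of the first example:", list(records[0].keys()))
```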
The code for Query-Attack and Observation-Attack is in AgentTuning.
The code for Thought-Attack is mainly based on ToolBench. Instructions on how to use the poisoned data we provide are given in ToolBench/README.md.
If you use our code and data, please kindly cite our work as
@article{yang2024watch,
  title={Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents},
  author={Yang, Wenkai and Bi, Xiaohan and Lin, Yankai and Chen, Sishuo and Zhou, Jie and Sun, Xu},
  journal={arXiv preprint arXiv:2402.11208},
  year={2024}
}