This repository contains the code and data for the NeurIPS 2024 paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [pdf].
We have released the poisoned training data used in the Web Shopping (included here) and Tool Learning (download from here) experiments.
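As a quick sanity check, the released data can usually be inspected with a few lines of Python. The file name and JSON layout below are only assumptions for illustration; consult the actual release for the real format.

```python
import json

# Hypothetical path: point this at the poisoned training file you downloaded.
# The real file name and structure depend on the released data.
DATA_PATH = "poisoned_webshop_train.json"  # assumption, not the actual file name

with open(DATA_PATH, "r", encoding="utf-8") as f:
    records = json.load(f)  # assumes the file is a JSON list of training examples

print(f"Loaded {len(records)} examples")
print("Keys of the first example:", list(records[0].keys()))
```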
The code for Query-Attack and Observation-Attack is in AgentTuning.
The code for Thought-Attack is mainly based on ToolBench. Instructions on how to use the poisoned data we provide are given in ToolBench/README.md.
If you use our code and data, please kindly cite our work as
@article{yang2024watch,
  title={Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents},
  author={Yang, Wenkai and Bi, Xiaohan and Lin, Yankai and Chen, Sishuo and Zhou, Jie and Sun, Xu},
  journal={arXiv preprint arXiv:2402.11208},
  year={2024}
}