Commit
1 parent 50ec931, commit 6861a6a
Showing 1 changed file with 41 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,23 +1,56 @@ | ||
# WalledEval: Testing LLMs Against Jailbreaks and Unprecedented Harms | ||
# WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | ||
|
||
[![PyPI Latest Release](https://img.shields.io/pypi/v/walledeval.svg)](https://pypi.org/project/walledeval/) | ||
[![paper](https://img.shields.io/badge/arxiv-2408.03837-b31b1b)](https://arxiv.org/abs/2408.03837) | ||
[![PyPI Latest Release](https://img.shields.io/pypi/v/walledeval.svg?logo=python&logoColor=white&color=blue)](https://pypi.org/project/walledeval/) | ||
[![PyPI Downloads](https://static.pepy.tech/badge/walledeval)](https://pepy.tech/project/walledeval) | ||
[![GitHub Page Views Count](https://badges.toozhao.com/badges/01J0NWXGZ7XGDPFYWHZ9EX1F46/blue.svg)](https://github.com/walledai/walledeval) | ||
[![GitHub Release Date](https://img.shields.io/github/release-date/walledai/walledeval?logo=github&label=latest%20release&color=blue)](https://github.com/walledai/walledeval/releases/latest) | ||
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/walledai/walledeval/docs.yml?label=Docs%20CI&color=blue)](https://walledai.github.io/walledeval/) | ||
|
||
**WalledEval** is a simple library to test LLM safety by identifying if text generated by the LLM is indeed safe. We purposefully test benchmarks with negative information and toxic prompts to see if it is able to flag prompts of malice. | ||
|
||
!!! note "New Version Recently Released" | ||
## 🔥Announcements | ||
|
||
We have recently released `v0.2.0` of our codebase! This means that our documentation is not completely up-to-date with the current state of the codebase. However, we will be updating our documentation soon for all users to be able to quickstart using WalledEval! Till then, it is always best to consult the code or the `tutorials/` or `notebooks/` folders to have a better idea of how the codebase currently works. | ||
> Excited to announce the release of the community version of our guardrails: [WalledGuard](https://huggingface.co/walledai/walledguard-c)! **WalledGuard** comes in two versions: **Community** and **Advanced+**. We are releasing the community version under the Apache-2.0 License. To get access to the advanced version, please contact us at [[email protected]](mailto:[email protected]). | ||
## Announcements | ||
> Excited to partner with The IMDA Singapore AI Verify Foundation to build robust AI safety and controllability measures! | ||
> 🔥 Excited to announce the release of the community version of our guardrails: [WalledGuard](https://huggingface.co/walledai/walledguard-c)! **WalledGuard** comes in two versions: **Community** and **Advanced+**. We are releasing the community version under the Apache-2.0 License. To get access to the advanced version, please contact us at [[email protected]](mailto:[email protected]). | ||
> Grateful to [Tensorplex](https://www.tensorplex.ai/) for their support with computing resources! | ||
> 🔥 Excited to partner with The IMDA Singapore AI Verify Foundation to build robust AI safety and controllability measures! | ||
<!-- | ||
## 🔍 Quick Access | ||
<div class="grid cards"> | ||
<a href="https://paperswithcode.com/paper/walledeval-a-comprehensive-safety-evaluation" class="card"><div markdown> | ||
> 🔥 Grateful to [Tensorplex](https://www.tensorplex.ai/) for their support with computing resources! | ||
:simple-paperswithcode: Papers With Code | ||
</div></a> | ||
<a href="https://www.semanticscholar.org/paper/WalledEval%3A-A-Comprehensive-Safety-Evaluation-for-Gupta-Yau/5c7da78b978e2ef6cc791cfbf98dafbcb59f758b" class="card"> | ||
:simple-semanticscholar: Semantic Scholar | ||
</a> | ||
</div> | ||
--> | ||
|
||
## 📚 Resources | ||
|
||
- [**Technical Report**](https://arxiv.org/abs/2408.03837): Overview of Framework design and key flows to adopt | ||
- [**This Documentation**](https://walledai.github.io/walledeval/): More detailed compilation of project structure and data (WIP) | ||
- [**README**](https://github.com/walledai/walledeval): Higher level usage overview | ||
|
||
## 🖊️ Citing WalledEval | ||
|
||
```bibtex | ||
@misc{gupta2024walledeval, | ||
title={WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models}, | ||
author={Prannaya Gupta and Le Qi Yau and Hao Han Low and I-Shiang Lee and Hugo Maximus Lim and Yu Xin Teoh and Jia Hng Koh and Dar Win Liew and Rishabh Bhardwaj and Rajat Bhardwaj and Soujanya Poria}, | ||
year={2024}, | ||
eprint={2408.03837}, | ||
archivePrefix={arXiv}, | ||
primaryClass={cs.CL}, | ||
url={https://arxiv.org/abs/2408.03837}, | ||
} | ||
``` | ||
|
||
|
||
<!-- | ||
|