diff --git a/docs/index.md b/docs/index.md index 73d37d7..e9e85b1 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,23 +1,56 @@ -# WalledEval: Testing LLMs Against Jailbreaks and Unprecedented Harms +# WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models -[![PyPI Latest Release](https://img.shields.io/pypi/v/walledeval.svg)](https://pypi.org/project/walledeval/) +[![paper](https://img.shields.io/badge/arxiv-2408.03837-b31b1b)](https://arxiv.org/abs/2408.03837) +[![PyPI Latest Release](https://img.shields.io/pypi/v/walledeval.svg?logo=python&logoColor=white&color=blue)](https://pypi.org/project/walledeval/) [![PyPI Downloads](https://static.pepy.tech/badge/walledeval)](https://pepy.tech/project/walledeval) [![GitHub Page Views Count](https://badges.toozhao.com/badges/01J0NWXGZ7XGDPFYWHZ9EX1F46/blue.svg)](https://github.com/walledai/walledeval) +[![GitHub Release Date](https://img.shields.io/github/release-date/walledai/walledeval?logo=github&label=latest%20release&color=blue)](https://github.com/walledai/walledeval/releases/latest) +[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/walledai/walledeval/docs.yml?label=Docs%20CI&color=blue)](https://walledai.github.io/walledeval/) **WalledEval** is a simple library to test LLM safety by identifying if text generated by the LLM is indeed safe. We purposefully test benchmarks with negative information and toxic prompts to see if it is able to flag prompts of malice. -!!! note "New Version Recently Released" +## 🔥Announcements - We have recently released `v0.2.0` of our codebase! This means that our documentation is not completely up-to-date with the current state of the codebase. However, we will be updating our documentation soon for all users to be able to quickstart using WalledEval! Till then, it is always best to consult the code or the `tutorials/` or `notebooks/` folders to have a better idea of how the codebase currently works. +> Excited to announce the release of the community version of our guardrails: [WalledGuard](https://huggingface.co/walledai/walledguard-c)! **WalledGuard** comes in two versions: **Community** and **Advanced+**. We are releasing the community version under the Apache-2.0 License. To get access to the advanced version, please contact us at [admin@walled.ai](mailto:admin@walled.ai). -## Announcements +> Excited to partner with The IMDA Singapore AI Verify Foundation to build robust AI safety and controllability measures! -> 🔥 Excited to announce the release of the community version of our guardrails: [WalledGuard](https://huggingface.co/walledai/walledguard-c)! **WalledGuard** comes in two versions: **Community** and **Advanced+**. We are releasing the community version under the Apache-2.0 License. To get access to the advanced version, please contact us at [admin@walled.ai](mailto:admin@walled.ai). +> Grateful to [Tensorplex](https://www.tensorplex.ai/) for their support with computing resources! -> 🔥 Excited to partner with The IMDA Singapore AI Verify Foundation to build robust AI safety and controllability measures! + + +## 📚 Resources + +- [**Technical Report**](https://arxiv.org/abs/2408.03837): Overview of Framework design and key flows to adopt +- [**This Documentation**](https://walledai.github.io/walledeval/): More detailed compilation of project structure and data (WIP) +- [**README**](https://github.com/walledai/walledeval): Higher level usage overview + +## 🖊️ Citing WalledEval + +```bibtex +@misc{gupta2024walledeval, +      title={WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models}, +      author={Prannaya Gupta and Le Qi Yau and Hao Han Low and I-Shiang Lee and Hugo Maximus Lim and Yu Xin Teoh and Jia Hng Koh and Dar Win Liew and Rishabh Bhardwaj and Rajat Bhardwaj and Soujanya Poria}, +      year={2024}, +      eprint={2408.03837}, +      archivePrefix={arXiv}, +      primaryClass={cs.CL}, +      url={https://arxiv.org/abs/2408.03837}, +} +```