CryptoFormalEval: Automated Benchmark for LLM Vulnerability Detection in Cryptographic Protocols

Introduction

Cryptographic protocols are essential to the security of modern digital infrastructure, yet they are frequently deployed without formal verification, exposing systems to hidden vulnerabilities. Formal verification methods are rigorous and effective, but they are often complex and time-intensive, which makes them hard to apply consistently in real-world settings. We address this gap by introducing an automated benchmark that evaluates how well Large Language Models (LLMs) detect vulnerabilities in cryptographic protocols. We present a validated dataset of novel flawed communication protocols, together with an automated method for validating the results. Our pipeline combines state-of-the-art prompting and scaffolding techniques for LLMs with Tamarin, a powerful symbolic reasoning tool, to enhance protocol analysis. This integration bridges natural language processing and formal verification, enabling a more efficient and comprehensive assessment of protocol security. Our early results demonstrate the potential of LLMs, augmented by symbolic reasoning, to contribute to defensive cybersecurity applications.

TL;DR

We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.

Table of Contents

  • Installation
  • Usage
  • Contributing
  • Inspecting Dataset and Results
  • License
  • Acknowledgements

Installation

  1. Clone the repository:

    git clone git@github.com:Cristian-Curaba/CryptoFormalEval.git
  2. Create a virtual environment:

    python -m venv LangChain
  3. Activate the virtual environment:

    • On macOS/Linux:

      source LangChain/bin/activate
    • On Windows:

      .\LangChain\Scripts\activate
  4. Install dependencies:

    pip install -r requirements.txt
  5. Install Tamarin: follow the instructions in Instructions_for_Tamarin_installation.md.

  6. Compile ./src/formalizer: follow the instructions in src/formalizer/README.

  7. Modify the .env.example file by adding your API keys and the $PATH environment variable (needed to execute Tamarin); see the sketch below.
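
A .env file consists of simple KEY=value lines. The sketch below is illustrative only: the exact variable names the project expects are listed in .env.example, and the names shown here are placeholders.

    # Illustrative .env sketch -- the real variable names are in .env.example
    OPENAI_API_KEY=your-key-here        # placeholder: key for your LLM provider
    ANTHROPIC_API_KEY=your-key-here     # placeholder: add only the providers you use
    # Make the Tamarin binary reachable by extending PATH
    # (whether $PATH is expanded depends on how the .env file is loaded):
    PATH=$PATH:/path/to/tamarin-prover/bin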

Usage

Run the main file:

python src/main.py

To explore the produced outputs, check the src/agent_execution folder. For further analysis, run the src/main_print_interactions.py file (after changing the id_run) and inspect the src/history_run folder.
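
A typical session, assuming the virtual environment from the installation steps is active, might look like this (id_run is set inside src/main_print_interactions.py; the ls commands are just one way to browse the folders):

    # Run the benchmark pipeline
    python src/main.py

    # Browse the produced agent outputs
    ls src/agent_execution

    # After setting id_run inside the script, print the recorded interactions
    python src/main_print_interactions.py
    ls src/history_run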

Contributing

Contributions are welcome! Please follow these steps to contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes and commit them (git commit -m "Add feature").
  4. Push to the branch (git push origin feature-branch).
  5. Create a pull request.

Inspecting Dataset and Results

To protect the dataset from LLM memorization, we release the password to decrypt the dataset.zip file upon request, under the terms outlined in the LICENSE.
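
Once the password has been provided, the archive can be extracted with any standard zip tool; for example, with Info-ZIP's unzip, which prompts for the password (the target directory name here is arbitrary):

    # Extract the password-protected dataset; unzip prompts for the password
    unzip dataset.zip -d dataset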

License

This project is licensed under the GNU General Public License; see the LICENSE file for details.

Acknowledgements

This research was made possible through the generous support of Apart Research, which provided both expert research guidance and access to APIs. Their resources and support have been instrumental in shaping the direction and success of this work. We also extend our gratitude to our main research advisor, Natalia, for their expertise, mentorship, and insightful feedback throughout the project.
