Merge branch 'main' into fix-373
copernico authored Jul 30, 2024
2 parents 29d739a + 312e910 commit d78ca86
Showing 81 changed files with 5,857 additions and 1,305 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -41,6 +41,7 @@ prospector/install_fastext.sh
prospector/nvd.ipynb
prospector/data/nvd.pkl
prospector/data/nvd.csv
prospector/data_sources/reports
.vscode/settings.json
prospector/cov_html/*
prospector/client/cli/cov_html/*
@@ -51,6 +52,7 @@ prospector/.coverage
**/cov_html
prospector/cov_html
.coverage
prospector/.venv
prospector/prospector.code-workspace
prospector/requests-cache.sqlite
prospector/prospector-report.html
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -1,7 +1,7 @@
fail_fast: true
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    rev: v4.3.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
@@ -30,11 +30,11 @@ repos:
# - id: go-unit-tests
# - id: go-build
  - repo: https://github.com/psf/black
    rev: 19.10b0
    rev: 22.10.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.6.4
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black", "--filter-files"]
2 changes: 1 addition & 1 deletion .reuse/dep5
@@ -7,6 +7,6 @@ Files: prospector/* kaybee/* docs/* scripts/* vulnerability-data/* MSR2019/* NO
Copyright: 2019-2020 SAP SE or an SAP affiliate company and project "KB" contributors
License: Apache-2.0

Files: mkdocs.yml .chglog/* .github/* CHANGELOG.md CONTRIBUTING.md Makefile go.mod go.sum .pre-commit-config.yaml .gitignore */*.yaml Pipfile
Files: *.bib mkdocs.yml .chglog/* .github/* CHANGELOG.md CONTRIBUTING.md Makefile go.mod go.sum .pre-commit-config.yaml .gitignore */*.yaml Pipfile
Copyright: 2019-2020 SAP SE or an SAP affiliate company and project "KB" contributors
License: CC0-1.0
228 changes: 210 additions & 18 deletions README.md

Large diffs are not rendered by default.

188 changes: 143 additions & 45 deletions prospector/README.md
@@ -5,45 +5,121 @@ currently under development: the instructions below are intended for development

:exclamation: Please note that **Windows is not supported** while WSL and WSL2 are fine.

## Description
## Table of Contents

1. [Description](#description)
2. [Quick Setup & Run](#setup--run)
3. [Development Setup](#development-setup)
4. [Contributing](#contributing)
5. [History](#history)

## 📖 Description

Prospector is a tool to reduce the effort needed to find security fixes for
*known* vulnerabilities in open source software repositories.

Given an advisory expressed in natural language, Prospector processes the commits found in the target source code repository, ranks them based on a set of predefined rules, and produces a report that the user can inspect to determine which commits to retain as the actual fix.

## Setup & Run
## ⚡️ Quick Setup & Run

Prerequisites:

:warning: The tool requires Docker and Docker-compose, as it employes Docker containers for certain functionalities. Make sure you have Docker installed and running before proceeding with the setup and usage of Prospector.
* Docker (make sure you have Docker installed and running before proceeding with the setup)
* Docker-compose

To quickly set up Prospector:
To quickly set up Prospector, follow these steps. This will run Prospector in its containerised version. If you wish to debug or run Prospector's components individually, follow the steps below at [Development Setup](#development-setup).

1. Clone the project KB repository
```
git clone https://github.com/sap/project-kb
```
```
git clone https://github.com/sap/project-kb
```
2. Navigate to the *prospector* folder
```
cd project-kb/prospector
```
cd project-kb/prospector
```
3. Rename the *config-sample.yaml* file to *config.yaml*. <br> Optionally adjust settings such as backend usage, NVD database preference, report format, and more.
```
mv config-sample.yaml config.yaml
```
4. Execute the bash script *run_prospector.sh* specifying the *-h* flag. <br> This will display a list of options that you can use to customize the execution of Prospector.
```
./run_prospector.sh -h
```
The bash script builds and starts the required Docker containers. Once the building step is completed, the script will show the list of available options.
5. Try the following example:
```
./run_prospector.sh CVE-2020-1925 --repository https://github.com/apache/olingo-odata4
```
By default, Prospector saves the results in a HTML file named *prospector-report.html*.
Open this file in a web browser to view what Prospector was able to find!
### 🤖 LLM Support
To use Prospector with LLM support, set the required parameters for API access to the LLM in *config.yaml*. These parameters vary depending on your choice of provider; follow the instructions that fit your needs (drop-downs below). If you do not want to use LLM support, keep the `llm_service` block in your *config.yaml* file commented out.
<details><summary><b>Use SAP AI CORE SDK</b></summary>
You will need the following parameters in *config.yaml*:
```yaml
llm_service:
  type: sap
  model_name: <model_name>
  temperature: 0.0
  ai_core_sk: <file_path>
```

3. Execute the bash script *run_prospector.sh* specifying the *-h* flag. This will display a list of options that you can use to customize the execution of Prospector.
```
./run_prospector.sh -h
`<model_name>` refers to the model names available in the Generative AI Hub in SAP AI Core. You can find an overview of available models on the Generative AI Hub GitHub page.

In `.env`, you must set the deployment URL as an environment variable following this naming convention:
```yaml
<model_name>_URL # model name in capitals, and "-" changed to "_"
```
For example, for gpt-4's deployment URL, set an environment variable called `GPT_4_URL`.
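
The naming convention can be sketched in Python as follows. This is only an illustration: the helper names `deployment_url_env_name` and `get_deployment_url` are hypothetical and not part of Prospector, and the sketch applies just the two transformations stated above (upper-casing and replacing "-" with "_").

```python
import os
from typing import Mapping, Optional


def deployment_url_env_name(model_name: str) -> str:
    """Derive the environment variable name for a model's deployment URL.

    Hypothetical helper: upper-cases the model name, replaces "-" with "_",
    and appends "_URL", following the convention described in the README.
    """
    return model_name.upper().replace("-", "_") + "_URL"


def get_deployment_url(
    model_name: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Look up the deployment URL, or return None if the variable is unset."""
    env = os.environ if environ is None else environ
    return env.get(deployment_url_env_name(model_name))
```

For instance, `deployment_url_env_name("gpt-4")` yields `GPT_4_URL`, which matches the example above.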

The bash script builds and starts the required Docker containers. Once the building step is completed, the script will show the list of available options.
The `temperature` parameter is optional. The default value is 0.0, but you can change it to something else.

4. Try the following example:
```
./run_prospector.sh CVE-2020-1925 --repository https://github.com/apache/olingo-odata4
```
You also need to point the `ai_core_sk` parameter to a file containing the secret keys.

</details>

<details><summary><b>Use personal third party provider</b></summary>

Implemented third party providers are **OpenAI**, **Google**, **Mistral**, and **Anthropic**.

1. You will need the following parameters in *config.yaml*:
```yaml
llm_service:
  type: third_party
  model_name: <model_name>
  temperature: 0.0
```
`<model_name>` refers to the model names available, for example `gpt-4o` for OpenAI. You can find lists of available models here:
1. [OpenAI](https://platform.openai.com/docs/models)
2. [Google](https://ai.google.dev/gemini-api/docs/models/gemini)
3. [Mistral](https://docs.mistral.ai/getting-started/models/)
4. [Anthropic](https://docs.anthropic.com/en/docs/about-claude/models)

By default, Prospector saves the results in a HTML file named *prospector-report.html*.
The `temperature` parameter is optional. The default value is 0.0, but you can change it to something else.

Open this file in a web browser to view what Prospector was able to find!
2. Make sure to add your provider's API key to your `.env` file as `[OPENAI|GOOGLE|MISTRAL|ANTHROPIC]_API_KEY`.
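
As a rough sketch of the `[OPENAI|GOOGLE|MISTRAL|ANTHROPIC]_API_KEY` pattern, the following Python snippet builds the variable name for a provider and checks whether it is set. The names `PROVIDERS`, `api_key_env_name` and `has_api_key` are assumptions made for illustration; Prospector's own lookup logic may differ.

```python
import os
from typing import Mapping, Optional

# Providers listed in the README; the tuple itself is a hypothetical helper.
PROVIDERS = ("OPENAI", "GOOGLE", "MISTRAL", "ANTHROPIC")


def api_key_env_name(provider: str) -> str:
    """Return the expected variable name, e.g. "openai" -> "OPENAI_API_KEY"."""
    name = provider.upper()
    if name not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return f"{name}_API_KEY"


def has_api_key(provider: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Check that the provider's key is present and non-empty."""
    env = os.environ if environ is None else environ
    return bool(env.get(api_key_env_name(provider)))
```

Running `has_api_key("openai")` after sourcing `.env` gives a quick check that the key is visible to the process.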

## Development Setup
</details>

#### How to use LLM Support for different things

You can set the `use_llm_<...>` parameters in *config.yaml* for fine-grained control over LLM support in various aspects of Prospector's phases. Each `use_llm_<...>` parameter allows you to enable or disable LLM support for a specific aspect:

- **`use_llm_repository_url`**: Choose whether LLMs should be used to obtain the repository URL. When using this option, you can omit the `--repository` flag as a command line argument and run prospector with `./run_prospector.sh CVE-2020-1925`.


## 👩‍💻 Development Setup

Following these steps allows you to run Prospector's components individually: [Backend database and worker containers](#starting-the-backend-database-and-the-job-workers), [RESTful Server](#starting-the-restful-server) for API endpoints, [Prospector CLI](#running-the-cli-version) and [Tests](#testing).

Prerequisites:

@@ -52,6 +128,8 @@ Prerequisites:
* gcc g++ libffi-dev python3-dev libpq-dev
* Docker & Docker-compose

### General

You can set up everything and install the dependencies by running:
```
make setup
@@ -66,7 +144,7 @@ Afterwards, you will just have to set the environment variables using the `.env`
set -a; source .env; set +a
```

You can configure prospector from CLI or from the `config.yaml` file. The (recommended) API Keys for Github and the NVD can be configured from the `.env` file (which must then be sourced with `set -a; source .env; set +a`)
You can configure Prospector from the CLI or from the *config.yaml* file. The (recommended) API keys for GitHub and the NVD can be configured in the `.env` file (which must then be sourced with `set -a; source .env; set +a`).

If at any time you wish to use a different version of the python interpreter, beware that the `requirements.txt` file contains the exact versioning for `python 3.10.6`.

@@ -80,11 +158,13 @@ your editor so that autoformatting is enforced "on save". The pre-commit hook en
black is run prior to committing anyway, but the auto-formatting might save you some time
and avoid frustration.

If you use VSCode, this can be achieved by pasting these lines in your configuration file:
If you use VSCode, this can be achieved by installing the Black Formatter extension and pasting these lines in your configuration file:

```
"python.formatting.provider": "black",
"editor.formatOnSave": true,
```json
"[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
}
```

### Starting the backend database and the job workers
@@ -93,17 +173,23 @@ If you run the client without running the backend you will get a warning and hav

You can then start the necessary containers with the following command:

`make docker-setup`
```bash
make docker-setup
```

This also starts a convenient DB administration tool at http://localhost:8080

If you wish to clean up Docker to run a fresh version of the backend, you can run:

`make docker-clean`
```bash
make docker-clean
```

### Starting the RESTful server

`uvicorn api.main:app --reload`
```bash
uvicorn service.main:app --reload
```

Note that it requires `POSTGRES_USER`, `POSTGRES_HOST`, `POSTGRES_PORT`, and `POSTGRES_DBNAME` to be set in the `.env` file.
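
Before starting the server, it can help to confirm that these variables are actually visible to the process. The following is a minimal sketch of such a check; `REQUIRED_VARS` and `missing_postgres_vars` are hypothetical names introduced here and are not part of Prospector.

```python
import os
from typing import List, Mapping, Optional

# Variables that the README says must be set for the RESTful server.
REQUIRED_VARS = ("POSTGRES_USER", "POSTGRES_HOST", "POSTGRES_PORT", "POSTGRES_DBNAME")


def missing_postgres_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty, in README order."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list from `missing_postgres_vars()` means all four variables are set; otherwise the list names the ones still missing from the sourced `.env` file.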

@@ -112,7 +198,9 @@ You might also want to take a look at `http://127.0.0.1:8000/docs`.

*Alternatively*, you can execute the RESTful server explicitly with:

`python api/main.py`
```bash
python api/main.py
```

which is equivalent but more convenient for debugging.

@@ -126,34 +214,44 @@ Prospector makes use of `pytest`.

:exclamation: **NOTE:** before using it please make sure to have running instances of the backend and the database.

## 🤝 Contributing

If you find a bug, please open an issue. If you can also fix the bug, please
create a pull request (make sure it includes a test case that passes with your correction
but fails without it)

## History
## 🕰️ History

The high-level structure of Prospector follows the approach of its
predecessor FixFinder, which is described in detail here: https://arxiv.org/pdf/2103.13375.pdf
predecessor FixFinder, which is described in:

> Daan Hommersom, Antonino Sabetta, Bonaventura Coppola, Dario Di Nucci, and Damian A. Tamburri. 2024. Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories. ACM Trans. Softw. Eng. Methodol. March 2024. https://doi.org/10.1145/3649590

FixFinder is the prototype developed by Daan Hommersom as part of his thesis
done in partial fulfillment of the requirements for the degree of Master of
Science in Data Science & Entrepreneurship at the Jheronimus Academy of Data
Science during a graduation internship at SAP.

The source code of FixFinder can be obtained by checking out the tag [DAAN_HOMMERSOM_THESIS](https://github.com/SAP/project-kb/releases/tag/DAAN_HOMMERSOM_THESIS).

The main difference between FixFinder and Prospector (which has been implemented from scratch)
is that the former takes a definite data-driven approach and trains a ML model to perform the ranking,
whereas the latter applies hand-crafted rules to assign a relevance score to each candidate commit.

The document that describes FixFinder can be cited as follows:

@misc{hommersom2021mapping,
title = {Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories},
author = {Hommersom, Daan and
Sabetta, Antonino and
Coppola, Bonaventura and
Dario Di Nucci and
Tamburri, Damian A. },
year = {2021},
month = {March},
url = {https://arxiv.org/pdf/2103.13375.pdf}
whereas the latter is based on hand-crafted rules to assign a relevance score to each candidate commit.

Recent versions of Prospector (2024) also use AI/ML, but this is still done through rules
that are based on the outcome of requests to LLMs.

The paper that describes FixFinder can be cited as follows:

@article{10.1145/3649590,
author = {Hommersom, Daan and Sabetta, Antonino and Coppola, Bonaventura and Nucci, Dario Di and Tamburri, Damian A.},
title = {Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1049-331X},
url = {https://doi.org/10.1145/3649590},
doi = {10.1145/3649590},
journal = {ACM Trans. Softw. Eng. Methodol.},
month = {mar},
}
File renamed without changes.
@@ -1,3 +1,3 @@
class CommitDB:
class BackendDB:
    def connect(self, connect_string):
        raise NotImplementedError("Unimplemented")
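
The renamed base class acts as an interface: its methods raise `NotImplementedError` until a concrete backend overrides them. A minimal sketch of that pattern follows, assuming a hypothetical `InMemoryBackendDB` subclass that is not part of the repository and only overrides the one method shown in this hunk.

```python
class BackendDB:
    """Interface mirroring the base class shown in the diff."""

    def connect(self, connect_string):
        raise NotImplementedError("Unimplemented")


class InMemoryBackendDB(BackendDB):
    """Hypothetical backend for illustration; it opens no real connection."""

    def __init__(self):
        self.connected = False

    def connect(self, connect_string=None):
        # A real backend would open a database connection here.
        self.connected = True
```

Calling `connect` on the base class raises `NotImplementedError`, while the subclass handles the call itself.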
@@ -1,18 +1,18 @@
import pytest

from commitdb.postgres import PostgresCommitDB, parse_connect_string
from backenddb.postgres import PostgresBackendDB, parse_connect_string
from datamodel.commit import Commit


@pytest.fixture
def setupdb():
    db = PostgresCommitDB("postgres", "example", "localhost", "5432", "postgres")
    db = PostgresBackendDB("postgres", "example", "localhost", "5432", "postgres")
    db.connect()
    # db.reset()
    return db


def test_save_lookup(setupdb: PostgresCommitDB):
def test_save_lookup(setupdb: PostgresBackendDB):
    commit = Commit(
        commit_id="42423b2423",
        repository="https://fasfasdfasfasd.com/rewrwe/rwer",
@@ -37,7 +37,7 @@ def test_save_lookup(setupdb: PostgresCommitDB):
    assert commit.commit_id == retrieved_commit.commit_id


def test_lookup_nonexisting(setupdb: PostgresCommitDB):
def test_lookup_nonexisting(setupdb: PostgresBackendDB):
    result = setupdb.lookup(
        "https://fasfasdfasfasd.com/rewrwe/rwer",
        "42423b242342423b2423",