Export vulnerablecode-data #1206

Merged (1 commit) on Aug 6, 2024

Conversation

@ziadhany
Collaborator

ziadhany commented May 31, 2023

This PR exports VulnerableCode data in a file system structure suitable for use in FederatedCode.

@ziadhany
Collaborator Author

ziadhany commented Jun 2, 2023

This is what the data looks like, with a path like this:
/home/ziad/vulnerablecode-data/pypi/django/VCID-rf6e-vjeu-aaae.json

{
    "vulnerability_id": "VCID-rf6e-vjeu-aaae",
    "aliases": [
        "CVE-2022-22818",
        "GHSA-95rw-fx8r-36v6"
    ],
    "summary": "Cross-site Scripting in Django",
    "affected_purls": [
        "pkg:pypi/[email protected]",
         .......
        "pkg:pypi/[email protected]",
          ......
        "pkg:pypi/[email protected]"
    ],
    "fixed_purl": [
        "pkg:pypi/[email protected]",
        "pkg:pypi/[email protected]",
        "pkg:pypi/[email protected]"
    ],
    "severities": [
        {
            "id": 25302,
            "reference_id": 166932,
            "scoring_system": "cvssv3.1_qr",
            "value": "MODERATE",
            "scoring_elements": ""
        }
    ],
    "references": [
        {
            "id": 164962,
            "url": "https://docs.djangoproject.com/en/4.0/releases/security/",
            "reference_id": ""
        },
       .......
        {
            "id": 166932,
            "url": "https://github.com/advisories/GHSA-95rw-fx8r-36v6",
            "reference_id": "GHSA-95rw-fx8r-36v6"
        }
    ],
    "weaknesses": []
}
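
For illustration, the `<ecosystem>/<package>/<VCID>.json` layout shown above could be produced roughly as follows. This is a minimal sketch, not the code in this PR: the helper names `vuln_file_path` and `write_vuln_file` are hypothetical, and replacing `/` in package names is an assumption about how namespaced packages might be handled.

```python
import json
from pathlib import Path


def vuln_file_path(base_dir, ecosystem, package_name, vcid):
    """Return <base_dir>/<ecosystem>/<package_name>/<vcid>.json (hypothetical helper)."""
    # Assumption: replace "/" so a namespaced package name stays in one directory.
    safe_name = package_name.replace("/", "_")
    return Path(base_dir) / ecosystem / safe_name / f"{vcid}.json"


def write_vuln_file(base_dir, ecosystem, package_name, vuln):
    """Write one vulnerability dict as indented JSON and return the file path."""
    path = vuln_file_path(base_dir, ecosystem, package_name, vuln["vulnerability_id"])
    # Create the ecosystem and package directories on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(vuln, indent=4), encoding="utf-8")
    return path
```

Calling `write_vuln_file(base_dir, "pypi", "django", vuln)` with the record above would write it under `pypi/django/` inside `base_dir`.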

Also, what should I do if a vulnerability doesn't have any related package?

@ziadhany
Collaborator Author

ziadhany commented Jun 5, 2023

None of these vulnerabilities have any related packages; they are old and not open source, as you said @pombredanne.
ignore.txt

@TG1999
Contributor

TG1999 commented Jun 20, 2023

@ziadhany LGTM! Please add some unit tests for this.

@ziadhany
Collaborator Author

> @ziadhany LGTM! Please add some unit tests for this.

Done. @TG1999, please have a look at the tests and let me know if I need to add more.

@ziadhany
Collaborator Author

@pombredanne @TG1999 Can you suggest a way to improve the performance?

@tdruez
Contributor

tdruez commented Oct 17, 2023

@ziadhany For a data dump type of export, I would suggest simplifying the data structure by handling each model separately. Trying to load all relationships at once is likely to provide poor performance.

You can look into Django's built-in dumpdata management command at https://docs.djangoproject.com/en/4.2/ref/django-admin/#dumpdata
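
To make the "handle each model separately" idea concrete, here is a minimal sketch in plain Python. It assumes each table has already been read as a flat list of dicts (one read per model) and joins them in memory by `vulnerability_id`. The function names and the row shapes (`alias`, `url` keys) are hypothetical and are not VulnerableCode's actual models.

```python
from collections import defaultdict


def group_by_vulnerability(rows, key="vulnerability_id"):
    """Index flat rows from one table by the vulnerability they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def assemble(vulnerabilities, aliases, references):
    """Build one record per vulnerability with a single pass over each table."""
    aliases_by_vuln = group_by_vulnerability(aliases)
    refs_by_vuln = group_by_vulnerability(references)
    return [
        {
            "vulnerability_id": v["vulnerability_id"],
            "summary": v["summary"],
            # .get() avoids inserting empty entries for vulnerabilities with no rows.
            "aliases": [a["alias"] for a in aliases_by_vuln.get(v["vulnerability_id"], [])],
            "references": [r["url"] for r in refs_by_vuln.get(v["vulnerability_id"], [])],
        }
        for v in vulnerabilities
    ]
```

The point of the design is that the cost is one read per table plus a dictionary lookup per record, instead of following every relationship object by object.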

@ziadhany
Collaborator Author

> @ziadhany For a data dump type of export, I would suggest simplifying the data structure by handling each model separately. Trying to load all relationships at once is likely to provide poor performance.
>
> You can look into Django's built-in dumpdata management command at https://docs.djangoproject.com/en/4.2/ref/django-admin/#dumpdata

I tried Django's dumpdata, but I don't think it can work for this task. So I tried .prefetch_related("vulnerabilities") to load the relationships, but the script is still slow compared to dumpdata.

ziadhany requested a review from TG1999 on March 23, 2024, 21:52

@ziadhany
Collaborator Author

ziadhany commented Apr 9, 2024

Using prefetching makes performance worse; maybe I'm using it the wrong way.
There is a lot of query duplication, and just 10 loop iterations take more than 2129.20 ms without writing any files to disk.

image
VulnerableCode Home.zip
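
The query duplication described here is the usual "one query per loop iteration" pattern. The sketch below reproduces it with the standard-library `sqlite3` module rather than the Django ORM, so the schema, the `CountingDB` wrapper, and the `export_naive` / `export_batched` functions are all illustrative assumptions, not code from this PR. It only shows how the query count differs between per-row lookups and a batched read joined in memory.

```python
import sqlite3


def make_db():
    """Create an in-memory database with 10 packages, one vulnerability each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE package (id INTEGER PRIMARY KEY, purl TEXT);
        CREATE TABLE vulnerability (id INTEGER PRIMARY KEY, package_id INTEGER, vcid TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO package VALUES (?, ?)",
        [(i, f"pkg:pypi/demo-{i}") for i in range(1, 11)],
    )
    conn.executemany(
        "INSERT INTO vulnerability VALUES (?, ?, ?)",
        [(i, i, f"VCID-demo-{i}") for i in range(1, 11)],
    )
    return conn


class CountingDB:
    """Thin wrapper that counts how many SELECT queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def export_naive(db):
    """One query for the packages, then one more query per package."""
    result = {}
    for pkg_id, purl in db.query("SELECT id, purl FROM package"):
        rows = db.query("SELECT vcid FROM vulnerability WHERE package_id = ?", (pkg_id,))
        result[purl] = [vcid for (vcid,) in rows]
    return result


def export_batched(db):
    """Two queries in total; the relationship is joined in Python."""
    packages = db.query("SELECT id, purl FROM package")
    purl_by_id = dict(packages)
    result = {purl: [] for purl in purl_by_id.values()}
    for package_id, vcid in db.query("SELECT package_id, vcid FROM vulnerability ORDER BY id"):
        result[purl_by_id[package_id]].append(vcid)
    return result
```

With 10 packages, the naive version issues 11 queries and the batched version issues 2, while both return the same mapping. If the per-iteration code touches a relationship in a way that bypasses the prefetched data, the ORM falls back to the naive pattern, which may be what the profiler output here is showing.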

@TG1999
Contributor

TG1999 commented Jul 22, 2024

@ziadhany what's pending on this?

@ziadhany
Collaborator Author

> @ziadhany what's pending on this?

Nothing is pending; this PR is ready to be merged.

@TG1999
Contributor

TG1999 commented Aug 6, 2024

@ziadhany please take a look, the tests are failing.

Fix disk storage structure
Redefine the disk storage structure
Add a test for write_vul_data
Rename file extension from yaml to yml again
Add Filter before prefetch_related
Add paginated again
Fix typo in export and rename files from yaml to yml
Fix filename error , Remove / from filename
Create a query for distinct ecosystems
Try to improve export performance again
Try to improve export performance by load all data in memory before start writing on disk
Improve export vulnerablecode data performance
Try to improve export performance
Try to improve performance by adding pagination
Fix filename for export files
Add multiple parameterizes for create_sub_path test .
Add new format for exporting vulnerablecode-data
Add a test
Fix export test with yaml format
Change the export format from json to yaml
Add test for export command
Add test for write_vuln_data function
Edit export.py , Fix missing attribute in vuln_data
Export vulnerablecode-data

Signed-off-by: ziadhany <[email protected]>
@ziadhany
Collaborator Author

ziadhany commented Aug 6, 2024

> @ziadhany please take a look, the tests are failing.

@TG1999 Done! Could you please review and approve so we can merge?
