Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoadVulnerabilityDB could be faster with ValidateByHashOnGet #1502

Closed
jonjohnsonjr opened this issue Sep 18, 2023 · 3 comments · Fixed by #2054
Closed

LoadVulnerabilityDB could be faster with ValidateByHashOnGet #1502

jonjohnsonjr opened this issue Sep 18, 2023 · 3 comments · Fixed by #2054
Labels
bug Something isn't working

Comments

@jonjohnsonjr
Copy link

What happened:

Calling LoadVulnerabilityDB with ValidateByHashOnGet set to true ends up hashing the DB twice.

What you expected to happen:

We would hash the DB once.

How to reproduce it (as minimally and precisely as possible):

diff --git a/cmd/grype/cli/options/database.go b/cmd/grype/cli/options/database.go
index 83fd177..3563e6d 100644
--- a/cmd/grype/cli/options/database.go
+++ b/cmd/grype/cli/options/database.go
@@ -29,6 +29,8 @@ func DefaultDatabase(id clio.Identification) Database {
                ValidateAge: true,
                // After this period (5 days) the db data is considered stale
                MaxAllowedBuiltAge: time.Hour * 24 * 5,
+               // Repro:
+               ValidateByHashOnStart: true,
        }
 }

Then run grype. It is slower than you would expect because it's hashing the DB twice:

image

Anything else we need to know?:

This ends up hashing the whole DB:

storeReader, dbCloser, err := dbCurator.GetStore()
if err != nil {
return nil, nil, nil, err
}

This also ends up hashing the whole DB:
status := dbCurator.Status()

Also, this could probably be even faster if HashFile used io.CopyBuffer or a bufio.Reader with a larger buffer than the default used by io.Copy.

Since we are calling LoadVulnerabilityDB as a library, we are probably hitting this where nobody else would (via the grype cli), and we don't even look at the Status that gets returned, so perhaps we should just reimplement what LoadVulnerabilityDB is doing, but figured it would be good to file an issue.

Environment:

  • Output of grype version:
Application:         grype
Version:             0.68.1
BuildDate:           2023-09-15T18:35:30Z
GitCommit:           brew
GitDescription:      [not provided]
Platform:            darwin/arm64
GoVersion:           go1.21.1
Compiler:            gc
Syft Version:        v0.90.0
Supported DB Schema: 5
  • OS (e.g: cat /etc/os-release or similar):
$ sw_vers
ProductName:		macOS
ProductVersion:		13.5.2
BuildVersion:		22G91
@jonjohnsonjr jonjohnsonjr added the bug Something isn't working label Sep 18, 2023
@wagoodman wagoodman moved this to Backlog in OSS Dec 7, 2023
@tgerla
Copy link
Contributor

tgerla commented Dec 7, 2023

Hi @jonjohnsonjr, thank you for the report and sorry it's taken us a while to get back to you! We will put this in the backlog to look at as soon as we are able. We would also be happy to review a pull request if you had a chance to work on it. Thanks again.

@luhring
Copy link
Contributor

luhring commented May 2, 2024

Starting to hit this, too — I'm finding that it takes at least ~1.6s to run just the LoadVulnerabilityDB function, and that's without downloading a new database.

I don't mind opening a PR, but I'm curious what's allowed to change about what this function does, since it's juggling a few different things at once

@lucasrod16
Copy link
Contributor

I’ve opened a PR to address this issue #2054

Any feedback or suggestions would be appreciated. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants