-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use concurrency for path scanning #405
Conversation
@@ -80,7 +80,7 @@ func programKind(ctx context.Context, path string) string { | |||
headerString := "" | |||
n, err := io.ReadFull(f, header[:]) | |||
if err == nil || errors.Is(err, io.ErrUnexpectedEOF) { | |||
kind, err := magic.Lookup(header[:n]) | |||
kind, err := magic.LookupSync(header[:n]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt we were seeing much of a benefit from this.
LookupSync lookups up the file type based on the provided magic bytes without spawning any additional goroutines. You should provide at least the first 1024 bytes of the file in this slice. A magic.ErrUnknown will be returned if the file type is not known.
Signed-off-by: egibs <[email protected]>
…overed paths Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
d7e0729
to
42d2cda
Compare
@@ -127,7 +128,7 @@ func processDest(ctx context.Context, c bincapz.Config, from, to map[string]*bin | |||
for relPath, tr := range to { | |||
fr, exists := from[relPath] | |||
if !exists { | |||
d.Added[relPath] = tr | |||
d.Added.Set(relPath, tr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the pattern for adding a KV pair to an orderedmap
dictionary.
delete(d.Removed, rpath) | ||
delete(d.Added, apath) | ||
d.Modified.Set(apath, abs) | ||
d.Modified.Delete(rpath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the pattern for deleting a key from an orderedmap
.
@@ -152,7 +159,7 @@ func recursiveScan(ctx context.Context, c bincapz.Config) (*bincapz.Report, erro | |||
logger := clog.FromContext(ctx) | |||
logger.Debug("recursive scan", slog.Any("config", c)) | |||
r := &bincapz.Report{ | |||
Files: map[string]*bincapz.FileReport{}, | |||
Files: orderedmap.New[string, *bincapz.FileReport](), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the pattern for initializing a new orderedmap
.
pkg/action/util.go
Outdated
@@ -0,0 +1,24 @@ | |||
// Copyright 2024 Chainguard, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are helper functions for inserting sorted keys into a map.
pkg/action/diff.go
Outdated
d.Modified[relPath].Behaviors = append(d.Modified[relPath].Behaviors, abs.Behaviors...) | ||
rel, _ := d.Modified.Get(relPath) | ||
rel.Behaviors = append(rel.Behaviors, abs.Behaviors...) | ||
d.Modified.Set(relPath, rel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the most difficult portion of migrating to orderedmap
. I don't think a one-liner was doable but this is more readable.
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
3b10a15
to
aa6bed2
Compare
…tion loop Signed-off-by: egibs <[email protected]>
aa6bed2
to
8b05ed4
Compare
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid improvement, I just have some readability comments regarding data structures.
pkg/action/scan.go
Outdated
fr, err := processFile(ctx, c, yrs, path, scanPath, trimPath, logger) | ||
if err != nil { | ||
mu.Lock() | ||
result = append(result, findings{path: path, fr: &bincapz.FileReport{}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would improve readability if a sync.Map
was used to store the findings, so you wouldn't have to worry about manually managing the mutex. The slice ends up being converted to a map anyways later via scanPathFindings
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this out and got some weird results when running tests where the file reports were merged for two different tests. Managing the mutex(es) isn't ideal but the outcome is deterministic. I'll see if I was doing something wrong with the sync.Map
, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I was doing something wrong. I'll push the working code in a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 81fca83
(#405).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up removals/fixes in c03d232
(#405) and c4888b3
(#405).
Signed-off-by: egibs <[email protected]>
Signed-off-by: Evan Gibler <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Signed-off-by: egibs <[email protected]>
Love it. Thank you! |
* Use file descriptors Signed-off-by: egibs <[email protected]> * Relocate file descriptor code; implement working concurrency for discovered paths Signed-off-by: egibs <[email protected]> * Move to ordered map; relocate File rendering Signed-off-by: egibs <[email protected]> * Limit concurrency to runtime.NumCPU() Signed-off-by: egibs <[email protected]> * Rename map variables Signed-off-by: egibs <[email protected]> * Directly initialize diff maps Signed-off-by: egibs <[email protected]> * Appease the linter Signed-off-by: egibs <[email protected]> * Configurable concurrency Signed-off-by: egibs <[email protected]> * Update samples_test.go Signed-off-by: egibs <[email protected]> * More consolidation Signed-off-by: egibs <[email protected]> * Add exists check Signed-off-by: egibs <[email protected]> * Use -j instead of -c for concurrency Signed-off-by: egibs <[email protected]> * Use an errgroup instead of a waitgroup + semaphore Signed-off-by: egibs <[email protected]> * Use SetLimit; add concurrency config to all tests; linting fixes Signed-off-by: egibs <[email protected]> * Move sorting code into scan.go; move rendering inside of map construction loop Signed-off-by: egibs <[email protected]> * Use 8-core runner for tests Signed-off-by: egibs <[email protected]> * Go back to ubuntu-latest Signed-off-by: egibs <[email protected]> * Run core tests in parallel Signed-off-by: egibs <[email protected]> * Better alias naming, consolidate structs Signed-off-by: egibs <[email protected]> * Run make fix Signed-off-by: egibs <[email protected]> * Replace mutexes with sync.Map Signed-off-by: egibs <[email protected]> * Remove struct/type Signed-off-by: egibs <[email protected]> * Appease the linter Signed-off-by: egibs <[email protected]> --------- Signed-off-by: egibs <[email protected]> Signed-off-by: Evan Gibler <[email protected]>
As opposed to #402 which implemented concurrency at the
scanPath
level, this PR implements configurable concurrency at thepath
level, that is, all of the recursively discovered paths within a givenscanPath
.By default, the concurrency is limited to
runtime.NumCPU()
and will automatically be set to1
if the configured concurrency is less than1
. Similarly, the concurrency can not be set higher than the number of paths being scanned.I converted
map[any]any
usages over togo-ordered-map
to ensure that all of our dictionaries maintain their insertion order when iterated over.Anecdotally, this change has provided a decent increase in speed depending on the concurrency value, number of files, and the size of each file.
Benchmarking with the default concurrency of
runtime.NumCPU()
takes ~50 seconds:as opposed to ~80 seconds without concurrency (
-j 1
):Anecdotally, scanning
/usr/bin
on my Framework Laptop takes ~60 seconds with concurrency and ~220 seconds without (-j 1
).