Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Playground PR for verification caching exploration #3457

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

rosecodym
Copy link
Collaborator

@rosecodym rosecodym commented Oct 17, 2024

Description:

This is a playground branch I made to play around with some ideas for introducing verification caching cleanly. Please don't merge it! Ever!

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

Copy link
Collaborator

@mcastorina mcastorina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never merge this

@rosecodym rosecodym changed the title Playground PR for detection caching exploration Playground PR for verification caching exploration Oct 17, 2024
r.CopyVerificationInfo(cacheHit)
} else {
isEverythingCached = false
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was playing around converting this to slices.ContainsFunc, but then I realized maybe it's not optimal to break here? If we run into something that isn't yet cached, everything after it won't be updated from cache. Maybe that's fine though?

(also dunno what I think about the Func functions in slices; I guess the main use case is to not dupe logic in loops, but that doesn't apply here)

Copy link
Collaborator Author

@rosecodym rosecodym Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is just kinda unfortunate. If we find something that's not cached, we have to rerun FromData, which will return all the results for this chunk - whether they're cached or not. So there's no reason to read from the cache. (Did I successfully explain that?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there's no reason to read from the cache. (Did I successfully explain that?)

Essentially, this is a stopgap solution to handle duplicate chunks (e.g., scanning orgs + members that have multiple forks of kubernetes/kubernetes.)

Proper caching on a granular level will require splitting FromData and Verify into discrete functions.

// pseudo-code
cache := cache.New()

for _, chunk := range chunks {
    for _, d := range detectors {
        results := d.FromData(ctx, chunk)
        
        for result := range results {
            if val, ok := cache.Get(detector.Type, result.Raw); ok {
                result.Update(val)
            } else {
                detector.Verify(result)
                cache.Set(detector.Type, result.Raw)
            }
        }
    }
}

rosecodym added a commit that referenced this pull request Oct 28, 2024
Result.DecoderType is only ever used by ResultWithMetadata (via its embedded Result). This unnecessarily complicates the relationship between the types and adds some warts to #3457, so this PR moves DecoderType directly into the only struct which actually uses it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants