Implement verification cache #3801

Merged
merged 47 commits into from
Dec 20, 2024

Conversation

Collaborator

@rosecodym rosecodym commented Dec 19, 2024

Description:

This PR introduces a cache that allows the scanner to avoid emitting multiple requests to verify the same credential. In practice, it doesn't seem to reduce scan time at all, but it does seem to reduce the number of calls to FromData rather drastically.

The cache is implemented as an opt-out feature that can be disabled with a new CLI flag. If we don't like this, we can change it.
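For readers who want a concrete picture of the idea, here is a minimal sketch of a cache that skips repeat verification of the same credential. It is not the code in this PR: `verificationCache`, `check`, `verifyFunc`, and the `disabled` switch are hypothetical names chosen for illustration, and the sketch holds a lock for the whole call, which a real scanner would probably not want.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical stand-in for a detector result.
type result struct {
	Verified  bool
	FromCache bool
}

// verifyFunc stands in for the expensive, network-bound verification step.
type verifyFunc func(credential string) bool

// verificationCache remembers verification outcomes keyed by credential.
// All names here are assumptions made for this sketch.
type verificationCache struct {
	mu       sync.Mutex
	results  map[string]bool
	disabled bool // models the opt-out switch
}

func newVerificationCache(disabled bool) *verificationCache {
	return &verificationCache{results: make(map[string]bool), disabled: disabled}
}

// check returns the cached outcome when one exists; otherwise it calls verify
// once and stores the outcome. Holding the lock during verify keeps the sketch
// simple but serializes verification.
func (c *verificationCache) check(credential string, verify verifyFunc) result {
	if c.disabled {
		return result{Verified: verify(credential)}
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if verified, ok := c.results[credential]; ok {
		return result{Verified: verified, FromCache: true}
	}
	verified := verify(credential)
	c.results[credential] = verified
	return result{Verified: verified}
}

func main() {
	calls := 0
	verify := func(string) bool { calls++; return true }

	cache := newVerificationCache(false)
	for i := 0; i < 3; i++ {
		r := cache.check("example-credential", verify)
		fmt.Printf("verified=%v fromCache=%v\n", r.Verified, r.FromCache)
	}
	fmt.Println("verification calls:", calls)
}
```

In this sketch a cached credential still comes back with its stored `Verified` value, so repeat findings are reported the same way as the first one.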

The metrics collection hopefully isn't too architecture-astronauty; I wanted to create something useful here that could also accommodate future Prometheus configuration without making the implementation all stupid.

Here's the result of scanning rails a few times, with and without caching:

with caching
{"chunks": 497160, "bytes": 198670727, "verified_secrets": 0, "unverified_secrets": 100, "scan_duration": "29.301196125s", "trufflehog_version": "dev", "verification_caching": {"Hits":50,"Misses":60,"HitsWasted":0,"AttemptsSaved":50,"VerificationTimeSpentMS":48020}}
{"chunks": 497160, "bytes": 198670727, "verified_secrets": 0, "unverified_secrets": 100, "scan_duration": "27.765202709s", "trufflehog_version": "dev", "verification_caching": {"Hits":50,"Misses":60,"HitsWasted":0,"AttemptsSaved":50,"VerificationTimeSpentMS":41405}}

without caching
{"chunks": 497160, "bytes": 198670727, "verified_secrets": 0, "unverified_secrets": 100, "scan_duration": "28.843573s", "trufflehog_version": "dev", "verification_caching": {"Hits":0,"Misses":0,"HitsWasted":0,"AttemptsSaved":0,"VerificationTimeSpentMS":56733}}
{"chunks": 497160, "bytes": 198670727, "verified_secrets": 0, "unverified_secrets": 100, "scan_duration": "27.674744458s", "trufflehog_version": "dev", "verification_caching": {"Hits":0,"Misses":0,"HitsWasted":0,"AttemptsSaved":0,"VerificationTimeSpentMS":53908}}

The caching doesn't speed the scans up, which is expected, because scans aren't generally bottlenecked by verification. However, it does reduce the number of verification requests, which is good for avoiding account lockouts and stuff like that.
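To compare runs like the ones above, the summary lines can be parsed and a hit rate computed from `Hits` and `Misses`. This is a rough sketch that assumes the summary keeps the JSON shape shown in this description; `scanSummary`, `cacheMetrics`, `parseSummary`, and `hitRate` are names made up here, not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheMetrics mirrors the "verification_caching" object in the summary line.
type cacheMetrics struct {
	Hits                    int
	Misses                  int
	HitsWasted              int
	AttemptsSaved           int
	VerificationTimeSpentMS int64
}

// scanSummary keeps only the part of the summary this sketch needs.
type scanSummary struct {
	VerificationCaching cacheMetrics `json:"verification_caching"`
}

// parseSummary decodes one JSON summary line.
func parseSummary(line string) (scanSummary, error) {
	var s scanSummary
	err := json.Unmarshal([]byte(line), &s)
	return s, err
}

// hitRate is the fraction of cache lookups that were hits (0 when there were none).
func hitRate(m cacheMetrics) float64 {
	total := m.Hits + m.Misses
	if total == 0 {
		return 0
	}
	return float64(m.Hits) / float64(total)
}

func main() {
	// Trimmed copy of the first "with caching" line above.
	line := `{"verification_caching": {"Hits":50,"Misses":60,"HitsWasted":0,"AttemptsSaved":50,"VerificationTimeSpentMS":48020}}`
	s, err := parseSummary(line)
	if err != nil {
		panic(err)
	}
	m := s.VerificationCaching
	fmt.Printf("hit rate: %.1f%%, attempts saved: %d\n", hitRate(m)*100, m.AttemptsSaved)
}
```

For the first cached run this gives 50 hits out of 110 lookups, or roughly 45%.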

Cached results get this cool new text:
image

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?
  • Double-check names and comments and suchlike
  • Put some metrics in this PR description

@rosecodym rosecodym requested review from a team as code owners December 19, 2024 01:41
@rosecodym rosecodym marked this pull request as draft December 19, 2024 23:33
@rosecodym rosecodym marked this pull request as ready for review December 19, 2024 23:54
@ankushgoel27
Contributor

I understand it doesn't verify the same credential twice, but I hope it still reports it as verified (if verified) in the output.

@rosecodym
Collaborator Author

rosecodym commented Dec 20, 2024

I understand it doesn't verify the same credential twice, but I hope it still reports it as verified (if verified) in the output.

Yep, that's the idea! If the credential is cached as verified, then the output will report it as verified:

image


// MetricsReporter is an interface used by a verification cache to report various metrics related to its operation.
// Implementations must be thread-safe.
type MetricsReporter interface {
Collaborator


Question: @rosecodym any reason we need this to be an interface? I'm seeing InMemoryMetrics as the only struct that implements this interface.

Collaborator Author


Oh, good question. The plan is to use this seam to plug in a Prometheus-based reporter in enterprise. Rather than drape those metrics all over the OSS codebase I opted to create this interface, which looks silly on its own (as you've pointed out) but I hope will save some headaches in the long run.
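To illustrate the seam being described, here is a small sketch of a reporter interface with a thread-safe in-memory implementation. The method names `AddCacheHit` and `AddCacheMiss` and the lowercase type names are assumptions for illustration only; the actual `MetricsReporter` method set is not shown in this thread.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// metricsReporter is a simplified, hypothetical reporter interface.
// A Prometheus-backed type could satisfy it without touching the cache code.
type metricsReporter interface {
	AddCacheHit()
	AddCacheMiss()
}

// inMemoryMetrics counts events with atomics so it is safe for concurrent use.
type inMemoryMetrics struct {
	hits   atomic.Int64
	misses atomic.Int64
}

func (m *inMemoryMetrics) AddCacheHit()  { m.hits.Add(1) }
func (m *inMemoryMetrics) AddCacheMiss() { m.misses.Add(1) }

// recordConcurrently reports n hits from n goroutines through the interface.
func recordConcurrently(r metricsReporter, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.AddCacheHit()
		}()
	}
	wg.Wait()
}

func main() {
	m := &inMemoryMetrics{}
	recordConcurrently(m, 100)
	m.AddCacheMiss()
	fmt.Println("hits:", m.hits.Load(), "misses:", m.misses.Load())
}
```

The cache depends only on the interface, so swapping the reporter is a change at construction time rather than at every call site.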

Contributor


We already have Prometheus metrics elsewhere in the OSS codebase, and they don't cause any issues.

Collaborator

@zricethezav zricethezav left a comment


lgtm, just one non-blocking question

@rosecodym rosecodym merged commit ddc015e into main Dec 20, 2024
13 checks passed
@rosecodym rosecodym deleted the detection-caching branch December 20, 2024 21:40
@mcastorina
Collaborator

This looks great!

5 participants