Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoiding probe explosion and boilerplate #3824

Closed
spencerschrock opened this issue Jan 27, 2024 · 2 comments · Fixed by #3981
Closed

Avoiding probe explosion and boilerplate #3824

spencerschrock opened this issue Jan 27, 2024 · 2 comments · Fixed by #3981

Comments

@spencerschrock
Copy link
Member

Background

When evaluating a repository, it’s common for Scorecard to detect different tools when scoring checks or probes. The heuristic has to do with the usage of tools from certain categories, not the usage of an exact tool. This applies to many checks (and their underlying probes):

  • Fuzzing
    • OSS-Fuzz / Clusterfuzz lite
    • Various property-based fuzzers for each of the supported languages: Go, Haskell, JS/TS, Python, C, C++, Swift, Rust, Java
  • SAST
    • CodeQL PR stats (this one has some extra logic)
    • Tool config files
      • CodeQL, Pysa, Qodana, Snyk, Sonar
  • Dependency-Update-Tool
    • Dependabot, PyUp, Renovate

Proposal

We frame probes as answers to various questions about a repository, so this proposal boils down to figuring out what the question is:

  1. Does the project use a tool from category X?
  2. Does the project use tool Y from category X? (asked for each supported tool)

I’m going to argue we should favor the first question. Specifically, there should be a single probe per tool category, not per tool.

Pros

  • Avoids boilerplate
    • Each implementation includes about 250 lines of documentation, code, and tests. Of which very little differs
      • Probe YAML definition (~50 lines)
        • Slight differences in description and remediation
      • Probe implementation and tests (~200 lines)
        • As little as 1 changed difference, significant code duplication
  • Less fragile tests
    • UniqueProbesEqual is called by most evaluation code to make sure we have all the probes we expect. So adding a new tool probe requires adding the new probe to all the other tests, or the test will throw errors.
  • Better warnings
    • Users don’t need to get warned for not using every tool. Part of this is Scorecard’s implementation/consumption of probes, and could be alleviated by some of the policy work. But for now, users that don’t use 1 fuzzer get warned they don't use every fuzzer.

Cons

  • Remediation becomes less specific. But we can solve it with another layer of indirection.
    E.g. check out our [link to some markdown] for instructions for specific tools for your language.
  • Certain policies may become more difficult to write? Anyone that cares about a specific SAST/Fuzzing/DUT tool will need to check one more “tool” field.

Example

Current

{"probe": "fuzzedWithGoNative", "outcome": "Negative", ...}
{"probe": "fuzzedWithRustCargofuzz","outcome": "Negative",...}
...
{"probe": "fuzzedWithPropertyBasedTypescript","outcome": "Positive",...}
{"probe": "fuzzedWithGoNative","outcome": "Positive",...}

Proposed

{ "probe": "fuzzed","outcome": "Positive", "values": {"tool": "QuickCheck"}, ...}
{ "probe": "fuzzed","outcome": "Positive", "values": {"tool": "GoNative"}, ...}

Why aren’t things like this already?

Short answer: a limitation in the Finding struct.

type Finding struct {
	Location    *Location 
	Remediation *probe.Remediation
	Values      map[string]int
	Probe       string
	Message     string
	Outcome     Outcome
}

Specifically, let’s take a look at the Values map. It maps some property/label to an integer value. This typing was chosen with a small subset of probes in mind, and was always subject to change.

  • SAST
    • Proportion of PRs analyzed (two key -> int labels)
  • Maintained
    • Lookback window, 90 days
    • Issues in threshold
    • Commits in threshold

But the integer type also provides limitations, and probes converted after the Values map was typed have to resort to workarounds

  • Branch Protection
    • Use the key field for branch name, and the value isn’t important
  • CII-Best-Practices
    • The badge level is a string, so again we have the key name being the important part, and the value being discarded
  • Token-Permissions (old)
    • There’s an intermediate step converting string enums to/from an int. But on its own, the values are opaque to consumers.

Enabling Change

One option is making the Values type map[string]string, as strings are more flexible. Integers can be represented as a string so we don’t lose anything. Note: this does introduce an extra step for probes which want int values.

type Finding struct {
	Location    *Location 
	Remediation *probe.Remediation
	Values      map[string]string
	Probe       string
	Message     string
	Outcome     Outcome
}

Examples

CII-Best-Practices
Before:

{ 
  "probe": "hasOpenSSFBadge",
 "outcome": "Positive", 
  "values": {"Gold": 1}
}

After:

{ 
  "probe": "hasOpenSSFBadge",
  "outcome": "Positive", 
  "values": {"level": "Gold"}
}

Maintained:
Before:

{ 
  "probe": "hasRecentCommits",
 "outcome": "Positive", 
  "values": {"commitsWithinThreshold": 30, "lookBackDays": 90}
}

After:

{ 
  "probe": "hasRecentCommits",
 "outcome": "Positive", 
  "values": {"commitsWithinThreshold": "30", "lookBackDays": "90"}
}
@pnacht
Copy link
Contributor

pnacht commented Jan 29, 2024

Big fan of this idea.

Moving from ints to strings is a huge improvement, for sure. As we briefly discussed in #3816 (comment), I'd love if it were possible to make the map more generic (literally!), but after spending much of Friday learning how generics and interfaces actually work in Go, I've come to realize it simply isn't possible in this situation.

So using strings seems reasonable. The evaluation logic for many checks will need to convert back into other types (i.e. ints) to make comparisons, but that should be simple enough.

@AdamKorcz
Copy link
Contributor

AdamKorcz commented Jan 31, 2024

I like these two changes (reframing the probes and changing the Values map).

Having a string (instead of an interface or generic) is preferred IMO, but I may have a bit of bias from dealing with a map of interfaces in the past: It tends to often get messy and increase maintainability. A map[string]string will solve the use of Values for all probes as the probes look right now. Currently, the big limitation is that we can only specify int values as @spencerschrock writes, and we have had the need to use string values too. A map[string]string map solves the need for both int's and string's. Currently, I can't remember the need for any other values besides int's and string's. Therefore, as I see it, a map[string]string map will tidy up Scorecards probes.

On the question of reframing the probes, I can't see a negative impact from this. Scorecards own evaluation would be able to process the probes more or less in the same manner, and other users that write their own evaluation should not see increased complexity from structuring the probes in the way that @spencerschrock suggests. @spencerschrock 's suggestion will definitely reduce Scorecards LoC and a fair bit of repeated code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants