Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: re-visit outcome definition in findings #2928

Open
laurentsimon opened this issue Apr 28, 2023 · 10 comments
Open

Feature: re-visit outcome definition in findings #2928

laurentsimon opened this issue Apr 28, 2023 · 10 comments
Labels
kind/enhancement New feature or request

Comments

@laurentsimon
Copy link
Contributor

laurentsimon commented Apr 28, 2023

  1. Use iota or explicit values
  2. Do we need a NotApplicable outcome?
@spencerschrock
Copy link
Member

From a discussion: Should probes have a numeric value?

e.g number of contributors for the contributor check. or number of minimum reviewers in code review check.

@laurentsimon
Copy link
Contributor Author

/cc @AdamKorcz

@AdamKorcz
Copy link
Contributor

AdamKorcz commented Oct 3, 2023

From a discussion: Should probes have a numeric value?

e.g number of contributors for the contributor check. or number of minimum reviewers in code review check.

The checks that do proportional scoring will benefit from this, as the corresponding probes will need to return a numeric value. The checks that do proportional scoring are:

  1. CI Tests
  2. Code review
  3. Contributors
  4. Maintained
  5. Pinned Dependencies
  6. Webhooks

In most of the cases, the probes can handle this by themselves. For example. they can return the number of findings corresponding to a numeric value, however it does increase complexity in the probes, especially regarding understanding the probe outcomes; Some probes can return the number of findings, some probes can only return the number of findings in certain cases, other probes will not return the number of cases. A numeric field in the Outcome would in most cases replace a scenario where the number of finding outcomes is used as the numeric value. As such, which scenario is better: Using the number of findings as the numeric value or using a field in the Outcome?

A related question is how we should use the Outcome vs the numeric value across all probes. Why have OutcomePositive and OutcomeNegative at all if the probe can return numeric values? The purpose of a numeric field is to help in cases where a Positive/Negative outcome is not enough, however, not all developers would take that approach to begin with. For consistency across all probes, it would be good with documentation on the numeric values and that developers should aim to solve a given probe first by using the Outcome value. The documentation could make clear what the "ideal" outcome looks from a probe that returns numeric values.

Do we need a NotApplicable outcome?

From a high level, this is useful for probes that consider whether one/how many of SOMETHING is/are a security risk or not. Two checks that I can think of do that:

  1. Webhooks.
  2. Dangerous workflows.

If a project has either no webhooks or workflows, Scorecard cannot evaluate whether any of them are a risk. I could imagine that there can be more of these in the future, whether publicly maintained or not; For example, some users might want to check integrations with 3rd-party apps - for the sake of argument let's say an integration with Slack: A scorecard check could be whether this integration is set up securely or insecurely, however, you can only evaluate it if the project actually has a Slack integration.

Another example could be fuzzing: Scorecard could consider it positive if non-security findings from OSS-Fuzz are auto-created as a Github issue, however, if the project is not on Github, then Scorecard cannot evaluate this.

This could also be relevant to specific types of projects. For example, whether cloud-native projects have configured their .yaml files securely or not.

@laurentsimon
Copy link
Contributor Author

laurentsimon commented Oct 3, 2023

Following up on the discussion we had this morning:

Numerical values

Sounds useful. Maybe we define an outcome structure like:

type Outcome Struct {
   Type enum {Int, Bool}
   Value interface{}
}

The thing to keep in mind is that we will ultimately allow end-users to define their own checks using a (hopefully simple) config file. See an example https://github.com/laurentsimon/scorecard/blob/mvp/flexible-checks/evaluation/policy.yml If we can keep the evaluation engine simple, it's better.

I think @AdamKorcz will write up a short proposal about what he thinks is an appropriate solution.

NotApplicable

Not a blocker, there was consensus among the maintainers that this is useful

Am I missing anything from today's conversation?

@AdamKorcz
Copy link
Contributor

This is in my opinion the optimal way to add the option for numeric return values from probes. I am keeping this text short and am focusing on the details of what I see as the best solution, and am not including trade offs between different types of solutions. Please let me know if I should add more context.

I consider finding.Finding to a suitable place to add this option. Adding it there has little overhead and requires little rewriting of most uses of finding.Finding. The new finding.Finding would look like this:

type Finding struct {
	Probe       string             `json:"probe"`
	Outcome     Outcome            `json:"outcome"`
	Message     string             `json:"message"`
	Location    *Location          `json:"location,omitempty"`
	Remediation *probe.Remediation `json:"remediation,omitempty"`
        Values
}

I consider the best data structure to be a map[string]int. The string key allows us to have good readability when evaluating the numeric values. The int value allows us to easily compare it against other values in custom checks (in .yaml files) and in the Scorecard code.

I don’t see a need for creating nested maps now or in the near future.

As such, the complete, changed finding.Finding would look like this:

type Finding struct {
	Probe       string             `json:"probe"`
	Outcome     Outcome            `json:"outcome"`
	Message     string             `json:"message"`
	Location    *Location          `json:"location,omitempty"`
	Remediation *probe.Remediation `json:"remediation,omitempty"`
        Values       map[string]int    `json:”values,omitempty”`
}

With this change, there is no need for returning multiple findings from each probe. By adding the option to return a numeric value, we should make the following high-level changes to the existing (already merged) probes:

  • Change the return value of probes to be finding.Finding instead of []finding.Finding.
  • All fuzzing probes: Add a field the to Values map called “number of fuzzers” and return a single finding.
  • securityPolicyContainsLinks: Add two fieldsthe Values map : 1) Number of emails and 2) Number of URLs and return a single finding.
  • securityPolicyContainsText: Add a KV pair to the Values map called “number of policies” and return a single finding.
  • securityPolicyContainsVulnerabilityDisclosure: Add a KV pair to the Values map called “number of discvuls” and return a single finding.
  • securityPolicyPresent: Add a field the VALUES map called “number of security policies” and return a single finding.

Please scrutinize this suggestion.

@spencerschrock
Copy link
Member

With this change, there is no need for returning multiple findings from each probe. By adding the option to return a numeric value, we should make the following high-level changes to the existing (already merged) probes:

  • Change the return value of probes to be finding.Finding instead of []finding.Finding.

I think there's still value in multiple findings, namely because their location differs. Consider any of the checks that evaluate file contents. There can be more than one dangerous workflow, unpinned dependency, etc.

@AdamKorcz
Copy link
Contributor

With this change, there is no need for returning multiple findings from each probe. By adding the option to return a numeric value, we should make the following high-level changes to the existing (already merged) probes:

  • Change the return value of probes to be finding.Finding instead of []finding.Finding.

I think there's still value in multiple findings, namely because their location differs. Consider any of the checks that evaluate file contents. There can be more than one dangerous workflow, unpinned dependency, etc.

Fair point, but wouldn't the number of dangerous workflows and unpinned dependencies be placed in the numeric fields, ie. the map?

@spencerschrock
Copy link
Member

Yes, but I think there's value in being able to point it to a location in the repo. When done properly, this location data lets us feed it into the Code Scanning dashboard (or PR feedback).

Screenshot 2023-10-09 at 1 42 37 PM

@spencerschrock
Copy link
Member

There seems to be no way to do debug messages with findings? Is this something we are expecting? Should all debug logging be done at the raw results level?

One discussion was at https://github.com/ossf/scorecard/pull/3486/files#r1366854736

@spencerschrock
Copy link
Member

In #2919 (comment), Laurent declared Outcome as an integer type as a way of comparing outcomes. I synced with him offline recently and that's not something we're envisioning anymore. Plus as he points out in that comment, comparing the other outcomes (e.g. OutcomeError, OutcomeNotAvailable) is complicated.

Changing the outcome type from int to string would make the current results more consumer friendly:
before:

{
"probe": "hasOSVVulnerabilities",
"message": "Project does not contain OSV vulnerabilities",
"outcome": 12
}

After

{
"probe": "hasOSVVulnerabilities",
"message": "Project does not contain OSV vulnerabilities",
"outcome": "Positive"
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement New feature or request
Projects
Status: Todo
Development

No branches or pull requests

5 participants