Adding SARIF support #43

shaopeng-gh · 2022-04-20T00:11:25Z

Use Case

Adding SARIF support to the linter output.

Additional Context

For general information about SARIF: https://sarifweb.azurewebsites.net/

Describe the Solution You Would Like

Hello! We are working on adding SARIF support for puppet-lint as in ongoing PR #40

A proper SARIF requires a few more additional information in the tool,
In addition to the current changes to support basic SARIF, we are proposing a few changes, these proposed changes are not yet in the PR, we would like to confirm with repo maintainer first.

SARIF supports an opaque, stable id for every check, and a friendly readable name. The readable name is useful to communicate information to users in the IDE, etc. I can see the tool currently has the “check” field which can be used as a rule name e.g. “variable_contains_dash”. If you like, we can propose stable rule ids for your checks, e.g., ‘PL1001’, ‘PL1002’, etc.

(detail changes: go through all the plugins add a new property check_id => "PLXXXX", and change the code so that it can be passed to the result SARIF file as rule id)

The SARIF recommendation for friendly rule names is to use Pascal-Casing, e.g., ‘VariableContainsDash’ rather than ‘variable_contains_dash’. If you’d like that change, I can make it. The benefit is that your rule names will be more consistent with others when aggregated tool results are displayed in a viewer. There’s an argument for leaving things as is with an existing tool, however, and many scripting linters in particular use the convention you current have.

(detail changes: we do not suggest to change your rule/check name since it is already being used to do the check suppressions and likely other usages, we should probably keep it as it is currently to avoid breaking users. This item is as fyi.)

SARIF recommends specifying the entire region of problematic code and we’re currently missing the line-end and column-end data. For this case, the implied region begins with the starting column and extends through the current line. I propose to add them to your results but will need to figure out how to get this information for each plugin.

(detail changes: for each plugin output, add :lineEnd and :columnEnd side by side with the current :line and :column)

SARIF recommends injecting the literal, precise snippet of text that is regarded as problematic into the log. Doing this maximizes the ability of a result matching API to track issues run-over-run when there is code drift (i.e., the literal line locations may have changed, but the precise textual match in a baseline log may be useful to find the current location of the problem.

(detail changes: If we add the ending line and column mentioned in No.3, we should be able to retrieve the code snippet using the beginning and end.)

SARIF also recommends adding a ‘context’ region to the log file, which is text that includes one or two lines above and past the literal vulnerability. Adding this data maximizes the ability of a web viewer to provide a good display of the problem (even when an enlistment isn’t available). This context region also turns out to be very helpful for fingerprinting scenarios, the hash of this data can be very stable run-over-run even as code changes.

(detail changes: If we add the ending line and column mentioned in No.3, we should be able to retrieve the code snippet using the beginning line -1 and ending line + 1. )

SARIF recommends populating logs with format string in the rules table and that each result specify the arguments to be injected into that string (rather than creating a fully formed user-facing string for each result). This is mostly done in order to minimize log file size, as it avoids repeated static content. Taking this approach also encourages tools to actually emit dynamic arguments which provide details that can uniquely identify a finding, which is much more helpful in the IDE for browsing multiple results (as opposed to seeing that same static string repeated for each logically unique problem). As an example, for a string like ‘TokenA not in ModuleNameB module layout, we would create a message template like ‘{0} not in {1} module layout’ in the ‘rules’ table, and the corresponding message would store [“TokenA”, “ModuleNameB”] in the result args property.

(detail changes: for each plugin output, change the message property into 2: message_template, message_data; or add these 2 new properties and keep the existing message property. )

SARIF recommends that user-facing strings end in a period and also suggests that dynamic data is encapsulated in single quotes where that data refers to variables, etc. So this would be the resulting recommended string in the SARIF world: ‘TokenA’ not in ‘ModuleNameB’ layout. This standard again helps aggregating viewers to maintain some consistency in appearance for bundled results. The apostrophes help clarify to users the essential data for identifying the root cause of a specific result.

(detail changes: for each plugin output, change the message text so that it ends with "." and also add single quotes, and make the sentence start with Capital letter.)

chelnak · 2022-10-02T08:52:53Z

Added in #40

shaopeng-gh added the enhancement New feature or request label Apr 20, 2022

chelnak closed this as completed Oct 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding SARIF support #43

Adding SARIF support #43

shaopeng-gh commented Apr 20, 2022

chelnak commented Oct 2, 2022

Adding SARIF support #43

Adding SARIF support #43

Comments

shaopeng-gh commented Apr 20, 2022

Use Case

Additional Context

Describe the Solution You Would Like

chelnak commented Oct 2, 2022