Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oheger bosch/scanner/#3223 scanner options compatibility #3372

Conversation

oheger-bosch
Copy link
Member

This PR extends the compatibility checks on scanner results retrieved from a ScanResultsStorage by taking the scanner configuration options into account. The ideas are based on a meeting together with @mnonnenmacher and @sschuberth.

The goal is to check for the compatibility of scan results not based on the exact presence or absence of command line options, but taking the meaning of these options into account. For this purpose, a logic model of scanner options has been introduced. The model is derived from the CLI of ScanCode, but could be extended for other scanners supporting additional options and types of results.

The design target behind the model of scanner options was to make it powerful enough to support the required compatibility checks, but at the same time not making it too complex. Of course, it involves a bunch of classes to represent the different options.

Interestingly, there are no changes required at the ScanResultsStorage implementations to enable the advanced compatibility checks. This is due to the fact that the logic is encapsulated in the ScannerCriteria class, and the additional information related to scanner options is automatically serialized as part of the scan result. Note that the implementation is backwards compatible; if no information about scanner options is available in a result obtained from a storage, the compatibility checks fall back to the original comparison of the command line.

Copy link
Member

@mnonnenmacher mnonnenmacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only reviewed the first commit so far, I don't have time for the others today and will continue the review on Monday. So far I'm a bit overwhelmed by the complexity of this, but maybe that impression changes when I look at the other commits.

* that scan results are considered compatible only if they contain (at least) all the data requested by the
* options provided in the exact form. This is suitable if a specific scanner is used and the processing of the scan
* result is sensitive to the concrete data it contains. In lenient mode in contrast a user merely specifies in what
* kind of data he or she is interested (such as license or copyright information), but does not care from which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use singular they: "in what kind of data he or she is interested" -> "what kind of data they are interested in"
See: https://en.wikipedia.org/wiki/Singular_they

* Return the enum value with the specified [name] or default to the (most restrictive) [STRINGS] type when
* the name cannot be resolved.
*/
fun forName(name: String): SubOptionType = mapping[name] ?: STRINGS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to use the existing valueOf function and fail if the name does not exist? My assumption would be that if a wrong name is provided this is caused by a bug we don't want to hide.

* instance of this [optionClass] with equivalent [subOptions].
*/
override fun isCompatible(options: ScannerOptions, strict: Boolean): Boolean =
options.checkPresenceOrStrict(optionClass, strict) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you just use this::class here instead of making this class generic and storing the optionClass?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you perhaps elaborate a bit? I am not so familiar with Kotlin's reflection capabilities. Is it then still possible to query specific options in a type-safe way?
Originally, I hoped to solve the access to specific sub option classes using reified types. But this was not working because sometimes dynamic classes need to be looked up.

packageOption,
urlOption
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You partly go towards a DSL with the SubOptions.create function, did you consider adding more DSL functions so that you could write this as:

scannerOptions {
  outputFormatOption {
    stringOption("xml")
  }

  copyrightResultOption {
    stringOption(key = "consolidate", value = "true")
  }

  emailOption {
    thresholdOption(100.0)
  }

  licenseOption {
    stringOption(key = "consolidate", value = "true")
    stringOption(key = "license-text", value = "true")
    thresholdOption(key = "license-score", value = 75.0)
  }

  ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking about adding more DSL-like features, but then I am not sure whether this is worth the effort. The main consumers of this DSL are concrete scanner implementations, and their number is limited. But I agree, this is looking very nice.

@oheger-bosch
Copy link
Member Author

Complexity is indeed an issue. I think, it is partly inherent to the problem of assigning a meaning to scanner options and doing compatibility checks differently based on this meaning. I am open to all suggestions to make this more simple.

@oheger-bosch
Copy link
Member Author

@mnonnenmacher, any updates here or do you want me to fix the things you have discovered first?

@mnonnenmacher
Copy link
Member

@mnonnenmacher, any updates here or do you want me to fix the things you have discovered first?

I'll contact you on Slack regarding that review.

Storing the configuration of a scanner as a plain string is not
sufficient to determine whether results contained in a
ScanResultsStorage could be reused during a scan operation. Therefore,
create a logic model of scanner options that adds semantic to single
options and supports compatibility checks.

This commit contains some fundamental infrastructure classes of this
model and some classes representing important concrete scanner
options. Further option classes will be added.

Signed-off-by: Oliver Heger <[email protected]>
Add two new classes to represent inclusion and exclusion filter
options. These options implement a special comparison logic for
compatibility checks in lenient mode.

Signed-off-by: Oliver Heger <[email protected]>
TimeoutOption deals with timeouts configured for a scanner. In lenient
mode, a scanner result is considered compatible if the configured
timeout is greater or equal the timeout requested.

UnclassifiedOption collects all other options for which no special
representation exists, but that affect the scanner result.

Signed-off-by: Oliver Heger <[email protected]>
Add a variant of this method that allows configuring the options for
which non-strict checks should be made. This gives very fine control
how to deal with the compatibility of different scanner options.

Signed-off-by: Oliver Heger <[email protected]>
Map the command line options supported by ScanCode to the generic
model of scanner options. So far a ScanCode command line as defined in
the configuration can be processed.

Signed-off-by: Oliver Heger <[email protected]>
Add a new parseScannerOptionsFromResult() function to extract the JSON
representation of the ScanCode command line from a serialized result
and convert it to a ScannerOptions object.

Rename parseScannerOptions() to parseScannerOptionsFromCommandLine()
to have a consistent naming scheme.

Signed-off-by: Oliver Heger <[email protected]>
Support advanced compatibility checks based on the logic model of
scanner configuration options.

The options property is nullable to be backwards compatible. If it is
undefined, checks can only be done against the (exact) configuration
string.

Signed-off-by: Oliver Heger <[email protected]>
Enhance the ScannerConfigMatcher predicate to expect a ScannerDetails
parameter. That way, all the configuration-related properties of the
details can be used by the check.

Add a new compatibleConfigMatcher() function that does a compatibility
check of the options stored in a ScannerDetails object against a given
set of options. Using this matcher, it can be specified in a flexible
way, which results from a results storage are acceptable for the
current set of scanner options.

Signed-off-by: Oliver Heger <[email protected]>
Concrete scanner implementations can now provide a ScannerOptions
object with a logic representation of their current configuration.
This object becomes part of the ScannerDetails and ScannerCriteria
created by the scanner, and it is therefore used when querying results
from a ScanResultsStorage.

Extend the scanner configuration to support the definition of scanner
options which can be compared in non-strict mode when doing
compatibility checks.

Signed-off-by: Oliver Heger <[email protected]>
Override the getScannerOptions() method to return a logic
representation of the scanner options derived from the command line.
These options can then be used when querying results from
ScanResultStorages.

Signed-off-by: Oliver Heger <[email protected]>
The original comment is no longer correct, as a full check of the
criteria requested cannot be done in the database query. Reword it to
reflect the current situation.

Signed-off-by: Oliver Heger <[email protected]>
@pombredanne
Copy link
Contributor

Is there something that can be done on the scancode side to help?
Note that I personally find checking actual CLI options not as robust as checking the presence of certain data attributes in scanc results.

@sschuberth
Copy link
Member

Is there something that can be done on the scancode side to help?

Thanks for the offer, but I believe this is something we need to sort out ourselves. We need a way for ORT users to configure under which circumstances to regard scan results (from the same scanner, but maybe a different version and / or configuration) as compatible.

@mnonnenmacher
Copy link
Member

@oheger-bosch I am closing this as I merged several simplifications in #7572 which are conflicting with this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants