-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oheger bosch/scanner/#3223 scanner options compatibility #3372
Oheger bosch/scanner/#3223 scanner options compatibility #3372
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the first commit so far, I don't have time for the others today and will continue the review on Monday. So far I'm a bit overwhelmed by the complexity of this, but maybe that impression changes when I look at the other commits.
* that scan results are considered compatible only if they contain (at least) all the data requested by the | ||
* options provided in the exact form. This is suitable if a specific scanner is used and the processing of the scan | ||
* result is sensitive to the concrete data it contains. In lenient mode in contrast a user merely specifies in what | ||
* kind of data he or she is interested (such as license or copyright information), but does not care from which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use singular they: "in what kind of data he or she is interested" -> "what kind of data they are interested in"
See: https://en.wikipedia.org/wiki/Singular_they
* Return the enum value with the specified [name] or default to the (most restrictive) [STRINGS] type when | ||
* the name cannot be resolved. | ||
*/ | ||
fun forName(name: String): SubOptionType = mapping[name] ?: STRINGS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to use the existing valueOf
function and fail if the name does not exist? My assumption would be that if a wrong name is provided this is caused by a bug we don't want to hide.
* instance of this [optionClass] with equivalent [subOptions]. | ||
*/ | ||
override fun isCompatible(options: ScannerOptions, strict: Boolean): Boolean = | ||
options.checkPresenceOrStrict(optionClass, strict) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you just use this::class
here instead of making this class generic and storing the optionClass
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you perhaps elaborate a bit? I am not so familiar with Kotlin's reflection capabilities. Is it then still possible to query specific options in a type-safe way?
Originally, I hoped to solve the access to specific sub option classes using reified types. But this was not working because sometimes dynamic classes need to be looked up.
packageOption, | ||
urlOption | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You partly go towards a DSL with the SubOptions.create
function, did you consider adding more DSL functions so that you could write this as:
scannerOptions {
outputFormatOption {
stringOption("xml")
}
copyrightResultOption {
stringOption(key = "consolidate", value = "true")
}
emailOption {
thresholdOption(100.0)
}
licenseOption {
stringOption(key = "consolidate", value = "true")
stringOption(key = "license-text", value = "true")
thresholdOption(key = "license-score", value = 75.0)
}
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking about adding more DSL-like features, but then I am not sure whether this is worth the effort. The main consumers of this DSL are concrete scanner implementations, and their number is limited. But I agree, this is looking very nice.
Complexity is indeed an issue. I think, it is partly inherent to the problem of assigning a meaning to scanner options and doing compatibility checks differently based on this meaning. I am open to all suggestions to make this more simple. |
@mnonnenmacher, any updates here or do you want me to fix the things you have discovered first? |
I'll contact you on Slack regarding that review. |
Storing the configuration of a scanner as a plain string is not sufficient to determine whether results contained in a ScanResultsStorage could be reused during a scan operation. Therefore, create a logic model of scanner options that adds semantic to single options and supports compatibility checks. This commit contains some fundamental infrastructure classes of this model and some classes representing important concrete scanner options. Further option classes will be added. Signed-off-by: Oliver Heger <[email protected]>
Add two new classes to represent inclusion and exclusion filter options. These options implement a special comparison logic for compatibility checks in lenient mode. Signed-off-by: Oliver Heger <[email protected]>
TimeoutOption deals with timeouts configured for a scanner. In lenient mode, a scanner result is considered compatible if the configured timeout is greater or equal the timeout requested. UnclassifiedOption collects all other options for which no special representation exists, but that affect the scanner result. Signed-off-by: Oliver Heger <[email protected]>
Add a variant of this method that allows configuring the options for which non-strict checks should be made. This gives very fine control how to deal with the compatibility of different scanner options. Signed-off-by: Oliver Heger <[email protected]>
Map the command line options supported by ScanCode to the generic model of scanner options. So far a ScanCode command line as defined in the configuration can be processed. Signed-off-by: Oliver Heger <[email protected]>
Add a new parseScannerOptionsFromResult() function to extract the JSON representation of the ScanCode command line from a serialized result and convert it to a ScannerOptions object. Rename parseScannerOptions() to parseScannerOptionsFromCommandLine() to have a consistent naming scheme. Signed-off-by: Oliver Heger <[email protected]>
Support advanced compatibility checks based on the logic model of scanner configuration options. The options property is nullable to be backwards compatible. If it is undefined, checks can only be done against the (exact) configuration string. Signed-off-by: Oliver Heger <[email protected]>
Enhance the ScannerConfigMatcher predicate to expect a ScannerDetails parameter. That way, all the configuration-related properties of the details can be used by the check. Add a new compatibleConfigMatcher() function that does a compatibility check of the options stored in a ScannerDetails object against a given set of options. Using this matcher, it can be specified in a flexible way, which results from a results storage are acceptable for the current set of scanner options. Signed-off-by: Oliver Heger <[email protected]>
Concrete scanner implementations can now provide a ScannerOptions object with a logic representation of their current configuration. This object becomes part of the ScannerDetails and ScannerCriteria created by the scanner, and it is therefore used when querying results from a ScanResultsStorage. Extend the scanner configuration to support the definition of scanner options which can be compared in non-strict mode when doing compatibility checks. Signed-off-by: Oliver Heger <[email protected]>
Override the getScannerOptions() method to return a logic representation of the scanner options derived from the command line. These options can then be used when querying results from ScanResultStorages. Signed-off-by: Oliver Heger <[email protected]>
The original comment is no longer correct, as a full check of the criteria requested cannot be done in the database query. Reword it to reflect the current situation. Signed-off-by: Oliver Heger <[email protected]>
88c2b66
to
ea0e3a9
Compare
Is there something that can be done on the scancode side to help? |
Thanks for the offer, but I believe this is something we need to sort out ourselves. We need a way for ORT users to configure under which circumstances to regard scan results (from the same scanner, but maybe a different version and / or configuration) as compatible. |
@oheger-bosch I am closing this as I merged several simplifications in #7572 which are conflicting with this PR. |
This PR extends the compatibility checks on scanner results retrieved from a ScanResultsStorage by taking the scanner configuration options into account. The ideas are based on a meeting together with @mnonnenmacher and @sschuberth.
The goal is to check for the compatibility of scan results not based on the exact presence or absence of command line options, but taking the meaning of these options into account. For this purpose, a logic model of scanner options has been introduced. The model is derived from the CLI of ScanCode, but could be extended for other scanners supporting additional options and types of results.
The design target behind the model of scanner options was to make it powerful enough to support the required compatibility checks, but at the same time not making it too complex. Of course, it involves a bunch of classes to represent the different options.
Interestingly, there are no changes required at the ScanResultsStorage implementations to enable the advanced compatibility checks. This is due to the fact that the logic is encapsulated in the ScannerCriteria class, and the additional information related to scanner options is automatically serialized as part of the scan result. Note that the implementation is backwards compatible; if no information about scanner options is available in a result obtained from a storage, the compatibility checks fall back to the original comparison of the command line.