-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter for scanner configuration in ScanResult storages is too restrictive #3223
Comments
Also strongly related to #3247. |
This issue is addressed by a PR, but as reviewers have correctly remarked, the proposed solution becomes pretty complex. When we try to model the various options supported by ScanCode (or other scanners) and different logic how to check their compatibility, an inherent complexity probably cannot be avoided. Therefore, I was thinking about alternative approaches. The current implementation of ScanCode already distinguishes between command line options that have an impact on the scanner output and others that do not. One possibility could be to extend this mechanism, so that the user is responsible of defining the compatibility of scanner options, but also has full control over it. The idea is that the scanner options in the ORT configuration could support properties to specify which scanner options must be set (or must not be present) to consider a scan result as compatible. In
Using this configuration, a scan result would be accepted from a storage if it was produced with the command line options "--copyright --package --strip-root", but not with the option "--shallow". (All other options would be ignored.) Compared to the complexity of the mentioned PR, such checks would be rather simple, but with some labour put into the configuration, the desired effects could be achieved. @sschuberth, @mnonnenmacher @fviernau, what do you think? |
@sschuberth, @mnonnenmacher, @fviernau: Any comments here? |
I like the idea of it being fully user configurable which options break scanner compatibility and which not. I'd only propose a slightly simpler configuration syntax, like
The However, as always the devil is in the details. Suppose you'd want to ignore any |
Good idea to specify only the options to disregard/ignore, this is easier. Regarding the matching of specific values: I have already some code that breaks a scanner command line into key-value pairs. So it would be possible to test against specific option values. The semantic could be: If the ignore expression is a simple string, we interpret this as option name and ignore the corresponding option. If it has the form Having to choose between |
Could you give a concrete example? How would the syntax look like for ignoring
Probably. I just somehow wanted to distinguish from ScanCode's |
As the disregard/ignore field is a list, we could simply repeat patterns for specific options. So for your example, the configuration could look like:
To ignore arbitrary patterns passed to ScanCode's
While thinking about this, this is a quite safe approach, in the sense that unexpected options cause a result to be considered incompatible. To support the creation of such configurations, we probably need to generate some log output why a result from a scan storage could not be reused. |
My question was specifically about how this would look like if you don't know (and don't care) what the concrete numbers for I believe I'd prefer something that more explicitly matches the options, like
This would ignore the Two other extremes to think about would be to make the comparison fully programmable by running a Kotlin script, similar to Evaluator rules, that get the |
Too many options :-(. I think the last extreme - dropping the compatibility check - requires that the user has full control over the scan storage. This is not the case when integrating external services like ClearlyDefined. Making the check scriptable would be the most flexible option. But maybe we should then provide a default script implementing a meaningful default behaviour and allow the user to override this in the configuration? |
So we still have not yet decided how to proceed here. My proposal would be to go for the last ignore configuration you suggested and skip the extreme cases for now. This would move us forward, and additional flexibility could be dealt with later if it was actually needed. Would this be okay? |
Just noticed a problem with the configuration of the options to ignore: Currently, I therefore propose to change this part of the scanner configuration and introduce a new |
That indeed sounds like a good first / non-breaking change into the right direction to me. |
@oheger-bosch - FYI: I just proposed a change in the same are - thus there is a bit potential of a slight conflict towards that "--ignore" option, see #3510. |
The ScannerDetails class has a configuration property of type String that corresponds to the command line options used by the scanner. When looking up scan results from a storage this String is currently compared using an exact match, to find out whether a result from the storage is applicable.
While this may work in a pure local setup, it is too restrictive when dealing with external storages like ClearlyDefined or other services. It is very unlikely that there is an exact match of the command line options in use. But even in a more controlled environment, the current mechanism is problematic: You just have to change the order of parameters, or modify parameters that do not affect the scanner's output (such as a debug flag or changing the parallelism), and a stored result will not be reused.
Therefore, the filtering for the scanner configuration should be improved. The new implementation should, however, be backwards compatible if possible; already existing and populated storages should continue working.
The text was updated successfully, but these errors were encountered: