1577: update and clean license list generation to return more SPDXID for more inputs #1691
Signed-off-by: Christopher Phillips <[email protected]>
"apl1": "APL-1.0",
"apl1.0": "APL-1.0",
"apl1.0.0": "APL-1.0",
"apps2.0.0p": "App-s2p",
The logic now detects versions to expand from the simplified string. We can talk through whether we want to handle this case, `s2p`, where 2 is not a version. I don't think it would be `App-s3p` in the future.
There seem to be a couple of these cases, which have caused the list to grow by 18 lines: potentially 6 cases of strings that now expand into version permutations and did not before this PR.
@@ -35,7 +33,7 @@ var licenseIDs = map[string]string{
 }
 `))

-var versionMatch = regexp.MustCompile(`-([0-9]+)\.?([0-9]+)?\.?([0-9]+)?\.?`)
+var versionMatch = regexp.MustCompile(`([0-9]+)\.?([0-9]+)?\.?([0-9]+)?\.?`)
This causes us to match on numbers in licenses that would otherwise not be considered versions, but I think the trade-off of being able to match more strings out in the wild and return a correct SPDX ID is a good one here.
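To make the trade-off concrete, here is a minimal sketch that runs both patterns from the diff against a few strings. The helper `firstNumber` and the sample inputs are assumptions added for illustration and are not part of the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied from the versionMatch lines in the diff.
var (
	oldVersionMatch = regexp.MustCompile(`-([0-9]+)\.?([0-9]+)?\.?([0-9]+)?\.?`)
	newVersionMatch = regexp.MustCompile(`([0-9]+)\.?([0-9]+)?\.?([0-9]+)?\.?`)
)

// firstNumber is a hypothetical helper: it returns the first capture group
// (the leading numeric component) and whether the pattern matched at all.
func firstNumber(re *regexp.Regexp, s string) (string, bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, s := range []string{"GPL-3.0", "gpl3.0", "apps2p"} {
		oldNum, oldOK := firstNumber(oldVersionMatch, s)
		newNum, newOK := firstNumber(newVersionMatch, s)
		fmt.Printf("%-8s old=(%q, %t) new=(%q, %t)\n", s, oldNum, oldOK, newNum, newOK)
	}
}
```

Only the new pattern matches `gpl3.0` and `apps2p`, because the old one requires a leading `-`. That is what lets dash-free keys expand, and also why the `2` in `s2p` is now treated as a version.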
// so we need to guarantee the order they are created to avoid mapping them wrongly. So we use a sorted list.
// To overwrite deprecated licenses during the first pass we would later on rely on map order,
// [which in go is not consistent by design](https://stackoverflow.com/a/55925880).
// The order of variations/permutations of a license ID matter.
Simplified this logic down to a single pass after we sort the list
I commented out the log messages to reduce the noise since it's just a script we run to generate our license list and not part of the larger syft program.
You can turn those on to see how the sort prevents things like `alpm1` from mapping to later versions.
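The effect of sorting before a single pass can be sketched as follows. This assumes a first-writer-wins rule for shared keys; `buildLookup`, `majorOnly`, and the `ABC-*` license IDs are hypothetical names for illustration, not the generator's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildLookup sorts the IDs and then assigns each shortened key exactly once,
// so the earliest ID in sorted order owns any key that several IDs share.
// shorten is a hypothetical stand-in for the real permutation logic.
func buildLookup(ids []string, shorten func(string) []string) map[string]string {
	sorted := append([]string(nil), ids...)
	sort.Strings(sorted)

	lookup := make(map[string]string)
	for _, id := range sorted {
		for _, key := range shorten(id) {
			if _, taken := lookup[key]; !taken {
				lookup[key] = id
			}
		}
	}
	return lookup
}

// majorOnly yields the full ID plus the ID cut at the first '.',
// for example "ABC-1.1" gives "ABC-1.1" and "ABC-1".
func majorOnly(id string) []string {
	keys := []string{id}
	if i := strings.IndexByte(id, '.'); i >= 0 {
		keys = append(keys, id[:i])
	}
	return keys
}

func main() {
	// Input order is deliberately reversed; sorting makes the result stable.
	lookup := buildLookup([]string{"ABC-1.1", "ABC-1.0"}, majorOnly)
	fmt.Println(lookup["ABC-1"])
}
```

Because the list is sorted before the pass, the shared key `ABC-1` is claimed by `ABC-1.0` regardless of input order, and no step depends on Go's randomized map iteration.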
if l.Deprecated {
	return false
}

// We want to replace deprecated licenses with non-deprecated counterparts
// For more information, see: https://github.com/spdx/license-list-XML/issues/1676
if other.Deprecated {
`findReplacementLicense` already assumes a deprecated input.
	return l.ID == other.ID
}

func (ll LicenseList) findReplacementLicense(deprecated License) *License {
Moved above the function `canReplace` for readability.
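For readers without the full diff, the relationship between the two functions can be sketched roughly as below. The function names and the `LicenseList`/`License` types come from the diff fragments, but the struct fields, the name-matching criterion in `canReplace`, and the `Example-*` data are assumptions made for this sketch only.

```go
package main

import "fmt"

// License holds only the fields this sketch needs (assumed shape).
type License struct {
	ID         string
	Name       string
	Deprecated bool
}

type LicenseList []License

// findReplacementLicense assumes its input is deprecated and returns the
// first license in the list that can replace it, or nil if there is none.
func (ll LicenseList) findReplacementLicense(deprecated License) *License {
	for i := range ll {
		if ll[i].canReplace(deprecated) {
			return &ll[i]
		}
	}
	return nil
}

// canReplace reports whether l may stand in for a deprecated license.
// ASSUMPTION: matching on Name is a placeholder criterion, not the PR's rule.
func (l License) canReplace(other License) bool {
	if l.Deprecated {
		return false
	}
	return l.Name == other.Name
}

func main() {
	list := LicenseList{
		{ID: "Example-1.0", Name: "Example License v1.0 only", Deprecated: true},
		{ID: "Example-1.0-only", Name: "Example License v1.0 only"},
	}
	if r := list.findReplacementLicense(list[0]); r != nil {
		fmt.Println(list[0].ID, "->", r.ID)
	}
}
```

Since the caller only passes deprecated licenses, `canReplace` does not need to re-check that its argument is deprecated, which matches the review note above.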
	return "", false
}

func cleanLicenseID(id string) string {
Had to duplicate this between generate and the actual package. If there is a better way to share the code, I'm happy to update to that!
true,
},
// the below few cases are NOT expected, however, seem unavoidable given the current approach
{
No longer returning true 🥳
:chef-kiss:
Things to look at in the AM:
really nice work 🥇
Update license_list.go to allow more permissible inputs for greater SPDX ID matching. Example: GPL3, gpl3, gpl-3, and GPL-3 can all map to GPL-3.0-only. By lowercasing all strings and removing the "-", we're able to return a valid SPDX license ID for a greater diversity of input strings. --------- Signed-off-by: Christopher Phillips <[email protected]>
Summary
Update `license_list.go` to allow more permissible inputs for greater SPDX ID matching.

The `spdxlicense` package contains the `ID` method, which interacts with the generated file `license_list.go`.

The current implementation contains `-` in the keys. This PR removes these `-` characters and sanitizes the inputs so that we can match on a wider range of inputs found in the wild.

`spdxlicense.ID` has also been changed to only consider the generated list and return if a value exists.

The logic of how this information is encoded has temporarily been moved to the format helpers.
Note this is temporary. #1554 will be used to update our license parsing logic so that license creation is done at the same time as package creation.
In a follow-up PR, encoders or other middle layers of syft should no longer have any concerns about updating or finding the correct SPDX ID or expression, as this will be done when packages are created at the cataloger level.

Example: `GPL3`, `gpl3`, `gpl-3`, and `GPL-3` can all map to `GPL-3.0-only`.

By lowercasing all strings and removing the `-`, we're able to return a valid SPDX license ID for a greater diversity of input strings.
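The normalization described in this summary can be illustrated with a minimal sketch. The one-entry `licenseIDs` map here is a hand-written stand-in for the generated list, and `normalize` and `lookupID` are hypothetical names, not the functions in the `spdxlicense` package.

```go
package main

import (
	"fmt"
	"strings"
)

// licenseIDs is a tiny hand-written stand-in for the generated map.
var licenseIDs = map[string]string{
	"gpl3": "GPL-3.0-only",
}

// normalize lowercases the input and drops every "-" (hypothetical helper).
func normalize(s string) string {
	return strings.ReplaceAll(strings.ToLower(s), "-", "")
}

// lookupID consults only the map and reports whether a value exists.
func lookupID(s string) (string, bool) {
	id, ok := licenseIDs[normalize(s)]
	return id, ok
}

func main() {
	for _, in := range []string{"GPL3", "gpl3", "gpl-3", "GPL-3"} {
		id, ok := lookupID(in)
		fmt.Printf("%s -> %s (%t)\n", in, id, ok)
	}
}
```

All four spellings collapse to the same key, `gpl3`, so a single map entry serves every variant.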