-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tern output is much better in text format than in SPDX format #1188
Comments
Agreed. Also, the dependencies in text format are grouped by layers but in spdx.json they are not. Can we also have them grouped by layer in spdx.json? |
Hi @vargenau -- thanks for opening this issue. I apologize for the delay, I have been on maternity leave. The conversion from the text licenses that Tern finds to SPDX identifiers has long been an issue (see discussions here and here). There is no library that reliably does this that we have found but we do attempt to do this using the license_expression library (added when this PR was merged). I think you are what inspired this PR, actually. This does seem like a bug, however, that the LicenseRefs are not listed for the bsdutils package licenses. I suspect it has to do with the fact that the package is a debian package which means that Tern cannot use the package manager to get clear license expressions (since debian does not provide them) and instead has to scan the copyright files within a package to attempt to get package licenses. Because of this we may have chosen not to represent these as @xxLiuxx -- I will look in to your request as well. |
@vargenau I looked in to this and this is a debian-based container specific problem. The licesnes are the same in text and SPDX documents for other package managers such as apk (Alpine) and rpm (Photon):
As I mentioned in my previous comment, Debian licenses are not declared by the package/package manager, which is unfortunate. Instead, Tern has to parse the copyright files and pulls out a list of license-looking text (we use debian-inspector for this) which is the long list of licenses you see. Because there is a list of licesnes from the debian copyright text it's difficult to put these into separate license identifiers using SPDX's PackageLicenseDeclared field. SPDX requires using AND or OR if you have more than one license, each with their own compliance implications that Tern cannot infer just from the list of extracted licenses (https://spdx.github.io/spdx-spec/v2.3/SPDX-license-expressions/#d4-composite-license-expressions) . Do you have any ideas on how Tern could improve this? |
I guess one option might be to create one single LicenseRef with all of the licenses found in the copyright texts? i.e.
I'm not sure if this is allowed in SPDX, though, I would need to email the text mailing list. |
Hi @rnjudge, Thank you for taking my ticket into account. However, I do not consider it the final solution. Syft does it better for this package PackageLicenseDeclared: BSD-2-Clause AND BSD-3-Clause AND BSD-4-Clause AND GPL-2.0-only AND GPL-2.0-or-later AND GPL-3.0-only AND GPL-3.0-or-later AND LGPL-2.0-only AND LGPL-2.0-or-later AND LGPL-2.1-only AND LGPL-2.1-or-later AND LGPL-3.0-only AND LGPL-3.0-or-later AND MIT (but it misses They do it by maintaining an equivalence table: https://github.com/anchore/syft/blob/main/internal/spdxlicense/license_list.go For example, they map Would something like that be doable in Tern? |
Hi @vargenau. I would argue that the Syft SBOM is not entirely correct and not exactly what Tern should strive to match. For starters, they report that the Second, they are using the conjunctive That being said, the equivalence table that Syft uses to map license identifiers to raw license text is an interesting concept. I suspect this is similar to what the license-expression library does, though (which Tern utilizes to resolve license text to identifiers), and if we can avoid manual upkeep of a license list that would be ideal. Lastly, I am in correspondance with Philippe, the maintainer of Scancode, and he says that we can utilize scancode to detect and make sense of Debian copyright files whether they are structured machine-readable files or legacy non-structured. This may take some time, so in the meantime I will make the change discussed above as a temporary workaround. Does that work for you? |
Tern must scan the copyright files to gather any type of license information for Debian packages and uses the debian-inspector[1] library to do this. Once scanned, Debian licenses found are stored in the `pkg_licenses` field in the package data model (vs `pkg_license` field for packages found using the package manager). This was causing them not to be reported in SPDX documents. This commit enables Tern to report `pkg_licenses` found in Debian packages as `LicenseRefs` for both tag value and json SPDX formats. Resolves tern-tools#1188 [1] https://github.com/nexB/debian-inspector Signed-off-by: Rose Judge <[email protected]>
Hi @rnjudge I agree on the fact that the tools should only fill I have seen the email from Philippe on the SPDX mailing lists. It will be very good if he helps you to implement a better solution based on ScanCode. In the meantime you can merge your changes for your proposed solution with LicenseRef- containing the multiple licenses. |
Tern must scan the copyright files to gather any type of license information for Debian packages and uses the debian-inspector[1] library to do this. Once scanned, Debian licenses found are stored in the `pkg_licenses` field in the package data model (vs `pkg_license` field for packages found using the package manager). This was causing them not to be reported in SPDX documents. This commit enables Tern to report `pkg_licenses` found in Debian packages as `LicenseRefs` for both tag value and json SPDX formats. Resolves tern-tools#1188 [1] https://github.com/nexB/debian-inspector Signed-off-by: Rose Judge <[email protected]>
Tern must scan the copyright files to gather any type of license information for Debian packages and uses the debian-inspector[1] library to do this. Once scanned, Debian licenses found are stored in the `pkg_licenses` field in the package data model (vs `pkg_license` field for packages found using the package manager). This was causing them not to be reported in SPDX documents. This commit enables Tern to report `pkg_licenses` found in Debian packages as `LicenseRefs` for both tag value and json SPDX formats. Resolves tern-tools#1188 [1] https://github.com/nexB/debian-inspector Signed-off-by: Rose Judge <[email protected]>
Tern must scan the copyright files to gather any type of license information for Debian packages and uses the debian-inspector[1] library to do this. Once scanned, Debian licenses found are stored in the `pkg_licenses` field in the package data model (vs `pkg_license` field for packages found using the package manager). This was causing them not to be reported in SPDX documents. This commit enables Tern to report `pkg_licenses` found in Debian packages as `LicenseRefs` for both tag value and json SPDX formats. Resolves tern-tools#1188 [1] https://github.com/nexB/debian-inspector Signed-off-by: Rose Judge <[email protected]>
Tern must scan the copyright files to gather any type of license information for Debian packages and uses the debian-inspector[1] library to do this. Once scanned, Debian licenses found are stored in the `pkg_licenses` field in the package data model (vs `pkg_license` field for packages found using the package manager). This was causing them not to be reported in SPDX documents. This commit enables Tern to report `pkg_licenses` found in Debian packages as `LicenseRefs` for both tag value and json SPDX formats. Resolves tern-tools#1188 [1] https://github.com/nexB/debian-inspector Signed-off-by: Rose Judge <[email protected]>
Tern must scan the copyright files to gather any type of license information for Debian packages and uses the debian-inspector[1] library to do this. Once scanned, Debian licenses found are stored in the `pkg_licenses` field in the package data model (vs `pkg_license` field for packages found using the package manager). This was causing them not to be reported in SPDX documents. This commit enables Tern to report `pkg_licenses` found in Debian packages as `LicenseRefs` for both tag value and json SPDX formats. Resolves tern-tools#1188 [1] https://github.com/nexB/debian-inspector Signed-off-by: Rose Judge <[email protected]>
Tern must scan the copyright files to gather any type of license information for Debian packages and uses the debian-inspector[1] library to do this. Once scanned, Debian licenses found are stored in the `pkg_licenses` field in the package data model (vs `pkg_license` field for packages found using the package manager). This was causing them not to be reported in SPDX documents. This commit enables Tern to report `pkg_licenses` found in Debian packages as `LicenseRefs` for both tag value and json SPDX formats. Resolves #1188 [1] https://github.com/nexB/debian-inspector Signed-off-by: Rose Judge <[email protected]>
Describe the bug
Tern output is much better in text format than in SPDX format.
To Reproduce
Text output:
SPDX output:
Expected behavior
In the text format, we have a long list of licenses that have been correctly identified by Tern.
I would expect to have the same list in the SPDX output (after conversion to SPDX identifiers).
Environment you are running Tern on
The text was updated successfully, but these errors were encountered: