Add a utility to map originator defined license strings to SPDX license format #106

Closed
nishakm opened this issue May 21, 2019 · 14 comments

@nishakm

nishakm commented May 21, 2019

Package managers or text files may declare a license string that is not in SPDX license format. For example, many projects declare their license as BSD, but it is unclear which BSD variant is meant.
oss-review-toolkit has an implementation that comes reasonably close.

https://github.com/heremaps/oss-review-toolkit/blob/490c2da69435182db19c6dbdebad863de41a7108/spdx-utils/src/main/kotlin/SpdxLicenseAliasMapping.kt

https://github.com/heremaps/oss-review-toolkit/blob/518e17c0b1385cc403960cfdfdff69e76240cb27/spdx-utils/src/main/kotlin/SpdxDeclaredLicenseMapping.kt

This is written in Kotlin, but it looks reasonably straightforward to port to Python 2/3.

This would be useful for tools that are reading user-declared licenses.
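As a rough illustration of the kind of utility being requested, here is a minimal Python sketch of an alias lookup. The table entries, the `AMBIGUOUS` set, and the function names are all hypothetical and are not taken from the ORT files linked above; the point is only that a string such as "BSD" is refused rather than guessed.

```python
from typing import Optional

# Hypothetical alias table (illustrative entries, not copied from ORT).
ALIASES = {
    "apache 2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
    "mit license": "MIT",
}

# Declared strings that do not identify a single SPDX license.
AMBIGUOUS = {"bsd", "gpl"}


def _normalize(declared: str) -> str:
    """Lowercase the string and collapse runs of whitespace."""
    return " ".join(declared.lower().split())


def to_spdx(declared: str) -> Optional[str]:
    """Return an SPDX identifier, or None if unknown or ambiguous."""
    key = _normalize(declared)
    if key in AMBIGUOUS:
        return None
    return ALIASES.get(key)
```

A caller would treat `None` as "needs human review" rather than as an error.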

cc @tsteenbe @goneall @kestewart

@pombredanne
Member

@nishakm Thank you! This is indeed useful, but such a mapping would be most useful in the context of a specific package manifest and type, with its actual raw license declaration data, and IMHO in the context of license detection. As they stand, the mappings from ORT are generic and would be better if they were package-type specific (e.g. capturing the conventions of npm, Maven, etc.).

But my main point is that I am not sure there is a place in this library where I could make use of this data. When you craft SPDX documents, you already need to have the proper normalized license IDs and license expressions, so there would not be a place to plug this in. The mapping would have to be applied beforehand.

Therefore I think this is something that could be best used in two places:

  1. as a list of aliases when parsing license expressions with the license-expression (https://github.com/nexB/license-expression/) library. Aliases are explicitly supported there, and it would be great to have a list of them. This could well be a list of generic aliases.

  2. as mappings used when you parse the declared licenses of package manifests in the scancode-toolkit. This would probably be another good place.

Feedback welcomed!

@pombredanne
Member

@sschuberth btw, this mapping is not right: https://github.com/heremaps/oss-review-toolkit/blob/518e17c0b1385cc403960cfdfdff69e76240cb27/spdx-utils/src/main/kotlin/SpdxDeclaredLicenseMapping.kt#L120
"CDDL v1.0 / GPL v2 dual license" to (CDDL_1_0 and GPL_2_0_ONLY) should instead be an OR and not an AND, IMHO.

@sschuberth
Member

Thanks @pombredanne for the hint, @mnonnenmacher any comments?

@nishakm
Author

nishakm commented Jun 8, 2019

> @nishakm Thank you! This is indeed useful, but such a mapping would be most useful in the context of a specific package manifest and type, with its actual raw license declaration data, and IMHO in the context of license detection. As they stand, the mappings from ORT are generic and would be better if they were package-type specific (e.g. capturing the conventions of npm, Maven, etc.).

Agreed. It would be a much larger undertaking then :)

> But my main point is that I am not sure there is a place in this library where I could make use of this data. When you craft SPDX documents, you already need to have the proper normalized license IDs and license expressions, so there would not be a place to plug this in. The mapping would have to be applied beforehand.

I was going to go off and create an independent Python module for this purpose, but I was told by the SPDX folks that this repo would be a good central location for such a module.

> Therefore I think this is something that could be best used in two places:
>
>   1. as a list of aliases when parsing license expressions with the license-expression (https://github.com/nexB/license-expression/) library. Aliases are explicitly supported there, and it would be great to have a list of them. This could well be a list of generic aliases.

The use case is basically translating what looks like the appropriate license declared somewhere in the artifact into a license expression. So I'm not sure how a license expression parser would help here.

>   2. as mappings used when you parse the declared licenses of package manifests in the scancode-toolkit. This would probably be another good place.

I'd like it to be independent of scancode-toolkit so that other projects that want the same thing can use it as well. But if there are already mappings in here, where in the project might I find them?

> Feedback welcomed!

sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jun 28, 2019
Thanks to @pombredanne for pointing out at [1] that "/" usually refers
to dual licensing and as such expresses a license option.

[1] spdx/tools-python#106 (comment)

Signed-off-by: Sebastian Schuberth <[email protected]>
@sschuberth
Member

@nishakm how about creating language-agnostic license mappings in JSON / YAML format that are initially populated with the existing mappings from ORT / the ones that nexB has, and put them in a "neutral" place like probably a new repository at https://github.com/spdx?
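To make the JSON idea concrete, here is one possible shape for such a file and a small Python loader. The schema (`mappings`, `declared`, `spdx`) and the function name `load_mappings` are assumptions for illustration only; nothing in this thread fixes a format. The second entry uses the OR form of the dual-license mapping discussed above.

```python
import json

# Hypothetical file content; the schema is an assumption, not an agreed format.
MAPPINGS_JSON = """
{
  "mappings": [
    {"declared": "Apache 2.0", "spdx": "Apache-2.0"},
    {"declared": "CDDL v1.0 / GPL v2 dual license",
     "spdx": "CDDL-1.0 OR GPL-2.0-only"}
  ]
}
"""


def load_mappings(text: str) -> dict:
    """Parse the JSON text into a dict keyed by the lowercased declared string."""
    data = json.loads(text)
    return {entry["declared"].lower(): entry["spdx"] for entry in data["mappings"]}
```

Because the data is plain JSON, a Kotlin or Java tool could read the same file without depending on any Python code.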

@nishakm
Author

nishakm commented Sep 25, 2019

> @nishakm how about creating language-agnostic license mappings in JSON / YAML format that are initially populated with the existing mappings from ORT / the ones that nexB has, and put them in a "neutral" place like probably a new repository at https://github.com/spdx?

This sounds good to me if @kestewart and @goneall are OK with creating an independent repo under the SPDX GitHub namespace. I am not a license geek, but I already feel like a 1:1 mapping is not going to get all the way there. There needs to be some kind of string formatting or downstream processing from there. But a 1:1 mapping to start off would be great!

@pombredanne
Member

@sschuberth re:

> how about creating language-agnostic license mappings in JSON / YAML format that are initially populated with the existing mappings from ORT / the ones that nexB has, and put them in a "neutral" place like probably a new repository at https://github.com/spdx?

I was talking about exactly this yesterday, regarding this very ticket of @nishakm's.

@nishakm re:

> I am not a license geek but I already feel like a 1:1 mapping is not going to get all the way there. There needs to be some kind of string formatting or downstream processing from there. But a 1:1 mapping to start off would be great!

The right approach would indeed be not a 1:1 mapping but, IMHO, something like this.
Given these:

  • a license string and/or structured data snippet (to account for old npm styles and Maven structures) as found in a package manifest
  • a package manager type (e.g. a Package URL type)

Then we map to:

  • a license expression
  • some indication of confidence (say between 0 and 100) for the accuracy of this mapping
  • some optional notes
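The inputs and outputs listed above could be sketched in Python roughly as follows. The class name `LicenseMapping`, the `TABLE` contents, the confidence values, and the `lookup` helper are all invented for illustration; in particular, the low-confidence BSD entry is only there to show how an uncertain guess might be recorded.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LicenseMapping:
    expression: str   # SPDX license expression
    confidence: int   # 0..100, accuracy of this mapping
    notes: str = ""   # optional free-form notes

    def __post_init__(self):
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


# Keyed by (package type, normalized declared string); entries are made up.
TABLE: Dict[Tuple[str, str], LicenseMapping] = {
    ("npm", "apache 2.0"): LicenseMapping("Apache-2.0", 95),
    ("maven", "cddl v1.0 / gpl v2 dual license"): LicenseMapping(
        "CDDL-1.0 OR GPL-2.0-only", 80, 'A "/" usually denotes a license choice.'
    ),
    ("pypi", "bsd"): LicenseMapping(
        "BSD-3-Clause", 30, "Ambiguous: unclear which BSD variant is meant."
    ),
}


def lookup(package_type: str, declared: str) -> Optional[LicenseMapping]:
    """Look up a declared license string in the context of a package type."""
    key = (package_type.lower(), " ".join(declared.lower().split()))
    return TABLE.get(key)
```

Keying on the package type means the same declared string can map differently for npm and Maven, which is the package-type-specific behavior asked for earlier in the thread.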

Let me create the repo :)

@pombredanne
Member

@nishakm @sschuberth there it is: https://github.com/spdx/package-licenses-mapping
@nishakm I invited you there as a committer too.

@pombredanne
Member

See spdx/package-licenses-mapping#1 which is the continuation for this ticket

@pombredanne
Member

@nishakm one of the reasons that having a separate repo is better is that it could be reused in many places beyond this Python tool repo.

@nishakm
Author

nishakm commented Sep 25, 2019

It's what I asked for initially. It was suggested that I file an issue here :)

@pombredanne
Member

@nishakm that's fine, you could have made it clear in the ticket

@pombredanne
Member

@nishakm re:

> The use case is basically translating what looks like the appropriate license declared somewhere in the artifact into the license expression. So I'm not sure how a license expression parser would help here.

A parser would be important for validating that the license expressions in the mapping are correct and in canonical form. That's something to add as a test of sorts.
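A "test of sorts" along these lines might look like the sketch below. It is a standard-library stand-in, not the license-expression library: it only tokenizes an expression and reports tokens that are neither AND/OR, parentheses, nor a known identifier. `KNOWN_IDS` is a tiny illustrative subset of the SPDX list, and `WITH` exceptions and parenthesis balancing are deliberately out of scope.

```python
import re

# Tiny illustrative subset of SPDX identifiers (assumption, not the full list).
KNOWN_IDS = {"Apache-2.0", "MIT", "CDDL-1.0", "GPL-2.0-only"}
OPERATORS = {"AND", "OR"}
PARENS = {"(", ")"}

# A token is a single parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")


def unknown_identifiers(expression: str) -> list:
    """Return the tokens that are not operators, parentheses, or known IDs."""
    return [
        tok
        for tok in TOKEN.findall(expression)
        if tok not in OPERATORS and tok not in PARENS and tok not in KNOWN_IDS
    ]
```

A check over a mapping file could then assert that `unknown_identifiers(expr)` is empty for every mapped expression, which would also catch lowercase operators such as `or`.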

@goneall
Member

goneall commented Sep 28, 2019

Just catching up on the issue.

I recall discussing this on one of the SPDX calls and I do recall talking about adding the issue to the tools.

I also agree with the above comments that we should have a separate data mapping repo along with tools implementations that support the mapping. I don't recall the discussion precisely, but I don't think anyone had a concern with a neutral mapping repo - just a concern about adding it to the spec since the mapping may be updated quite frequently.

I like the idea of the mapping repo and can use this in the SPDX Java Tools as well.
