unexpected LicenseRefs #3974

Open
elrayle opened this issue Nov 6, 2024 · 10 comments

@elrayle

elrayle commented Nov 6, 2024

ClearlyDefined added support for LicenseRefs. ScanCode is currently the only source that produces LicenseRefs that are used. I'm seeing a few results that are unexpected. Can you provide information on the following LicenseRefs? (I selected a few; there may be others that are similar.)

Not in the list of scancode-licensedb...

  • LicenseRef-LICENSE
  • LicenseRef-LICENSE.md
  • LicenseRef-Unspecified

In the list of scancode-licensedb, but appear to be catch alls...

  • LicenseRef-scancode-commercial-license
  • LicenseRef-scancode-free-unknown
  • LicenseRef-scancode-generic-export-compliance
  • LicenseRef-scancode-generic-cla
  • LicenseRef-scancode-proprietary-license
  • LicenseRef-scancode-unknown-license-reference (this appears a lot)
  • LicenseRef-scancode-unknown
@pombredanne
Member

These two are not from ScanCode, as we always use a "LicenseRef-scancode" prefix. They are aliases found in the wild that we list here: https://scancode-licensedb.aboutcode.org/proprietary-license.html, but we should not report them as SPDX licenses on our side. If we do, this is a bug.

  • LicenseRef-LICENSE
  • LicenseRef-LICENSE.md

Do you know exactly which file they were detected in?
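Since ScanCode always emits its own keys with the "LicenseRef-scancode" prefix, a consumer can separate ScanCode-issued LicenseRefs from ones that came from elsewhere, such as LicenseRef-LICENSE or LicenseRef-LICENSE.md. A minimal sketch, assuming the LicenseRefs are available as plain strings; the helper names is_scancode_licenseref and scancode_key are hypothetical.

```python
from typing import Optional

# Prefix that ScanCode uses for every LicenseRef it emits (per the comment above).
SCANCODE_PREFIX = "LicenseRef-scancode-"


def is_scancode_licenseref(license_ref: str) -> bool:
    """Hypothetical helper: True if the LicenseRef was minted by ScanCode."""
    return license_ref.startswith(SCANCODE_PREFIX)


def scancode_key(license_ref: str) -> Optional[str]:
    """Hypothetical helper: return the bare ScanCode license key, or None for other refs."""
    if not is_scancode_licenseref(license_ref):
        return None
    return license_ref[len(SCANCODE_PREFIX):]


if __name__ == "__main__":
    for ref in ["LicenseRef-LICENSE", "LicenseRef-LICENSE.md", "LicenseRef-scancode-unknown"]:
        print(ref, is_scancode_licenseref(ref), scancode_key(ref))
```

Anything for which is_scancode_licenseref returns False did not come from ScanCode's license keys and is worth tracing back to the file or metadata it was found in.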

This one is weird:

These are "generic" licenses with the "is_generic" flag set to true:

  • LicenseRef-scancode-commercial-license
  • LicenseRef-scancode-free-unknown
  • LicenseRef-scancode-generic-export-compliance
  • LicenseRef-scancode-generic-cla
  • LicenseRef-scancode-proprietary-license
  • LicenseRef-scancode-unknown-license-reference (this appears a lot)
  • LicenseRef-scancode-unknown
  1. They are detected using various rules, and you always want to use the --license-text option to get the exact matched license or notice text. (This is good practice in all cases.)

  2. unknown-license-reference matches are common, and many of them are recombined in the top-level "license_detections" results, a recently added feature.
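Because these generic keys are catch-alls, one way to act on this advice is to flag any LicenseRef built from them for review against its matched text. A minimal sketch, assuming the generic keys are hard-coded from the list above; a real consumer would read the "is_generic" flag from the license data instead. The helper names needs_matched_text and refs_to_review are hypothetical.

```python
from typing import Iterable, List

PREFIX = "LicenseRef-scancode-"

# Keys listed in this thread as having is_generic set to true.
# Hard-coded for illustration only; read the is_generic flag in practice.
GENERIC_SCANCODE_KEYS = frozenset({
    "commercial-license",
    "free-unknown",
    "generic-export-compliance",
    "generic-cla",
    "proprietary-license",
    "unknown-license-reference",
    "unknown",
})


def needs_matched_text(license_ref: str) -> bool:
    """True if the ref is a generic ScanCode catch-all whose matched text should be reviewed."""
    if not license_ref.startswith(PREFIX):
        return False
    return license_ref[len(PREFIX):] in GENERIC_SCANCODE_KEYS


def refs_to_review(refs: Iterable[str]) -> List[str]:
    """Deduplicated, sorted list of generic refs that need a look at --license-text output."""
    return sorted({ref for ref in refs if needs_matched_text(ref)})


if __name__ == "__main__":
    print(refs_to_review([
        "LicenseRef-scancode-unknown-license-reference",
        "LicenseRef-scancode-protobuf",
        "LicenseRef-scancode-unknown",
    ]))
```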

For instance, say we have these fictitious license rules:

  • a. "This is licensed under" as an unknown-license-reference
  • b. "The GPL 2" as gpl-2.0
  • c. "The MIT license" as mit

With license detection recombination, a. followed by b. is reported only as gpl-2.0, and likewise a. followed by c. is reported only as mit.

This means that 1) you should use the --license-text option to collect the matched text, and 2) you need to use the top-level detections, not only the lower-level license matches.
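The recombination example above can be modelled as a toy function: an unknown-license-reference match immediately followed by a concrete license match is folded into a single detection reported as that license. This is a simplified sketch under that one assumption, not ScanCode's actual detection code, whose heuristics are more involved; the Match, Detection and recombine names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNKNOWN_REF = "unknown-license-reference"


@dataclass
class Match:
    license_expression: str
    matched_text: str  # the text that --license-text would report


@dataclass
class Detection:
    license_expression: str
    matches: List[Match] = field(default_factory=list)


def recombine(matches: List[Match]) -> List[Detection]:
    """Toy model: fold an unknown-license-reference match into the
    concrete license match that immediately follows it."""
    detections: List[Detection] = []
    i = 0
    while i < len(matches):
        current = matches[i]
        following: Optional[Match] = matches[i + 1] if i + 1 < len(matches) else None
        if (
            current.license_expression == UNKNOWN_REF
            and following is not None
            and following.license_expression != UNKNOWN_REF
        ):
            # "This is licensed under" + "The GPL 2" -> one gpl-2.0 detection
            detections.append(Detection(following.license_expression, [current, following]))
            i += 2
        else:
            # Anything else, including a lone unknown reference, stays as-is
            detections.append(Detection(current.license_expression, [current]))
            i += 1
    return detections


if __name__ == "__main__":
    a = Match(UNKNOWN_REF, "This is licensed under")
    b = Match("gpl-2.0", "The GPL 2")
    c = Match("mit", "The MIT license")
    print([d.license_expression for d in recombine([a, b])])  # ['gpl-2.0']
    print([d.license_expression for d in recombine([a, c])])  # ['mit']
    print([d.license_expression for d in recombine([a])])     # ['unknown-license-reference']
```

The detection keeps both underlying matches, which is why the top-level result can report gpl-2.0 while the lower-level matches still contain an unknown-license-reference.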

@elrayle
Author

elrayle commented Nov 7, 2024

Thanks for all that info. I am including a single package-version for each.

Not in scancode db

license CD coordinates
LicenseRef-LICENSE npm/npmjs/-/arrow-orm/0.2.72
LicenseRef-LICENSE.md git/github/strongloop/strong-executor/209f0de764ca072008f18b414a81becbef3957f9
LicenseRef-Unspecified git/github/dlaidig/qmt/3a90d19c2fdea0b4579fb8808225bcb9862fc3ae
LicenseRef-Rakuten-Group-Proprietary-License maven/mavencentral/io.github.ec-mobile.rex/icons-compose-android/1.0

In scancode db

license CD coordinates
LicenseRef-scancode-commercial-license maven/mavencentral/com.vaadin/vaadin-upload-flow/14.12.1
LicenseRef-scancode-free-unknown crate/cratesio/-/bitwarden-core/1.0.0
LicenseRef-scancode-generic-export-compliance maven/mavencentral/org.eclipse.persistence/org.eclipse.persistence.asm/9.7.1
LicenseRef-scancode-generic-cla go/golang/github.com%2fazure%2fazure-sdk-for-go%2fsdk/azcore/v1.16.0
LicenseRef-scancode-proprietary-license npm/npmjs/-/apexcharts/3.54.1
LicenseRef-scancode-unknown-license-reference maven/mavencentral/com.melloware/commons-beanutils2/2.0.0
LicenseRef-scancode-unknown maven/mavencentral/com.melloware/commons-beanutils2/2.0.0

@elrayle
Author

elrayle commented Nov 7, 2024

FYI... we had data from before the update. I did an analysis of the pre-v2.0 data, which already included LicenseRefs. These are the stats from that analysis.

  • 44 unique LicenseRefs
  • 8,539 package-versions that include LicenseRefs (0.02% of all package-versions)
  • 41,927,085 package-versions
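As a quick check of the percentage above, using only the two counts reported:

```python
package_versions_with_refs = 8_539
total_package_versions = 41_927_085

# Fraction of all package-versions that carry at least one LicenseRef
share = package_versions_with_refs / total_package_versions
print(f"{share:.4%}")  # 0.0204%
print(f"{share:.2%}")  # 0.02%
```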

List of LicenseRefs in pre-v2.0 data sorted by the number of packages (ignoring versions) that they appear in:

[Image: list of LicenseRefs in the pre-v2.0 data with package counts]

@pombredanne
Member

@elrayle Thanks!

re:

"List of LicenseRefs in pre-v2.0 data sorted by the number of packages (ignoring versions) that they appear in:"

Would you mind attaching a text file?

(Tesseract is not too shabby at reading PNGs, but I would prefer the raw text)

Also, do you have one example where each of these shows up?

Tesseract's OCR output:

Unique LicenseRef only Count of packages
LicenseRef-LICENSE 200
LicenseRef-NetCommons 112
LicenseRef-Slint-commercial 25

LicenseRef-LICENSE.md
LicenseRef-.amazon.com.-AmznSL-1.0
LicenseRef-jsoncpp-public-domain
LicenseRef-PdfiumThirdParty
LicenseRef-qskinny
LicenseRef-Rakuten-Group-Proprietary-License
LicenseRef-Qt-Commercial
LicenseRef-BSD-3-Clause-CMU
LicenseRef-fitsio

LicenseRef-HDF5
LicenseRef-JSONinJSPublicDomain
LicenseRef-MIT-Bootstrap
LicenseRef-mit-dmic
LicenseRef-MIT-like
LicenseRef-OpenEvidence
LicenseRef-PIL
LicenseRef-Proprietary
LicenseRef-Proprietaryintel
LicenseRef-PSF-based
LicenseRef-scancode-other-copyleft
LicenseRef-SHA1-Public-Domain
LicenseRef-SixtyFPS-commercial
LicenseRef-tzdata-PublicDomain
LicenseRef-Automake-exception-2.0
LicenseRef-Chef-EULA
LicenseRef-Custom
LicenseRef-EPL-Steward
LicenseRef-KhronosFreeUse
LicenseRef -LICENCE
LicenseRef-LICENSE. txt
LicenseRef-NextcloudTrademarks
LicenseRef-old-glib-tests
LicenseRef-PNGSuite
LicenseRef-ProprietaryMicrosoft
LicenseRef-Public-Domain
LicenseRef-PUBLIC-DOMAIN-xi2-xy
LicenseRef-tomb.v1
LicenseRef-UFL-1.0
LicenseRef-unDraw
LicenseRef-Unspecified
LicenseRef-yarn

TOTAL packages with a LicensoRef
(count does not include versions)

@elrayle
Author

elrayle commented Nov 7, 2024

Unique LicenseRef only Count of packages CD Coordinates
LicenseRef-LICENSE 200 npm/npmjs/-/arrow-orm/0.2.72
LicenseRef-NetCommons 112 composer/packagist/netcommons/access-counters
LicenseRef-Slint-commercial 25 crate/cratesio/-/vtable
LicenseRef-LICENSE.md 12 git/github/strongloop/strong-executor/209f0de764ca072008f18b414a81becbef3957f9
LicenseRef-.amazon.com.-AmznSL-1.0 7 npm/npmjs/@alexa-games/sfb-cli
LicenseRef-jsoncpp-public-domain 4 git/github/khronosgroup/openxr-sdk
LicenseRef-PdfiumThirdParty 4 pypi/pypi/-/pypdfium2
LicenseRef-qskinny 4 /LicenseRef-Automake-exception-2.0/LicenseRef-HDF5/3
LicenseRef-Rakuten-Group-Proprietary-License 4 maven/mavencentral/io.github.ec-mobile.rex/icons-compose-android/1.0
LicenseRef-Qt-Commercial 3 git/github/qtproject/pyside-pyside-setup
LicenseRef-BSD-3-Clause-CMU 2 pypi/pypi/-/benchexec
LicenseRef-fitsio 2 conda/conda-forge/linux-64/cfitsio
LicenseRef-HDF5 2 conda/conda-forge/linux-64/hdf5
LicenseRef-JSONinJSPublicDomain 2 git/github/sap/openui5
LicenseRef-MIT-Bootstrap 2 git/github/liferay/clay
LicenseRef-mit-drnic 2 git/github/sap/cloud-authorization-buildpack
LicenseRef-MIT-like 2 git/github/fosslight/fosslight_source_scanner
LicenseRef-OpenEvidence 2 git/github/curl/curl
LicenseRef-PIL 2 conda/conda-forge/linux-64/pillow
LicenseRef-Proprietary 2 git/github/com-posers-pit/smw_music
LicenseRef-ProprietaryIntel 2 conda/conda-forge/linux-64/mkl
LicenseRef-PSF-based 2 conda/conda-forge/win-64/matplotlib-base
LicenseRef-scancode-other-copyleft 2 pypi/pypi/-/scancode-toolkit-mini
LicenseRef-SHA1-Public-Domain 2 git/github/qt/qtbase
LicenseRef-SixtyFPS-commercial 2 crate/cratesio/-/vtable
LicenseRef-tzdata-PublicDomain 2 git/github/sap/openui5
LicenseRef-Automake-exception-2.0 1 git/github/isc-projects/bind9
LicenseRef-Chef-EULA 1 gem/rubygems/-/inspec-core
LicenseRef-Custom 1 pypi/pypi/-/salientsdk
LicenseRef-EPL-Steward 1 git/github/graphs4value/refinery
LicenseRef-KhronosFreeUse 1 git/github/khronosgroup/spirv-cross
LicenseRef-LICENCE 1 npm/npmjs/-/formally
LicenseRef-LICENSE.txt 1 npm/npmjs/-/physiojs
LicenseRef-NextcloudTrademarks 1 git/github/nextcloud/android
LicenseRef-old-glib-tests 1 git/github/gnome/glib
LicenseRef-PNGSuite 1 git/github/khronosgroup/ktx-software
LicenseRef-ProprietaryMicrosoft 1 conda/conda-forge/win-64/ucrt
LicenseRef-Public-Domain 1 conda/conda-forge/noarch/tzdata
LicenseRef-PUBLIC-DOMAIN-xi2-xy 1 git/github/gardener/gardener-extension-shoot-networking-filter
LicenseRef-tomb.v1 1 git/github/sap/cloud-authorization-buildpack
LicenseRef-UFL-1.0 1 crate/cratesio/-/epaint
LicenseRef-unDraw 1 git/github/pistacheio/pistache
LicenseRef-Unspecified 1 git/github/dlaidig/qmt
LicenseRef-yarn 1 git/github/hedgedoc/html-to-react
     
TOTAL packages with a LicenseRef 425  (count does not include versions)
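The TOTAL equals the sum of the per-LicenseRef counts, and the row count matches the 44 unique LicenseRefs reported earlier. Because it is a sum, a package carrying more than one LicenseRef may be counted once per LicenseRef (crate/cratesio/-/vtable, for example, is listed under both LicenseRef-Slint-commercial and LicenseRef-SixtyFPS-commercial). A quick check using the counts from the table:

```python
# Package counts from the table above, grouped by value.
counts = (
    [200, 112, 25, 12, 7]  # LICENSE, NetCommons, Slint-commercial, LICENSE.md, AmznSL-1.0
    + [4] * 4              # jsoncpp-public-domain, PdfiumThirdParty, qskinny, Rakuten
    + [3]                  # Qt-Commercial
    + [2] * 16             # BSD-3-Clause-CMU through tzdata-PublicDomain
    + [1] * 18             # Automake-exception-2.0 through yarn
)

print(len(counts), sum(counts))  # 44 425
```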

@elrayle
Author

elrayle commented Nov 7, 2024

@dangoor Found this related issue from 2022.

@elrayle
Author

elrayle commented Nov 7, 2024

These are the results comparing the OLD and the NEW data. I can look at adding coordinates when I get a chance.

Unique LicenseRef only  NEW count of packages  OLD count of packages
LicenseRef-scancode-unknown-license-reference 274  
LicenseRef-LICENSE 200 200
LicenseRef-NetCommons 112 112
LicenseRef-scancode-generic-cla 69  
LicenseRef-scancode-proprietary-license 42  
LicenseRef-scancode-commercial-license 29  
LicenseRef-scancode-public-domain 26  
LicenseRef-Slint-commercial 25 25
LicenseRef-scancode-other-permissive 24  
LicenseRef-scancode-unknown 19  
LicenseRef-scancode-warranty-disclaimer 18  
LicenseRef-LICENSE.md 12 12
LicenseRef-Slint-Software-3.0 12  
LicenseRef-scancode-free-unknown 11  
LicenseRef-.amazon.com.-AmznSL-1.0 7 7
LicenseRef-scancode-protobuf 7  
LicenseRef-scancode-unicode-mappings 7  
LicenseRef-scancode-generic-export-compliance 6  
LicenseRef-qskinny 5 4
LicenseRef-jsoncpp-public-domain 4 4
LicenseRef-Rakuten-Group-Proprietary-License 4 4
LicenseRef-scancode-dco-1.1 4  
LicenseRef-scancode-ms-net-library-2018-11 4  
LicenseRef-scancode-other-copyleft 4 2
LicenseRef-scancode-public-domain-disclaimer 4  
LicenseRef-PdfiumThirdParty 3 4
LicenseRef-Qt-Commercial 3 3
LicenseRef-scancode-ms-edge-devtools-2022 3  
LicenseRef-scancode-paypal-sdk-2013-2016 3  
LicenseRef-scancode-unknown-spdx 3  
LicenseRef-scancode-w3c-docs-20021231 3  
LicenseRef-BSD-3-Clause-CMU 2 2
LicenseRef-fitsio 2 2
LicenseRef-HDF5 2 2
LicenseRef-JSONinJSPublicDomain 2 2
LicenseRef-LICENSE.txt 2 1
LicenseRef-MIT-Bootstrap 2 2
LicenseRef-mit-drnic 2 2
LicenseRef-MIT-like 2 2
LicenseRef-NextcloudTrademarks 2 1
LicenseRef-OpenEvidence 2 2
LicenseRef-PIL 2 2
LicenseRef-Proprietary 2 2
LicenseRef-ProprietaryIntel 2 2
LicenseRef-PSF-based 2 2
LicenseRef-scancode-mit-old-style 2  
LicenseRef-scancode-sunsoft 2  
LicenseRef-scancode-unicode 2  
LicenseRef-SHA1-Public-Domain 2 2
LicenseRef-SixtyFPS-commercial 2 2
LicenseRef-tzdata-PublicDomain 2 2
LicenseRef-UFL-1.0 2 1
LicenseRef-Automake-exception-2.0 1 1
LicenseRef-Chef-EULA 1 1
LicenseRef-Custom 1 1
LicenseRef-EPL-Steward 1 1
LicenseRef-KhronosFreeUse 1 1
LicenseRef-LICENCE 1 1
LicenseRef-old-glib-tests 1 1
LicenseRef-PNGSuite 1 1
LicenseRef-ProprietaryMicrosoft 1 1
LicenseRef-Public-Domain 1 1
LicenseRef-PUBLIC-DOMAIN-xi2-xy 1 1
LicenseRef-scancode-bsd-new-tcpdump 1  
LicenseRef-scancode-eclipse-sua-2014 1  
LicenseRef-scancode-facebook-patent-rights-2 1  
LicenseRef-scancode-facebook-software-license 1  
LicenseRef-scancode-fair-source-0.9 1  
LicenseRef-scancode-generic-trademark 1  
LicenseRef-scancode-ietf-trust 1  
LicenseRef-scancode-info-zip-2005-02 1  
LicenseRef-scancode-linking-exception-lgpl-2.0plus 1  
LicenseRef-scancode-microchip-products-2018 1  
LicenseRef-scancode-ms-azure-spatialanchors-2.9.0 1  
LicenseRef-scancode-ms-dxsdk-d3dx-9.29.952.3 1  
LicenseRef-scancode-ms-net-library 1  
LicenseRef-scancode-ms-patent-promise 1  
LicenseRef-scancode-mulanpsl-2.0-en 1  
LicenseRef-scancode-northwoods-sla-2021 1  
LicenseRef-scancode-python-cwi 1  
LicenseRef-scancode-secret-labs-2011 1  
LicenseRef-scancode-sun-sissl-1.0 1  
LicenseRef-scancode-us-govt-public-domain 1  
LicenseRef-scancode-vhfpl-1.1 1  
LicenseRef-tomb.v1 1 1
LicenseRef-unDraw 1 1
LicenseRef-Unspecified 1 1
LicenseRef-yarn 1 1
     
TOTAL packages with a LicenseRef 1025 425
(count does not include versions)    
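To pull out which LicenseRefs are new and which counts changed, the table can be treated as a mapping from LicenseRef to a (NEW, OLD) pair. A minimal sketch using a handful of rows copied from the table, assuming a blank OLD cell means the LicenseRef did not appear in the pre-v2.0 data; the helper names only_in_new and count_changes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Subset of rows from the table above: LicenseRef -> (NEW, OLD).
# A blank OLD cell is read as None (assumed: absent from the pre-v2.0 data).
ROWS: Dict[str, Tuple[int, Optional[int]]] = {
    "LicenseRef-scancode-unknown-license-reference": (274, None),
    "LicenseRef-LICENSE": (200, 200),
    "LicenseRef-scancode-generic-cla": (69, None),
    "LicenseRef-Slint-Software-3.0": (12, None),
    "LicenseRef-qskinny": (5, 4),
    "LicenseRef-PdfiumThirdParty": (3, 4),
    "LicenseRef-UFL-1.0": (2, 1),
}


def only_in_new(rows: Dict[str, Tuple[int, Optional[int]]]) -> List[str]:
    """LicenseRefs with no OLD count, i.e. first seen in the NEW data."""
    return sorted(ref for ref, (_, old) in rows.items() if old is None)


def count_changes(rows: Dict[str, Tuple[int, Optional[int]]]) -> Dict[str, Tuple[int, int]]:
    """LicenseRefs present in both datasets whose package counts differ."""
    return {
        ref: (new, old)
        for ref, (new, old) in rows.items()
        if old is not None and new != old
    }


if __name__ == "__main__":
    print(only_in_new(ROWS))
    print(count_changes(ROWS))
```

Not every new LicenseRef carries the "LicenseRef-scancode-" prefix (LicenseRef-Slint-Software-3.0 is new but not a ScanCode key), so prefix checks and old/new diffs answer different questions.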

@elrayle
Author

elrayle commented Nov 8, 2024

@pombredanne If there are questions about licenses in the ScanCode LicenseDB, is there a preferred place to ask them? I am writing a blog post announcing the support of LicenseRefs and want to include a statement like...

If you have comments on the actual LicenseRefs, you should reach out to ScanCode License DB maintainers.

@pombredanne
Member

@elrayle re:

I am writing a blog post announcing the support of LicenseRefs and want to include a statement like...

Awesome 🙇 ... Please also link it here when done so we can relay and amplify!

But I would say instead:

If you have comments on the actual LicenseRefs, you should reach out to ScanCode Toolkit maintainers of the License DB.

The license DB is currently generated entirely from the ScanCode Toolkit licenses, so this is the place to report and discuss these issues. At some point, we could either extract the license DB into its own repo or also publish it as a standalone package, but I am not sure of the benefits.

@AyanSinhaMahapatra @DennisClark ping, what do you think?

@pombredanne
Member

Quick side note: some (or many?) of these LicenseRefs exist in the wild. See, for instance, emilk/egui#5361.
