Proposal: list all package licenses #10852
base: dev
Thank you so much for the contribution to NuGet! We'll leave this proposal open for the next couple of weeks to get feedback from the .NET community & respective teams & we'll do a quick internal review after! If you're seeing this message, please 👍 or provide your feedback on this proposal in this PR as to why we should or shouldn't do this. Thank you everyone!
proposed/2021/LicenseInspection.md
- If a `licenseUrl` is provided, attempt to download the license from the endpoint and compare it with the known list
- Fall back to checking the package feed to see if it provides license information
It may also be worth having the facility to cache the SPDX license information from https://spdx.org/licenses/licenses.json to improve lookup performance.
NuGet.org does not allow licenses that are not listed in NuGet.Packaging (data) to be used in a package's license expression, so the data is already there, but it may be stale with respect to OSI/FSF approval.
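A minimal sketch of the caching idea, written in Python purely for illustration (a real implementation would be .NET). It assumes the published `licenses.json` has a top-level `licenses` array whose entries carry `licenseId`, `isOsiApproved` and, optionally, `isFsfLibre`; the one-week refresh window, the cache path and the `fetch` hook are hypothetical. Refreshing periodically is one way to keep OSI/FSF approval data from going stale.

```python
import json
import os
import time
import urllib.request

# Assumed location and shape of the SPDX license list (see lead-in).
SPDX_URL = "https://spdx.org/licenses/licenses.json"
MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # hypothetical refresh window: one week


def _download() -> str:
    """Fetch the current SPDX license list as text."""
    with urllib.request.urlopen(SPDX_URL, timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_license_list(raw: str) -> dict[str, dict]:
    """Index the list by SPDX licenseId so each lookup is a dictionary hit."""
    data = json.loads(raw)
    return {
        entry["licenseId"]: {
            "name": entry.get("name", ""),
            "osi": bool(entry.get("isOsiApproved", False)),
            "fsf": bool(entry.get("isFsfLibre", False)),
        }
        for entry in data.get("licenses", [])
    }


def load_license_list(cache_path: str, fetch=_download) -> dict[str, dict]:
    """Return the indexed list, re-downloading only when the cache is missing or stale."""
    is_fresh = (
        os.path.exists(cache_path)
        and time.time() - os.path.getmtime(cache_path) < MAX_AGE_SECONDS
    )
    if not is_fresh:
        raw = fetch()
        with open(cache_path, "w", encoding="utf-8") as f:
            f.write(raw)
    with open(cache_path, encoding="utf-8") as f:
        return parse_license_list(f.read())
```

The `fetch` parameter exists only so the download can be swapped out (for tests or an offline mirror); callers would normally just pass a cache path.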
proposed/2021/LicenseInspection.md
The first technical challenge for this is the inconsistent nature of which licenses are provided by NuGet packages. While the [`licenseUrl` field was deprecated](https://github.com/NuGet/Announcements/issues/32), some projects haven't adopted the new format (or older packages that predate the deprecation are in use), making it difficult to determine what the license of a project is.
The next challenge is how to detect licenses from license files. The ideal approach would be to mirror GitHub's approach, which uses [Licensee](https://licensee.github.io/licensee/) [for detection](https://help.github.com/en/articles/licensing-a-repository#detecting-a-license) (but naturally a dotnet implementation). Essentially, this uses [Sørensen–Dice coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) with a threshold for what is the acceptable level of comparison between the package's license file and license template.
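For two sets A and B, the Sørensen–Dice coefficient is 2|A ∩ B| / (|A| + |B|), ranging from 0 (nothing shared) to 1 (identical sets). The sketch below compares word sets and returns the best-scoring template only if it clears a threshold. The tokenization (lowercased words), the 0.9 threshold and the shape of the `templates` dictionary are assumptions for illustration, not Licensee's or `dotnet-delice`'s exact behaviour.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words (an assumption of this sketch)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def dice_coefficient(a: str, b: str) -> float:
    """Sørensen–Dice coefficient 2|A ∩ B| / (|A| + |B|) over the two word sets."""
    set_a, set_b = word_set(a), word_set(b)
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def best_match(license_text: str, templates: dict[str, str], threshold: float = 0.9):
    """Return (spdx_id, score) for the closest template, or None if nothing clears the threshold."""
    best_id, best_score = None, 0.0
    for spdx_id, template in templates.items():
        score = dice_coefficient(license_text, template)
        if score > best_score:
            best_id, best_score = spdx_id, score
    if best_id is None or best_score < threshold:
        return None  # e.g. reported as "unable to determine"
    return best_id, best_score
```

For instance, the word sets of `"a b c d"` and `"a b c e"` share 3 of 4 words each, giving 2·3 / (4 + 4) = 0.75.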
What happens if a package supplies a license file or uses a license URL, but its text does not match any license?
`delice`, the tool that this proposal is based on, will mark it as "Unable to determine": https://github.com/aaronpowell/dotnet-delice/blob/main/src/DotNetDelice/ConsoleOutput.fs#L60
Hey @aaronpowell, we took some time this afternoon to review your proposal. Overall, we really like the direction & the problems this proposal seeks to solve. There are a few items that we believe should be considered before taking on this proposal. I'll go through a few of those for the sake of transparency.
For now we're going to leave this proposal open to continue to iterate & gather more feedback from you and the community. Thanks again for the proposal, and we invite anyone reading this comment to provide their feedback on this proposal as well!
Hi @JonDouglas, thanks for the feedback and notes on the proposal. It's the first time I've submitted a proposal, so I very much welcome feedback 😁. Here are my comments on the items you've raised:
I do think it could make a useful command within the
To a degree, yes, it would be doing something that's already possible with third-party tooling. After all, I've written my own tooling to do this, which is the basis for the spec. Where I see this differing from existing tools, such as Snyk's offering, is that this is a building-block command. It doesn't go as far as telling you whether or not you are compliant; it just gives you enough information to make decisions about your compliance, or simply gives you insight into the licenses you are consuming, for transparency. The command is unopinionated, and I'd argue it should stay that way, so that its users can form their own opinions based on the information they are provided. Additionally, with the increasing reliance on third-party dependencies, I feel that it's the responsibility of the platform to give you the insights you need, rather than you having to rely on third-party tools for that.
Sorry, I'm not sure I understand the question here.
Understandable, but I'd encourage doing research into the coverage of information that can be obtained without the GitHub API fallback. The reason that I added it to So dropping a check against
@aaronpowell Thanks for the response! I just wanted to write up the thoughts that the team had, to be fully transparent. If there's interest, I can help provide some functional designs for how this proposal might integrate into the existing dotnet CLI commands that NuGet manages today such as With regard to the previous thought on an SBOM providing similar information, those reading this comment can check out what is an sbom to help inform opinions, as license information would likely be included in both the dotnet CLI & an SBOM. We'll definitely take a look at the % of packages on NuGet.org that contain each type of license metadata. Given it's a
If it seems more logical to have it as part of the metadata available off So, if the license information is added as part of the output of Something I also want to raise is that npm has a proposal for a similar feature - npm/rfcs#182 (it was the inspiration for this proposal), so having an aligned machine-readable output would make it easier to produce a view of licenses across multiple platforms that may be in use by a project.
@aaronpowell I filed a similar (but not identical) issue: #10993. While both issues point in the same direction, mine has extended requirements, as it focuses on a different use case. I think this proposal will work fine for framework-dependent distributions of desktop or server applications, but it does not completely cover use cases like
Do you think we could unify our proposals by extending yours to include my requirements?
@markusschaber I've had a read of your issue, but I'm not sure there's much overlap between what's proposed across these two, other than that they are both about licenses. The primary objective of this proposal is to surface data that is hard to get: a dump of the list of licenses from the dependencies (including transitive and framework) of a project. It's intended to be unopinionated about what to do with that data, and it's up to the consumer to make decisions around allow lists/deny lists, etc. I'm also not sure that NuGet would be the right place for a command such as you're describing. What you're describing requires integration through the linker, understanding the different platform targets, etc., which is much more than a package manager handles. As a result, I don't think extending this proposal is the right approach; it'd add complexity to what is (in theory 😅) a relatively simple proposal.
@aaronpowell Ok, I agree with that. So we keep the proposals separate.
proposed/2021/LicenseInspection.md
- If a `license` field is in the nuspec, check if it's an SPDX ID; if so, return it. If it's a file, use the Sørensen–Dice coefficient to compare it to a known list
- If a `licenseUrl` is provided, attempt to download the license from the endpoint and compare it with the known list
- Fall back to checking the package feed to see if it provides license information
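The fallback chain in these steps can be sketched as one function. Every I/O step (`read_file`, `download`, `match_text`, `feed_lookup`) is a hypothetical injected callable and `NuspecLicense` is a made-up container, so nothing here reflects an existing NuGet API. `match_text` would typically be a template comparison such as a Dice-coefficient best match, and the `"Unable to determine"` result mirrors the label `delice` uses when nothing matches.

```python
from dataclasses import dataclass
from typing import Callable, Optional

UNKNOWN = "Unable to determine"


@dataclass
class NuspecLicense:
    """Hypothetical view of the license metadata read from a .nuspec."""
    license_type: Optional[str] = None   # "expression" or "file" (<license type="...">)
    license_value: Optional[str] = None  # the SPDX expression, or the path of the license file
    license_url: Optional[str] = None    # the deprecated <licenseUrl>


def detect_license(
    meta: NuspecLicense,
    read_file: Callable[[str], str],
    download: Callable[[str], Optional[str]],
    match_text: Callable[[str], Optional[str]],
    feed_lookup: Callable[[], Optional[str]],
) -> str:
    """Walk the detection fallback chain; every I/O step is injected."""
    # 1a. An SPDX expression needs no text comparison.
    if meta.license_type == "expression" and meta.license_value:
        return meta.license_value
    # 1b. An embedded license file is compared against the known templates.
    if meta.license_type == "file" and meta.license_value:
        match = match_text(read_file(meta.license_value))
        if match:
            return match
    # 2. For a licenseUrl, download the text and compare it the same way.
    if meta.license_url:
        text = download(meta.license_url)
        if text:
            match = match_text(text)
            if match:
                return match
    # 3. Fall back to whatever the package feed reports, if anything.
    return feed_lookup() or UNKNOWN
```

Injecting the steps keeps the ordering logic separate from network and file access, which also makes it easy to add opt-outs (for example, skipping template comparison) later.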
There should also be a mechanism to exclude packages. We run our own NuGet package feed containing homegrown packages that do not include any license information, and they don't need to, since they are only used internally and thus don't need to be checked.
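One way such an exclusion mechanism could look is a list of glob patterns matched against package IDs, ignoring case since NuGet package IDs are case-insensitive. The pattern syntax, where the list would be configured, and the `Contoso.Internal.*` prefix are all hypothetical; excluding by source feed would be an alternative, but it requires knowing which feed each package was restored from.

```python
from fnmatch import fnmatchcase


def filter_excluded(package_ids: list[str], exclude_patterns: list[str]) -> list[str]:
    """Drop packages whose ID matches any exclusion glob, ignoring case."""
    patterns = [p.lower() for p in exclude_patterns]
    return [
        pkg for pkg in package_ids
        if not any(fnmatchcase(pkg.lower(), pattern) for pattern in patterns)
    ]
```

For example, `filter_excluded(["Contoso.Internal.Logging", "Newtonsoft.Json"], ["Contoso.Internal.*"])` returns `["Newtonsoft.Json"]`.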
Thanks for your patience! Some comments.
proposed/2021/LicenseInspection.md
The technical workflow for license detection would follow:
- If a `license` field is in the nuspec, check if it's an SPDX ID; if so, return it. If it's a file, use the Sørensen–Dice coefficient to compare it to a known list
I am curious why you chose the Dice coefficient for text similarity. Have you considered other text similarity techniques? For example
(I did some homework on text similarity during my college years.)
Are we expecting texts large enough that min-hashing is worth doing?
Also, how will license text be processed? Will stop words be removed?
That similarity algorithm is specified because `dotnet-delice` (which inspired this proposal) is a .NET port of the JavaScript `delice` tool, and I wanted the same level of similarity applied so you could use both tools together and get comparable results.
License text should be processed as provided by the package; removing stop words or making any other change would mean you're not accurately processing the license as shipped by the package, and may give incorrect results.
The same goes for the license text provided by SPDX: there should be no modification of it, to avoid incorrect results.
proposed/2021/LicenseInspection.md
### Technical explanation
The first technical challenge for this is the inconsistent nature of which licenses are provided by NuGet packages. While the [`licenseUrl` field was deprecated](https://github.com/NuGet/Announcements/issues/32), some projects haven't adopted the new format (or older packages that predate the deprecation are in use), making it difficult to determine what the license of a project is.
- I would suggest computing similarity as a command option. Making a compliance decision without reading the license can be risky, as just one word can change the whole license meaning.
- How about scope-splitting this spec into the following scenarios?
  - first, listing licenses of all packages, e.g. `dotnet list package --show-license`, just showing license types
  - then, making 'similarity aggregation' in SPDX licenses, as an option
  - later, making 'similarity aggregation' in license files, with similarity selection
  - finally, `licenseUrl`, as an option

I cannot make any guarantees about whether or not this is going to be implemented, but implementing the first two scenarios looks feasible. We are more than happy to review a community PR :)
> I would suggest computing similarity as a command option. Making a compliance decision without reading the license can be risky, as just one word can change the whole license meaning.
This wouldn't make compliance decisions; it merely surfaces the information that decision makers need in order to make that decision. A level of similarity matching is required, which is why something such as the Sørensen–Dice coefficient is used to determine the license (if it's not explicitly set); otherwise you'd be provided with mostly unhelpful information (when I wrote the initial proposal, usage of SPDX IDs for licensing was low relative to a dedicated URL).
> How about scope-splitting this spec into the following scenarios?
> - first, listing licenses of all packages, e.g. `dotnet list package --show-license`, just showing license types
> - then, making 'similarity aggregation' in SPDX licenses, as an option
> - later, making 'similarity aggregation' in license files, with similarity selection
> - finally, `licenseUrl`, as an option
>
> I cannot make any guarantees about whether or not this is going to be implemented, but implementing the first two scenarios looks feasible. We are more than happy to review a community PR :)
I feel that dropping license similarity based on file content would really decrease the value of this feature, as my past testing has indicated that usage of SPDX identifiers for license indication in NuGet packages is relatively low, and that's ultimately why I added template comparisons to `dotnet-delice`.
Given that this has already been implemented once, I'm probably overestimating the simplicity of implementing it again (converting the F# to C#), but doing so with some additional branches that allow you to opt out of SPDX template comparisons wouldn't be a huge overhead.
Team triage meeting: handing off to @JonDouglas; please help drive this proposal. Thanks!
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 330 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.
Please do not close this for lack of activity. It would be a very useful feature and I don't see an official "yes" or "no" here.
Hi all, I know it has been a couple of years since this was proposed. It is a good idea (kudos to @aaronpowell for being early in calling it out) and is something that keeps coming up. One challenge for NuGet at the time was our license adoption for packages. Back in 2021, we didn't have great adoption of license best practices (i.e. expressed licenses). This wasn't really called out here on the proposal, but I'm going to call it out now. 2023 is looking like a much better picture, and something we need to keep our eyes open for is license auditing as per this proposal. I know the bot recently closed this, but I believe this proposal should be kept open to collect 👍 for a while longer until we can properly understand how to add this to tooling. This is a common, highly requested ask and I'd like to note that here. If you're reading this comment, please continue to contribute to this proposal and to ideas for how we can list all package licenses that are expressed in the NuGet tooling.
@JonDouglas - I haven't done much with the tool that inspired this proposal for a while; would it be useful for yourselves if I updated it to the latest .NET and ran some tests to see what the output looks like with the current state of NuGet package licenses?
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 330 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.
Please do not close this for lack of activity. See comment from @JonDouglas above:
Also, there is an open question from the PR author to @JonDouglas. Off-topic: does the bot really create those "no recent activity" comments after 30 days, even though the actual closing would only happen after another 330 days (a full year of no activity)?
Thank you @cremor. Let me see if we can get this bot disabled entirely here. It is not very helpful in the context of design/proposal PRs. Also, it just disrupts and discourages people from engaging. Just to be clear with people on this specific issue, there is no yes/no decision made here. I am suggesting we are in a phase of "not yet" but need to keep collecting sentiment and be transparent about why we're not there yet (i.e. sharing data like I did in September). Licenses are especially important today (see, e.g., https://www.sonatype.com/state-of-the-software-supply-chain/introduction).
Hi, we have removed our "proposals" folder, so please move this proposal to the "accepted" folder.
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 330 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.
@aaronpowell Please update this proposal PR as explained by @donnie-msft above.
@kartheekp-ms Looks like the label `Status:Do not auto close` that you've added to this issue doesn't work.
I've moved it to the 2023 proposals folder.
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 330 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.
I don't think this proposal should be marked as stale or be closed, for the following reasons:
Given we're coming up on the 3-year mark since this proposal was first put forward, I'd like to know what the stance on it is. I really do think that this would be a valuable addition to the NuGet CLI, and its viability has increased due to the greater adoption of the recommended way of storing licenses in NuGet packages (as @JonDouglas pointed out). I'm still happy to contribute the implementation of this if that would be of value.
Here are some thoughts; I'll be as transparent as possible. This is a good proposal and something the tooling could generally use to delight people with a helpful command. Here's where we are, though. Today we are working on two major things: generative AI and security, i.e. SBOMs. As one may imagine, both of these truly help with the problem of listing and understanding licenses to audit true license risk. While our focus is more on how we can make NuGet better long term for many of these things and push adoption of practices like license expressions (as seen in previous comments), I think this specific proposal would need to be championed by the community to the OSS project, and our teams (dotnet and NuGet) can help shepherd it into formal tooling such as the dotnet and NuGet CLIs. That, however, is just my community opinion based on what I know, and I believe we will need some further team input here, as ultimately it is a team sport to get these things done. Tagging @NuGet/nuget-client, for example, in case anyone on the team would like to add additional perspectives.
The next challenge is how to detect licenses from license files. The ideal approach would be to mirror GitHub's approach, which uses [Licensee](https://licensee.github.io/licensee/) [for detection](https://help.github.com/en/articles/licensing-a-repository#detecting-a-license) (but naturally a dotnet implementation). Essentially, this uses [Sørensen–Dice coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) with a threshold for what is the acceptable level of comparison between the package's license file and license template.
FWIW, for anything to ever be a part of the `dotnet` commands, it needs to be source-buildable: https://github.com/dotnet/source-build?tab=readme-ov-file#source-build-goals. We can't easily take a non-.NET dependency.
## Prior Art
I have created a dotnet global tool that does this, [`dotnet-delice`](https://github.com/aaronpowell/dotnet-delice). This proves that it is a technical possibility to implement such a solution.
I love that there's a global tool for this. They're ideal for these types of scenarios, where getting the functionality in the .NET SDK itself would meet some blockers like the license detection above.
Any news? For the moment I use https://github.com/sensslen/nuget-license until an official solution is released, which I hope will happen someday.
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 330 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.
This proposal introduces a new feature for dumping the list of licenses from all dependent NuGet packages (direct and transitive), so that people can better understand what licenses are used within a project.
It's modelled on a dotnet tool I wrote: https://github.com/aaronpowell/dotnet-delice
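As a rough illustration of where the raw data could come from, the sketch below reads a restored project's `obj/project.assets.json`, walks every package in its `libraries` section (which lists direct and transitive dependencies alike), and pulls the `<license>` / `<licenseUrl>` values out of each package's `.nuspec` in the packages folder. The assets-file layout assumed here (`libraries` keyed by `Id/Version` with `type` and `path`, plus `packageFolders`) is our understanding of the format, not a documented contract, and framework references are not covered.

```python
import glob
import json
import os
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip the XML namespace, since nuspec schema versions use different ones."""
    return tag.rsplit("}", 1)[-1]


def read_nuspec_license(nuspec_path: str) -> dict:
    """Extract <license type="..."> and the deprecated <licenseUrl> from a nuspec."""
    result = {"license": None, "license_type": None, "license_url": None}
    for el in ET.parse(nuspec_path).getroot().iter():
        name = _local(el.tag)
        if name == "license":
            result["license"] = (el.text or "").strip()
            result["license_type"] = el.get("type")
        elif name == "licenseUrl":
            result["license_url"] = (el.text or "").strip()
    return result


def list_package_licenses(assets_path: str) -> list[dict]:
    """One row per package in the assets file, with whatever license metadata was found."""
    with open(assets_path, encoding="utf-8") as f:
        assets = json.load(f)
    folders = list(assets.get("packageFolders", {}))
    rows = []
    for key, lib in sorted(assets.get("libraries", {}).items()):
        if lib.get("type") != "package":
            continue  # skip project references
        pkg_id, version = key.split("/", 1)
        row = {"id": pkg_id, "version": version,
               "license": None, "license_type": None, "license_url": None}
        for folder in folders:
            nuspecs = glob.glob(os.path.join(folder, lib["path"], "*.nuspec"))
            if nuspecs:
                row.update(read_nuspec_license(nuspecs[0]))
                break
        rows.append(row)
    return rows
```

Rows with neither `license` nor `license_url` would then go through the detection fallbacks discussed in the review threads.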