Add Package Scoring Proposal #216
base: main
Conversation
Also worth referring to this under the prior art section:
- Total GitHub Stars
- Total GitHub Forks
- Total GitHub Contributors
Would this force a project to be on GitHub to have a good rep?
Not likely. Just a healthy amount of stars/forks/contributors to determine the optimal stop to achieve the maximum score.
I think the question was: if stars/forks/contributors are considered, how will the package score be affected for packages that aren't on GitHub, e.g. because they are on Bitbucket or are proprietary?
I don't have a great answer. It's an unresolved question at the bottom for now.
The two main paths I see: use the information we get from a connected repository provider as an additive to the popularity score, and as a catalyst for showing the community score.
**Maintenance**:

- Open GitHub Issues
This would be hard to measure. Huge active projects have thousands of open Issues. Stale projects would also have many open issues.
Some other ecosystems have accomplished this by having a reasonable threshold or optimal stop to achieve the maximum score for these ratios. Definitely shouldn't be punished for having issues, we all have em! 😄
What is useful to track is the open/closed ratio; for established projects this usually gives you a good idea of how serious their maintenance effort is.
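Reading the two comments above together, one purely hypothetical sketch is to score the closed/(open + closed) ratio rather than the raw open-issue count, with an "optimal stop" threshold above which the score is capped at maximum. The function name and the 0.8 threshold below are illustrative assumptions, not part of the proposal:

```python
def maintenance_issue_score(open_issues: int, closed_issues: int,
                            healthy_closed_ratio: float = 0.8) -> float:
    """Score issue handling on a 0.0-1.0 scale (hypothetical sketch).

    Uses the closed/(open + closed) ratio instead of the raw open count,
    so huge active projects with thousands of open issues aren't punished.
    Ratios at or above `healthy_closed_ratio` earn the full score.
    """
    total = open_issues + closed_issues
    if total == 0:
        return 1.0  # no issues filed yet: don't penalize brand-new packages
    closed_ratio = closed_issues / total
    return min(closed_ratio / healthy_closed_ratio, 1.0)
```

Under this sketch, a project with 800 closed and 200 open issues scores 1.0, while one with 100 closed and 900 open scores 0.125.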
Here's how npm scores issues: https://github.com/npms-io/npms-analyzer/blob/master/lib/scoring/score.js#L70-L84
@JonDouglas commit frequency is a good one.
Although if you take a project like Reactive Extensions, it's very mature and the commit frequency is quite low: https://github.com/dotnet/reactive/commits/main
Definitely shouldn't be punished for having issues, we all have em! 😄
Speak for yourself 😜
It might be useful to define a set of archetypal packages with various characteristics, and consider how they would be scored under these metrics. For example, what do we want the score to look like for a small, tightly scoped, well maintained, mature project? Or an abandoned package built out of the same repo as more mature and actively developed packages?
@mhutch Absolutely. We'll be able to run feasibility dry runs & provide tooling for anyone to test locally what can be scored up front. These types of examples should put the proposal to the test, and there are many more categories of metadata we can include and run through.
- Total Weekly Downloads
- Number of Dependents
- Total GitHub Stars
I heard many times people say that they look at the number of stars to decide whether to pick a library. Here, it's just one measure among many others. That's good. But I don't know whether reinforcing the concept of "stars = good" is something I would like to promote.
We hear the exact same thing. In fact we hear it so much that people go straight to the GitHub repository to assess if a community is active or the package is still being maintained.
AFAIK there are some people buying online services to increase their project's stars (via finding lots of people to star their projects), to make their projects look "great".
The best we can do about gamification is to diversify the metrics used for each category and address it if it becomes too much of a problem. GitHub stars are used for many other scoring concepts in other ecosystems; should we consider otherwise?
<!-- Why are we doing this? What pain points does this solve? What is the expected outcome? -->

Developers are frustrated with the packages on NuGet meeting their needs. These needs are typically categorized in multiple categories:
Developers are frustrated with the packages on NuGet meeting their needs.
This feels a bit strong. To me, it's more like "Developers are frustrated with finding the packages on NuGet meeting their needs."
Absolutely. I wanted to put emphasis that even if a developer found a package, they may still have challenges with the package meeting their needs due to maintenance or quality concerns, i.e. the package hasn't been updated for years or isn't targeting a TFM they are using.
- Number of downloads is used as a proxy to the overall quality of the package.
- If other people download it, that means it must be good!
- Additional metrics should be added such as how many packages depend on the package.
- A network of what depends on the package may make it more practical to the ecosystem.
Not sure I fully agree. Some packages are inherently application-only packages, such as Mono.Options for parsing command line parameters. Rather, I think we should use a proxy of how many people have it installed in projects (which is different from downloads).
Can we both agree that Mono.Options has a large network of "used by" and GitHub projects, which in turn would make it more practical for an individual to take on as a dependency? I'm probably not understanding too! 😄
Installed in projects might be possible in the near future!
- If other people download it, that means it must be good!
- Additional metrics should be added such as how many packages depend on the package.
- A network of what depends on the package may make it more practical to the ecosystem.
- Popular packages may take on many dependencies that make a popular package “unhealthy”.
Are you saying the number of dependencies counts against the package, or dependencies on less healthy packages?
I was saying that a popular package may take on dependencies that themselves are unhealthy thus making the package less healthy as a result.
I also read this as: "package is unhealthy if it takes too many dependencies"
Do you mean instead something along these lines?
Suggested change:
- Popular packages may take on many dependencies that make a popular package “unhealthy”.
+ A package's score can be affected by the score of its dependencies.
How about "Popular packages may take on many dependencies that show unhealthy characteristics and thus make the popular package 'unhealthy'"?
- The more dependencies a popular package takes on & isn’t maintained, the higher likelihood more people become vulnerable from downloading it.
- **Quality**
- High quality packages make the overall ecosystem higher quality.
- High quality .NET tools and templates also improve the ecosystem.
Not sure how that's relevant here; obviously packages aren't the only thing that makes an ecosystem great :-)
The people and the community of course! 👋
(The comment was about the package ecosystem) 😆
- High quality packages make the overall ecosystem higher quality.
- High quality .NET tools and templates also improve the ecosystem.
- There are no current best practices or validation checks of best practices for NuGet.
- Package authors have little to no incentive to update their package unless a major issue is found (bug, security vulnerability, etc)
It seems to me that most package authors build packages because they care; pride in their work and helping others is often the primary incentive. So this feels a bit harsh IMHO.
Yeah, didn't mean to make it sound harsh. Just found very similar research in other ecosystems' reports about why authors do not update packages regularly. Not trying to discount people's work, just trying to point out that unless an author knows about something big, they probably have no incentive to push an update outside regular planned releases.
In addition to what @terrajobst said, it's a bit unclear why it affects quality: The package could be so stable and solid that it doesn't need an update.
Let me re-word or expand here. I feel like we're saying the same thing but it doesn't read that way. Thanks!
- Documentation is often an afterthought and not surfaced in a direct way for a .NET user to see. They usually go straight to GitHub or the documentation pages.
- Adoption of best practices & improved experiences are generally lower than expected.
- **Maintenance**
- Regularly maintained packages avoid circular dependencies by taking on latest major versions of packages that are SemVer compatible.
I'm confused. What does SemVer have to do with circular dependencies? Cycles in a package graph are always bad and arguably should be blocked from being uploaded in the first place. Did you maybe mean diamond dependencies?
Sorry yes, this is supposed to be diamond dependencies. Not circular ones.
- “Harm reduction” is an idea that people are going to do risky activities or be in risky situations and instead of saying “don’t be risky”, we can choose to help make those activities less risky by being proactive.
- Let’s make maintainer’s lives easier to respond to and fix issues while encouraging community contribution.
- **Community**
- Make it easier for someone to use a package with an abundance of documentation.
- Clearly explain to someone how they might contribute to a package they use.
- Demonstrate to people that packages are impactful, valuable, and have traction in the ecosystem.
- Help package authors sustain their work by incentivizing their time with raising donations/funding.
All bullet points prior to these are problems while these are solutions or next steps. This makes the read a bit jarring -- maybe split and have an intro paragraph?
1. Determine whether a package is recognized by the ecosystem based on its **popularity and community** score.
2. Understand if a package is actively maintained or abandoned based on its **maintenance, quality, and community** score.
3. Prevent installing a package that contains a known vulnerability that has not yet been resolved based on its **maintenance and quality** score.
A score is a number, but this sounds like a binary decision? I don't understand how these two connect.
The score helps influence a developer to make that binary trust decision for the package based on their job to be done.
For example if maintenance & quality had a combined score of 50 and I saw a package that had 45/50 for these categories, I would feel a bit more comfortable with trusting that package at the time of including it in my project vs. another package that scored lower such as 15/50 and shows signs of abandonment.
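A minimal sketch of that arithmetic, assuming each category is first normalized to 0.0-1.0 and then weighted. The 25-point-per-category split is an illustrative assumption, not a value from the proposal:

```python
def combined_score(maintenance: float, quality: float,
                   maintenance_weight: int = 25, quality_weight: int = 25) -> int:
    """Combine normalized (0.0-1.0) category scores into a point total.

    25 + 25 points mirrors the 50-point combined maintenance/quality
    budget from the example above; how the budget is split is assumed.
    """
    return round(maintenance * maintenance_weight + quality * quality_weight)
```

A well-kept package scoring 0.9 in both categories lands at 45/50, while one scoring 0.3 in both lands at 15/50, matching the comparison above.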
**.NET CLI / Tool**
![](PackageScoringTool.png) |
The visual of the CLI reminds me of a specific song and so here I sing:
🎵 Your NuGet is a hall of shame, you give .NET a bad name 🎵
Hi there, this is an interesting proposal. Another metric that could be looked at is the number of resolved questions in GitHub Discussions on the related repo. This could indicate healthiness/usability of the package. The number of posts in general could indicate how active the community is. Also, when looking at issues, consider scoring differently based on the types of issues (determined by standard tags such as
@JonDouglas I think this is an interesting proposal with a few concerns. I do really appreciate that it is making an attempt to think about OSS tooling as a product / value stream so that users in the ecosystem can consider an open-source package vs. being forced into a Microsoft-only toolbox due to dependable support. So, props for that. 👍

I think the concept of "scoring" here is what I most struggle with. A score implies a judgment against a fixed set of standards, but the state of the art in the community is continually moving. Not sure how these rules / scores would be modified as things evolve in the future.

Stars are one heuristic but I don't know that they should factor into a score. I often star a project just to bookmark it for later because it seems like a worthwhile tool in the toolbox, or because I want to support the creator. In lots of cases I haven't used the code for repos I've starred.

These packages, even popular ones, are often run by people in their spare time. To treat them as a product and score them could have unintended consequences. It could be discouraging to long-time maintainers who are burning out ("ugh, I didn't look at issues this week and now my score dropped."). It could be especially discouraging to new authors and publishers of packages -- why bother moving forward with your take when a product in the ecosystem has an A+ score and thousands of stars? Even worse, if community package scores are published alongside Microsoft package scores, I think the disparity will continue to drive customers to use Microsoft-only solutions because Microsoft has a lot of resources to move toward

Furthermore, I think the idea of a central body scoring packages developed by the community is difficult. It risks losing a multi-faceted view of who uses OSS and why, and risks putting default lens on open source of "packages that exist to support businesses / enterprise customers" (though to be clear, I don't think that is the intention here).
What I'd like to propose -- and if @JonDouglas or others are interested, I will perhaps create a separate proposal here to consider alongside it -- is that we move away from a "scoring / judgment" and toward the idea of a "journey". The reasons being:
With any kind of scoring, I think we need to consider:
Which is to say, measurements can be gamed and an implicit incentive of a score is not necessarily going to lead to the best outcomes. So if I could tweak this proposal, I would suggest:
I know this has been a long comment. I want to end by again thanking @JonDouglas for surfacing this important conversation because I think the motivation here is really positive for the whole ecosystem.
@egil I had thought about those items but sadly did not include them, as Discussions is used for many reasons on various repositories and not all repositories use a similar tag structure, so it would be hard to build tooling around.
@SeanKilleen Absolutely and that's what I hope this initial proposal accomplishes.
The .NET ecosystem and many developer ecosystems are always changing. This concept will have to evolve with the needs of the developers in the ecosystem. Some things aren't going to work out for the .NET ecosystem that may work for the JavaScript, Python, or Java ecosystems and we'll learn over time as a community what that is as we iterate together.
I want to emphasize that this scoring concept is accessible to all. Whether you have over 1 billion downloads or 1 download, you should be able to achieve the same/maximum score in many categories on day one of publishing. There are some categories that can be out of your control such as whether you are a popular package or have a thriving community around it.
I 100% agree with your perspective of a journey. I know I would personally love to see what you're thinking here & I'm sure many others would be interested as well! I only use the word "score" as it is very direct in what it means, but you are definitely right. We as a community need to come up with a judgement-free / additive score over a package's journey.
The initial idea here was to put the package journey in various thresholds/buckets. As the package grows or decays, it allows the ecosystem to be self-correcting.
There are many sides to this. I believe https://octoverse.github.com/static/github-octoverse-2020-security-report.pdf & https://www.linuxfoundation.org/wp-content/uploads/oss_supply_chain_security.pdf help demonstrate how devastating it can be for authors to unknowingly publish a package that includes a known vulnerability. The absence of that concern can give the author peace of mind at the time of publishing by using NuGet's vulnerability scanning as a check.
This will be 100% transparent with sufficient details. If you were provided a scorecard as an author at pack/publish time or as a consumer at browse/install time, you would be able to see how that score was composed & an empowering way to act upon it if it's within your control.
@JonDouglas this is good feedback/conversation, thanks. I think I'll try to use your proposal as a starting point and follow it through using my take on things in a separate proposal so that folks can ask the appropriate hard questions of it. :)
Here's another initiative doing similar things: https://openssf.org/blog/2021/05/03/introducing-the-security-metrics-project/
IMHO one criterion for success is that when we're 'scoring down' a package for a reason, the author will want to learn what that reason was. We can assume that the author cares and wants to address those points to 'max out' the score, so ideally tooling takes that into account to guide the user in the right direction (proper docs, notes, tips on how to win back those points).
There are a number of problems in each of these categories.

- **Popularity**
- Number of downloads is used as a proxy to the overall quality of the package.
Is total weekly downloads good enough here? It implies a 7-day moving average like other platforms.
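If "Total Weekly Downloads" is indeed a trailing 7-day window, a hypothetical sketch (not how nuget.org actually computes it) is just a sum over the most recent seven daily counts:

```python
def trailing_weekly_downloads(daily_counts: list[int]) -> int:
    """Downloads over the trailing 7 days (illustrative sketch).

    `daily_counts` is ordered oldest-to-newest; a package younger than
    seven days simply sums whatever history it has.
    """
    return sum(daily_counts[-7:])
```

Recomputing this each day gives the moving-window behavior other platforms imply.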
- **Popularity**
Another 'popularity' (aka vanity) metric to consider is the 👍 interaction on the package detail page.
A simple counter of hits from authenticated users on nuget.org.
pub.dev has this and I personally find it engaging:
Suggested change:
  - **Popularity**
+ - Number of 👍 given by users logged in to nuget.org
We could definitely consider something like this if there's enough interest!
- A network of what depends on the package may make it more practical to the ecosystem.
- Popular packages may take on many dependencies that make a popular package “unhealthy”.
- The more dependencies a popular package takes on & isn’t maintained, the higher likelihood more people become vulnerable from downloading it.
- **Quality**
I personally believe NuGet packages (at least non-preview ones) should prefer performance over debuggability. More specifically, they should only include assemblies compiled with optimization.
If we agree on that, it would make sense to score packages accordingly. (shameless plug on the topic)
Absolutely. Each point removed for missed criteria will have a clear & empowering error message so the author can address it to reclaim the missed points and get the maximum score.
This proposal introduces a concept known as package scoring, or net score for short. This is a PageRank-like score that depends on qualities of the whole NuGet package & its dependencies, while accounting for the risks of today's modern, connected world and all of the security implications they have on package managers.
A package score should provide a .NET developer enough information at the end of the day to make a trust decision of including a package in their software supply chain. It promotes authors to create high quality packages by following NuGet package authoring best practices and serves as a way to measure the overall health of the .NET ecosystem through NuGet.
As many other developer package manager ecosystems have already adopted similar package scoring systems or are in the process of creating their own, we believe that this scoring system will evolve with the continuous input from the .NET community and this serves as a starting point.
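To make the "pagerank-like" idea above concrete: a package's net score could blend its own metrics with the scores of its dependencies, iterated so that unhealthy transitive dependencies propagate upward. The blending weight and iteration count below are assumptions for illustration, not values from the proposal:

```python
def net_scores(own_scores: dict, dependencies: dict,
               own_weight: float = 0.85, iterations: int = 20) -> dict:
    """Propagate scores through the dependency graph (hypothetical sketch).

    own_scores:    {package: score in [0.0, 1.0]} from per-package metrics.
    dependencies:  {package: [direct dependency names]}.
    Each round, a package's score is blended with the average score of
    its dependencies, so an unhealthy dependency drags down everything
    that (transitively) depends on it.
    """
    scores = dict(own_scores)
    for _ in range(iterations):
        nxt = {}
        for pkg, own in own_scores.items():
            deps = dependencies.get(pkg, [])
            if deps:
                dep_avg = sum(scores[d] for d in deps) / len(deps)
                nxt[pkg] = own_weight * own + (1 - own_weight) * dep_avg
            else:
                nxt[pkg] = own
        scores = nxt
    return scores
```

Under these assumed weights, a package with perfect own metrics that depends on a zero-scored package ends up at 0.85 rather than 1.0, so unhealthy dependencies visibly lower the net score.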
Rendered Proposal