-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Package Scoring v1 #11884
base: dev
Are you sure you want to change the base?
Adds Package Scoring v1 #11884
Conversation
- No website | ||
- No repository | ||
- Unmaintained | ||
- No bug tracker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have dedicated metadata for bug tracker URL. And for the life of me I couldn't find an issue tracking such a feature request. We could potentially render an "issues" URL if a GitHub repository URL is provided by this isn't even perfect. You can turn off issues at the repo level in GitHub (see NuGet/NuGet.Client repo) so we can't even assume {GitHub repo}/issues
is a valid link.
Since this is not possible with current data pipelines, we will work on providing such an experience in the future and use two proxy values such as total weekly downloads and total count of packages depending on the package to be scored up to a percentile of up to 100%. | ||
|
||
- Total Weekly Downloads | ||
- Number of Dependents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be tuned to dependents owned by other others? Or perhaps number of distinct owners depending on the package with their own package?
- Provides a valid README | ||
- Provide documentation | ||
- Provides an example | ||
- 20% or more of the public API has xml doc comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Platform support | ||
- Supports all possible modern platforms i.e. [.NET 5+/.NET Standard](https://docs.microsoft.com/en-us/dotnet/standard/net-standard?tabs=net-standard-1-0#net-5-and-net-standard) (iOS, Android, Windows, MacOS, Linux, etc) | ||
- Pass static analysis | ||
- Code has no errors, warnings, lints, or formatting issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static analysis would be performed on compiled assemblies, not the input source, right? Or would we need to correlate source and run static analysis on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of these are better applicable to eco-systems where the code is shipped rather than compiled assemblies.
Example formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Not sure what static analysis means exactly.
|
||
- Package updated in the last year. | ||
|
||
The score may also be shown as a decayed value over the year approaching the year mark where it goes stale. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how a package owner can attest "this package is still maintained but it hasn't needed an update in over 1 year". Or is this a case we don't think is common?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that different people have different preferences, but I'm on the more risk adverse side. I highly value mature and stable packages. Packages with high churn increase risk of bugs, so I strongly dislike punishing (reducing the score) for "older" packages.
Consider Newtonsoft.Json for example. Excluding the recent prerelease version, the current stable version (13.0.1) is more than 12 months ago. The time between it and the previous stable version (12.0.3) was more than 12 months. Are we really willing to say that this package is unmaintained and deserves a reduced score?
I understand the intent is to encourage packages that have bug fixes, but I think a simple "how long since the last publish date" is too simplistic and will have worse unintended consequences than the benefits it hopes to bring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heard a great quote about the Rust packages (crates) today: https://youtu.be/Z3xPIYHKSoI?t=145
They are not abandoned, they are done
I feel like this should be something to strive towards, instead of expecting packages to be constantly updated
|
||
#### Quality | ||
|
||
Quality is the combination of: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about:
- sources available
- symbols available
(likely SourceLink)
i.e. the package is easily debuggable.
- Missing README | ||
- No example | ||
- Not enough public API documentation | ||
- Bad semver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does bad semver mean?
4. Package has high quality documentation. | ||
5. Package is maintained by a notable author or organization. | ||
6. Package has been updated recently and is updated regularly. | ||
7. Package has few dependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's worth analyzing or surfacing the total size of a package's dependency graph or just consider the top level. If the later, maybe a score based on this could be gamed/unintentionally obfuscated by having a single meta-package dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great write-up, well done!
- Invalid .nuspec / .csproj | ||
- Missing dependency | ||
- Missing README | ||
- No example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this?
- Not enough public API documentation | ||
- Bad semver | ||
- Not v1+ | ||
- No website |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this refer to the projecturl metadata?
- Deprecated license | ||
- Missing license | ||
- Non SPDX license | ||
- Unsafe copyright |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would this be?
|
||
EX: A package might be missing a README which a developer may not be able to easily get started with the package. They might be able to then contribute a new issue or even the inclusion of the README file to the NuGet package if it's open source. | ||
|
||
**Issue Examples:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great list.
Wonder if we should take a look at how feasible all the individual bullet points are
|
||
- Follow NuGet conventions | ||
- Provides a valid .csproj / .nuspec | ||
- Provides a valid README |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would a valid readme mean? Something that renders correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @chgill-MSFT, @lyndaidaii - we've talked a bit about README quality in other contexts. I think any valid, linked readme file should meet the criteria. Perhaps we can basically do a string.IsNullOrWhiteSpace
. Once authors have onboarded to the feature, it's very easy to incrementally improve it. It's a better baseline to be in than just a plain text description.
Imagine if you see someone else's repo you want to open a PR against -- it's certainly more approachable to edit a single Markdown file in the PR than do all of the onboarding and potential plumbing to set it in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this could validate that the README uses HTTPS for all links and images?
- Provide documentation | ||
- Provides an example | ||
- 20% or more of the public API has xml doc comments | ||
- Platform support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be mindful about how we handle this for packages that are dedicated to a specific platform only, example mac only.
There could be an android matching one that's a separate package potentially and they have the same dependencies.
Not sure if we can easily capture any of my comments in automation, but just wanted to make sure we don't miss out on that perspective :)
- Platform support | ||
- Supports all possible modern platforms i.e. [.NET 5+/.NET Standard](https://docs.microsoft.com/en-us/dotnet/standard/net-standard?tabs=net-standard-1-0#net-5-and-net-standard) (iOS, Android, Windows, MacOS, Linux, etc) | ||
- Pass static analysis | ||
- Code has no errors, warnings, lints, or formatting issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of these are better applicable to eco-systems where the code is shipped rather than compiled assemblies.
Example formatting.
- Pass static analysis | ||
- Code has no errors, warnings, lints, or formatting issues. | ||
- Support up-to-date dependencies | ||
- All of the package dependencies are supported in the latest version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this.
Because of how compatibility works, if a package supports a certain framework, then all future versions of that framework are likely to be supported as well.
If a package claims to support a framework but their dependencies don't then that's a problem package.
Maybe that's what we should focus on instead, ensure that the package is installable in all of it's declared frameworks.
- Code has no errors, warnings, lints, or formatting issues. | ||
- Support up-to-date dependencies | ||
- All of the package dependencies are supported in the latest version | ||
- Package supports the latest stable .NET SDK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work?
.NET SDK support is not really a package construct right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know this with the compiler flags metadata included in the pdb's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that answers the question.
Supports SDK
is not really a thing. Supporting certain frameworks is what's expressed in NuGet.
Metadata about what SDK was used to build probably isn't helpful either.
|
||
#### Quality | ||
|
||
Quality is the combination of: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "Opted in to Nullable reference types"?
https://pub.dev has this as a metric and even adds this cool badge to the package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great idea, and a huge win for package consumers if implemented more broadly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @loic-sharma 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nullable reference types require C# 8, which is not supported on .NET Framework. If NRT is scored, then it will lure developers to an unsupported scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's not entirely true. You can use C# 8+ features in .Net Framework as long as they don't require some new types from SDK.
NRT only requires a handful of attributes which can be created manually, or added via something like https://www.nuget.org/packages/Nullable#readme-body-tab
After that NRT can be fully used even in .Net Framework
I'm worried a lot of these metrics will lead to libraries that spend a lot of time on the marketing to be ranked better, instead of spending time building a great library. While having a great test coverage is time consuming, it's definitely better for the package quality and their users than spending time marketing the library. In some other ecosystems, some manages to get a lot of popularity only by spending tons of their time on marketing, while the library itself doesn't bring or solve anything new, or better. I also fear that a lot of metrics will enforce current practice and hinder new innovative and potentially better practices. I saw a "is the code correctly formatted" metric: While for a simple static analyser standpoint, this code is not correctly formatted, such layout allow better code readability, but will penalise the scoring. Mono.Cecil, and some other great .NET librairies would rank badly on a lot of the metrics listed, but are very valuable package for the .NET ecosystem. In additions, a lot of these metrics are easily to cheat on. There are xml docs generators that write basic english sentences from the code, allowing to reach 100% of useless documentation instantly. You can start trying to programmatically check for the docs quality, but then someone will start generating the documentation with github copilot. While I welcome new packages metrics, I don't think a package score, where its purpose is to help package choice, should be made from an aggregate of metrics computed on the package, it will incentivise cheating on, and these metrics will become meaningless. Worse, ranking too high on some of these metrics could rise suspicion instead of being a proof of quality. On the topic of metrics being cheated on, it doesn't have to come from the library author themselves, for exemple, npm audit is spammed of regex DoS and I remember reading a blogpost about CVE spam that happened to boost spammer's resume, but I sadly can't find any link to it. Something that I didn't seen proposed, and would be far easier to implement (than a lot of static analysis), is a nuget package rating/comment systems on nuget.org. While it causes other issues, like angry devs swearing at benevolent package maintainer, requiring moderation, heated discussion between a user that lowly rated a package and it's maintainer, etc, I think it has a lot of benefits:
Thank you if you have read everything :D |
|
||
- No known security vulnerabilities (Critical, High, Moderate, Low) in top-level or transitive dependencies. | ||
- Specifies a valid license (SPDX, embedded) | ||
- Not deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should packages with installation scripts get flagged for security purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the concept of "flagged" in this scoring conversation is very interesting. We can have "clearly good" things like having a license and "clearly bad" things like having no content or dependencies. But what about "watch out!" things? Scripts/MSBuild tasks are not necessarily bad but they should probably be noted to the consumer.
But if we do that, then we have to think about transitivity... a sneaky package author could hide a flagged attribute in a dependency package. It gets complicated fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making scripts/msbuild tasks more visible on a package details page? Like have a tab with its text, so one could evaluate the contents without downloading a package?
- It resolves security flaws quickly. It fixes known vulnerabilities & releases an unaffected release quickly. | ||
- It is not deprecated. It is in an active state meeting all the criteria above. | ||
|
||
This proposal introduces the first iteration of package scoring. While it derives from the original proposal [last year on this topic](https://github.com/dotnet/designs/pull/216), it provides the minimum requirements for a v1.0 of implementing this on NuGet.org. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume "on NuGet.org" means that Visual Studio and dotnet CLI are out of scope. Therefore, there are also no proposed changes to the NuGet protocol (that clients use to talk to servers).
Is it worth making any of this more explicit?
|
||
Since this is not possible with current data pipelines, we will work on providing such an experience in the future and use two proxy values such as total weekly downloads and total count of packages depending on the package to be scored up to a percentile of up to 100%. | ||
|
||
- Total Weekly Downloads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why weekly downloads? It can be gamed too. Because during XMAS down time someone run bunch of bots to generate 10k downloads for own package, then it it's placed at higher in ranking then it gets more downloads in following weeks. To reduce weight of this kind of spike in finance use moving average last 2-3 months. Actually, implementing moving average is not complicated, simple arithmetic.
Team Triage: Pending conversations in this proposal
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch. |
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch. |
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch. |
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch. |
This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment, unless it has a "Status:Do not auto close" label. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch. |
This proposal introduces a concept known as package scoring or net score for short. This is a pagerank-like score that depends on qualities of the whole NuGet package & dependencies while accounting for the risks of today's modern, connected world and all of the security implications they have on package managers. This is the first iteration building on-top of the work here:
dotnet/designs#216
Rendered Proposal
Please 👍 or 👎 this comment to help us with the direction of this feature & leave as much feedback/questions/concerns as you'd like on this issue itself and we will get back to you shortly.
Thank You 🎉