Improve PyPI package declared license detection #2487

Open
pombredanne opened this issue Apr 11, 2021 · 0 comments

pombredanne commented Apr 11, 2021

The goal of this ticket is to improve PyPI package license detection across the board. While scancode-toolkit's PyPI package detection is pretty good, there are a few recurring cases where license information is not properly gathered from PyPI package metadata. Usually this is because a `declared_license` value contains something we did not expect (such as a URL) or is malformed.
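As a first triage step, raw declared values could be bucketed by shape before any license matching runs. The sketch below is only an illustration under assumptions: `classify_declared_license`, `DeclaredKind` and the 200-character threshold are hypothetical names and values, not part of scancode-toolkit. It treats "UNKNOWN", a placeholder commonly written by older packaging tools, as empty.

```python
import re
from enum import Enum


class DeclaredKind(Enum):
    EMPTY = "empty"
    URL = "url"
    LONG_TEXT = "long_text"
    SHORT_NAME = "short_name"


# Hypothetical threshold: longer values are probably a pasted license text.
LONG_TEXT_THRESHOLD = 200

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def classify_declared_license(value):
    """Roughly bucket a raw declared license string by its shape."""
    if value is None:
        return DeclaredKind.EMPTY
    text = value.strip()
    # "UNKNOWN" is a frequent placeholder for missing license metadata.
    if not text or text.upper() == "UNKNOWN":
        return DeclaredKind.EMPTY
    # The whole value is a single URL (e.g. a link to a license page).
    if URL_RE.fullmatch(text):
        return DeclaredKind.URL
    # Multi-line or very long values are likely a full license text.
    if len(text) > LONG_TEXT_THRESHOLD or "\n" in text:
        return DeclaredKind.LONG_TEXT
    return DeclaredKind.SHORT_NAME


print(classify_declared_license("https://opensource.org/licenses/MIT"))
```

Each bucket could then get its own handling, e.g. URL values routed to URL-based rules and long texts sent through full-text detection.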

Resolving this would likely require a mix of:

  • adding new license detection rules to scancode,
  • adding new and improved code to handle specific license declaration patterns,
  • creating new license mappings,
  • and possibly working with upstream maintainers to improve their license declarations.
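A license mapping, in its simplest form, could be a lookup from normalized declared strings and Trove classifiers to license identifiers. This is a minimal sketch assuming SPDX identifiers as targets (scancode uses its own license keys internally); the table contents and function names are hypothetical examples, not an existing scancode mapping.

```python
# Hypothetical mapping: normalized declared strings or Trove classifiers
# to SPDX identifiers. Ambiguous values (e.g. "BSD") are left out on purpose.
DECLARED_TO_SPDX = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache-2.0": "Apache-2.0",
    "apache 2.0": "Apache-2.0",
    "license :: osi approved :: mit license": "MIT",
    "license :: osi approved :: isc license (iscl)": "ISC",
    "license :: osi approved :: mozilla public license 2.0 (mpl 2.0)": "MPL-2.0",
}


def normalize(value: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def map_declared_license(value):
    """Return an SPDX id for a known declared value, or None for manual review."""
    if not value:
        return None
    return DECLARED_TO_SPDX.get(normalize(value))
```

Returning `None` rather than guessing keeps unmapped values visible, so they can feed the pattern analysis instead of being silently misreported.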

The approach should be to start with a complete data set of all PyPI package manifests, establish a baseline, and find recurring patterns of license issues, possibly with classifiers and ML. The end result should be a significant improvement in license detection quality for PyPI packages.
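One simple way to establish a baseline is to count how often each normalized declared value occurs and what fraction of all packages a set of known values already covers. The sketch below assumes the declared values have already been collected into an iterable; `baseline` and the `known` set are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple


def baseline(
    declared_values: Iterable[Optional[str]],
    known: Set[str],
    top: int = 10,
) -> Tuple[List[Tuple[str, int]], float]:
    """Return the most common normalized values and the share already covered."""
    counts: Counter = Counter()
    for value in declared_values:
        # Normalize case and whitespace; bucket missing values together.
        key = " ".join(value.lower().split()) if value else "<empty>"
        counts[key] += 1
    total = sum(counts.values())
    covered = sum(n for key, n in counts.items() if key in known)
    coverage = covered / total if total else 0.0
    return counts.most_common(top), coverage


top_values, coverage = baseline(["MIT", "mit", "Apache 2.0", None, "GPL"], {"mit"})
print(top_values[0], coverage)
```

Re-running the same measurement after each new rule or mapping gives a rough way to quantify the improvement.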

The bandersnatch PyPI mirroring tool (https://github.com/pypa/bandersnatch/) and the PyPI API may help collect a list of all declared licenses.
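A minimal sketch of pulling license fields from the PyPI JSON API (`https://pypi.org/pypi/<project>/json`), whose `info` object carries the free-form `license` string and the `classifiers` list. The helper names are hypothetical, and the network call is kept separate from the extraction so the latter can be tested offline on saved documents.

```python
import json
from urllib.request import urlopen

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_project_metadata(name: str) -> dict:
    """Download the PyPI JSON API document for one project (needs network)."""
    with urlopen(PYPI_JSON_URL.format(name=name), timeout=30) as response:
        return json.load(response)


def extract_license_fields(metadata: dict) -> dict:
    """Keep only the license-related fields of a PyPI JSON API document."""
    info = metadata.get("info") or {}
    classifiers = info.get("classifiers") or []
    return {
        "name": info.get("name"),
        "license": info.get("license"),
        # Trove license classifiers all start with "License ::".
        "license_classifiers": [c for c in classifiers if c.startswith("License ::")],
    }
```

For example, `extract_license_fields(fetch_project_metadata("requests"))` would return the declared license string and license classifiers for that project.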

See also PEP 639 (https://www.python.org/dev/peps/pep-0639/) and possibly #253 too.

There are also related tickets for other package types, such as:

There is also a related project idea: https://github.com/nexB/aboutcode/wiki/Project-Ideas-Improve-PyPI-package-license-detection
