Improve quality and tracing of license detection in Debian copyright files #2390
Comments
From a chat with @chinyeungli
From a chat with @JonoYang based on scanning an Ubuntu-based Docker image in https://github.com/nexB/scancode.io/ that contained https://packages.ubuntu.com/bionic-updates/gcc-7
From a chat with @mjherzog based on scanning an Ubuntu-based Docker image in https://github.com/nexB/scancode.io/
Signed-off-by: Philippe Ombredanne <[email protected]>
We now test with and without dedup of licenses and copyrights. Signed-off-by: Philippe Ombredanne <[email protected]>
See aboutcode-org/scancode.io#103 (comment) for a detailed description of the problems.
Signed-off-by: Philippe Ombredanne <[email protected]>
The current test for debian copyright files was wrong and misleading. This corrects the problem by having proper values in plain expected files and in detailed files. There was also a problem of test name masking where both detailed and non-detailed test methods had the same name and therefore were not running correctly at all. As a result all expected YAML files have been regenerated too. Signed-off-by: Philippe Ombredanne <[email protected]>
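The name-masking problem described in this commit can be reproduced with a minimal sketch. The class and method names below are hypothetical and are not taken from the ScanCode test suite; the point is only that a second method with the same name silently replaces the first one in a Python class body.

```python
import unittest


class TestDebianCopyrightMasked(unittest.TestCase):
    # Intended as the plain (non-detailed) test.
    def test_copyright_file(self):
        self.assertTrue(True)

    # Same name again: this definition replaces the one above in the class
    # namespace, so the first body is never collected or run.
    def test_copyright_file(self):
        self.assertTrue(True)


class TestDebianCopyrightFixed(unittest.TestCase):
    # Distinct names keep both tests visible to the test loader.
    def test_copyright_file(self):
        self.assertTrue(True)

    def test_copyright_file_detailed(self):
        self.assertTrue(True)


def collected_test_names(test_case_class):
    """Return the test method names that unittest would collect for a class."""
    return list(unittest.TestLoader().getTestCaseNames(test_case_class))
```

Comparing `collected_test_names` for the two classes shows one collected test for the masked class and two for the fixed one.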
This is the set of files found in a recent debian-unstable-slim Docker image. The expectations have been regenerated as-is but not yet reviewed. See also aboutcode-org/scancode.io#128 and aboutcode-org/scancode.io#103. Signed-off-by: Philippe Ombredanne <[email protected]>
To improve the tracing I think we could have this simple way:
This way we can get regular license detection results from just copyright files, irrespective of being in the context of a package or not.
@AyanSinhaMahapatra FYI ^
Refactor debian copyright detection to add a DebianCopyrightDetector class and make changes to facilitate better copyright file parsing. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Fix bug in unstructured copyright file parsing, which always treated copyright files as structured, and regenerate tests files. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Remove `unique` and `simplify_licenses` to have non-unique and non-simplified copyright and license information. Use with_debian_packaging instead of using with_details and skip_debian_packaging. Regenerate tests to update expectations. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Refactor and improve structured debian copyright file parsing. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Modify EnhancedDebianCopyright to be a DebianCopyright wrapper function and modify flags used for filtering and reporting. Separate structured and unstructured parsing into different classes having the same base class and main methods. Also modify file to follow black standards. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Updates get_installed_packages to directly call parse_copyright_file function and get an object depending on structured/unstructured copyright file and then call functions with filtering flags to get detections as required. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
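The split into structured and unstructured parsers behind a shared base class, with a single entry point that picks one, could look roughly like the sketch below. All names here (`CopyrightFile`, `StructuredCopyrightFile`, `UnstructuredCopyrightFile`, `looks_structured`, `parse_copyright_text`) are hypothetical and are not the ScanCode API. The check for a leading `Format:` field is a simplifying assumption about how a machine-readable file is recognized.

```python
class CopyrightFile:
    """Hypothetical base class: both variants expose the same methods."""

    def __init__(self, text):
        self.text = text

    def declared_licenses(self):
        raise NotImplementedError


class StructuredCopyrightFile(CopyrightFile):
    def declared_licenses(self):
        # Keep only the short name on the first line of each "License:" field.
        return [
            line.partition(":")[2].strip()
            for line in self.text.splitlines()
            if line.lower().startswith("license:")
        ]


class UnstructuredCopyrightFile(CopyrightFile):
    def declared_licenses(self):
        # Free-form text has no License fields; real detection would have to
        # run on the whole text instead.
        return []


def looks_structured(text):
    """Assumption: a structured file starts with a "Format:" field."""
    for line in text.splitlines():
        if line.strip():
            return line.lower().startswith("format:")
    return False


def parse_copyright_text(text):
    """Return the parser object that matches the kind of copyright file."""
    if looks_structured(text):
        return StructuredCopyrightFile(text)
    return UnstructuredCopyrightFile(text)
```

Callers only deal with the shared methods, so a free-form file is no longer forced through the structured code path.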
Add tests for EnhancedDebianCopyright class and also modify test functions to adopt the new API. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
This makes declared_license also report declared license in the license paragraph of debian copyright files. Updates test expectations. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Modify get_copyrights to have unique copyrights when the unique_copyrights flag is set to True. Refer to #2390 Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
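A minimal sketch of what a uniqueness flag can mean for copyright statements, assuming order should be preserved. The helper name `dedupe_copyrights` is hypothetical; only the `unique_copyrights` flag name comes from the commit message above.

```python
def dedupe_copyrights(copyrights, unique_copyrights=False):
    """Return copyright statements, dropping repeats only when asked to."""
    if not unique_copyrights:
        return list(copyrights)
    # dict.fromkeys keeps the first occurrence of each statement, in order.
    return list(dict.fromkeys(copyrights))
```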
Regenerate test expectations after upgrading to latest debian-inspector to parse paragraphs after double empty lines correctly, as the latest version fixes this issue. Refer to #2390 Refer to aboutcode-org/debian-inspector#17 Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Instead of adding a general `unknown_debian_license` rule, create a synthetic UnknownRule object and a LicenseMatch object out of the unknown license text. Updates test expectations. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Instead of adding a general `unknown_debian_license` rule, create a synthetic UnknownRule object and a LicenseMatch object out of the unknown license text. Also updates test expectations after reindexing licenses with new rules added from develop branch. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Update requirements and setup.cfg files to install the latest debian-inspector version 21.5.25 to fix the following issue: aboutcode-org/debian-inspector#17 Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Improve debian license detection #2390
we should be able to recover from mostly OK but not correct copyright files such as this one: https://metadata.ftp-master.debian.org/changelogs//main/p/pulseaudio/pulseaudio_14.2-1_copyright (this may be a ticket for the debian-inspector debut library though)
See debian-inspector#6: Recover parsing from almost machine-readable copyright files
we should have the ability to trace the intermediate detection results for each paragraph of a copyright file (see also #2389: Package license data structure: Improve tracing of license detection in package manifests)
we could establish a mapping of declared License "ids"
there is an implicit notion of primary vs. secondary licenses in a copyright file and we should leverage this: a paragraph with "Files: *" applies to the package as a whole. This may mean a system-wide model change to track primary vs. secondary license or have the ability to track that in a license expression. See debian-inspector#8: Determine the primary license from a copyright file
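The per-paragraph tracing and the "Files: *" primary license idea from the list above can be sketched with the standard library alone. This is a simplified illustration, not the debian-inspector or ScanCode implementation: `ParagraphTrace`, `split_paragraphs`, `parse_fields`, `trace_paragraphs` and `primary_license` are hypothetical names, and the parser handles only simple `Field: value` lines with indented continuation lines.

```python
from dataclasses import dataclass


@dataclass
class ParagraphTrace:
    index: int             # position of the paragraph in the file
    files: str             # raw value of the "Files" field
    declared_license: str  # first line of the "License" field
    is_primary: bool       # True when the paragraph is "Files: *"


def split_paragraphs(text):
    """Split on blank lines; runs of several blank lines count as one break."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


def parse_fields(lines):
    """Map lowercased field names to values, joining indented continuations."""
    fields, name = {}, None
    for line in lines:
        if line[0] in " \t":
            if name is not None:
                fields[name] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            name = name.strip().lower()
            fields[name] = value.strip()
    return fields


def trace_paragraphs(text):
    """Keep one trace record per "Files" paragraph."""
    traces = []
    for index, lines in enumerate(split_paragraphs(text)):
        fields = parse_fields(lines)
        if "files" not in fields:
            continue  # header paragraph or stand-alone License paragraph
        license_lines = fields.get("license", "").splitlines()
        declared = license_lines[0] if license_lines else ""
        files = fields["files"]
        traces.append(ParagraphTrace(index, files, declared, files.split() == ["*"]))
    return traces


def primary_license(traces):
    """Return the declared license of the first "Files: *" paragraph, if any."""
    for trace in traces:
        if trace.is_primary:
            return trace.declared_license
    return None
```

Keeping the paragraph index and the `Files` value next to each declared license is one way to make intermediate results traceable back to where they came from.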