Improve license detection of declared RPM licenses #2412
Comments
@pombredanne Can you please explain a bit more on this?
An RPM can have a license declared which is one of the tags we collect.
Check https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/rpm_installed.py and https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/rpm.py
To better explain the context: a list of the licenses used can be found in each RPM's metadata. These are also in the repomd primary data, such as in this one from CentOS: https://archive.kernel.org/centos-vault/8.0.1905/BaseOS/x86_64/os/repodata/087f260d0243e74021b5bde5d0091b2ba15998ffdcf336471bbe3e97ffafbb7b-primary.xml.gz
<package type="rpm">
<name>ModemManager-glib</name>
<arch>i686</arch>
<version epoch="0" ver="1.8.0" rel="1.el8"/>
<checksum type="sha256" pkgid="YES">b27635edf4ece5cff60f231f8a578da14d300d98b9da4b5b52d43d4b4c43ba31</checksum>
<summary>Libraries for adding ModemManager support to applications that use glib.</summary>
<description>This package contains the libraries that make it easier to use some ModemManager
functionality from applications that use glib.</description>
<packager>CentOS Buildsys <[email protected]></packager>
<url>http://www.freedesktop.org/wiki/Software/ModemManager/</url>
<time file="1562077020" build="1557586982"/>
<size package="258216" installed="1184560" archive="1185624"/>
<location href="Packages/ModemManager-glib-1.8.0-1.el8.i686.rpm"/>
<format>
<rpm:license>GPLv2+</rpm:license>
<rpm:vendor>CentOS</rpm:vendor>
...
Such primary repomd data could be used to assemble a list of license symbols, for example:
wget https://archive.kernel.org/centos-vault/8.0.1905/BaseOS/x86_64/os/repodata/087f260d0243e74021b5bde5d0091b2ba15998ffdcf336471bbe3e97ffafbb7b-primary.xml.gz
gunzip 087f260d0243e74021b5bde5d0091b2ba15998ffdcf336471bbe3e97ffafbb7b-primary.xml.gz
# using a Perl regex (-P) to print only the matched license text (-o)
grep -oP "(?<=license\>)(.*)(?=</rpm)" 087f260d0243e74021b5bde5d0091b2ba15998ffdcf336471bbe3e97ffafbb7b-primary.xml | sort -u
This yields these:
Then this would need to be massaged to get a list of symbols. This small list from CentOS is one of many and is just an illustration. It could be used this way, more or less:
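One possible way to do the extraction step above with an XML parser instead of grep is sketched here. It is a minimal sketch, assuming the `rpm:` prefix in the primary.xml file is bound to the namespace `http://linux.duke.edu/metadata/rpm`; the function name `collect_declared_licenses` is hypothetical and not part of scancode-toolkit.

```python
import gzip
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the "rpm:" prefix in repomd primary.xml files.
RPM_NS = "http://linux.duke.edu/metadata/rpm"


def collect_declared_licenses(primary_xml_path):
    """Return the sorted distinct <rpm:license> values of a primary.xml or primary.xml.gz file."""
    # Read the gzipped file directly when the name ends in .gz, so no gunzip step is needed.
    opener = gzip.open if str(primary_xml_path).endswith(".gz") else open
    license_tag = f"{{{RPM_NS}}}license"
    licenses = set()
    with opener(primary_xml_path, "rb") as xml_file:
        # iterparse streams the document, which keeps memory low on large repositories.
        for _event, elem in ET.iterparse(xml_file, events=("end",)):
            if elem.tag == license_tag and elem.text and elem.text.strip():
                licenses.add(elem.text.strip())
            # Free each element once its end tag has been processed.
            elem.clear()
    return sorted(licenses)
```

Calling `collect_declared_licenses("...-primary.xml.gz")` on the downloaded file would give the same kind of unique list as the `grep | sort -u` pipeline, without depending on how the XML is laid out across lines.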
This is not simple, but a good first issue for a skilled aspiring contributor.
Merging this other issue to consolidate things in one place for RPMs:
These other issues are closely related:
@ivanayov ping, following up on this comment aboutcode-org/license-expression#70 (comment)
@sutula @qduanmu @richardfontana @jlovejoy ping FYI
Description
We should create a license symbols map for RPMs and use that to feed the expression detection first before detecting more.
Otherwise we get too many inconsistencies.
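The idea of feeding a symbols map to expression detection first could look roughly like the sketch below. It is only an illustration under stated assumptions: the map entries are a tiny hand-picked sample, SPDX identifiers are assumed as the target (ScanCode license keys could be used instead), and `RPM_SYMBOL_MAP` and `map_declared_license` are hypothetical names, not existing scancode-toolkit APIs.

```python
import re

# Illustrative sample only: a real map would be built from the collected RPM symbols.
# Targets are SPDX identifiers, which is an assumption made for this sketch.
RPM_SYMBOL_MAP = {
    "GPLv2": "GPL-2.0-only",
    "GPLv2+": "GPL-2.0-or-later",
    "GPLv3+": "GPL-3.0-or-later",
    "ASL 2.0": "Apache-2.0",
    "MIT": "MIT",
}

# Split on parentheses and on the "and"/"or" keywords, keeping the separators.
_SPLITTER = re.compile(r"(\(|\)|\s+and\s+|\s+or\s+)", re.IGNORECASE)


def map_declared_license(declared):
    """Return (expression, unknown_symbols) for a declared RPM license string.

    Known symbols are translated through RPM_SYMBOL_MAP. Unknown symbols are kept
    as-is and also reported, so that only those need further license detection.
    """
    parts = []
    unknown = []
    for token in _SPLITTER.split(declared):
        symbol = token.strip()
        if not symbol:
            continue
        if symbol.lower() in ("and", "or"):
            parts.append(symbol.upper())
        elif symbol in ("(", ")"):
            parts.append(symbol)
        elif symbol in RPM_SYMBOL_MAP:
            parts.append(RPM_SYMBOL_MAP[symbol])
        else:
            unknown.append(symbol)
            parts.append(symbol)
    return " ".join(parts), unknown
```

With this shape, a declared value made only of known symbols never reaches the general detection step, which is where the inconsistencies tend to come from, and anything reported as unknown can still be handed to the regular detection.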
A recent set of CentOS RPM licenses, with both the detected and the declared values, is attached for reference:
rpms-licenses.csv.txt