Releases: nexB/scancode-toolkit
v32.1.0
New CLI options:
- A new CLI option --package-only has been added, which performs a faster package scan by skipping the package assembly step and also skipping license/copyright detection on package metadata.
Major API/other changes:
- Output Format Version updated to 3.1.0 (minor version bump)
- Drops Python 3.7 and adds support for Python 3.12
- New license match attributes: from_file, and matched_text_diagnostics, which is added for --license-text-diagnostics
- In codebase-level license_detections we have a new attribute: reference_matches
- SPDX license expressions everywhere, side-by-side with ScanCode license expressions
- All rule-attribute-level data is provided in codebase-level todo items
Changes in Output Data Structure:
- The data structure of the JSON output has changed for licenses at file level, and for license detections at top level. Note that all the changes are additions to the JSON output, so this is a minor version bump from 3.0.0 to 3.1.0:
  - There is a new attribute from_file in matches, which are in license_detections in:
    - File-level license_detections
    - Codebase-level license_detections
    - license_detections and other_license_detections in file-level package_data
    - license_detections and other_license_detections in codebase-level packages
    - File level
  - On using the CLI option --license-text-diagnostics, there is now a new license match attribute matched_text_diagnostics with the matched text and highlighted diagnostics, instead of having this replace the plain matched_text.
  - A new reference_matches attribute is added to codebase-level license_detections, which is the same as the matches attribute in other license detections.
  - We now have SPDX license expressions everywhere we have ScanCode license expressions, for ease of use and to adopt SPDX everywhere. A new attribute license_expression_spdx is added to:
    - license_detections at file and codebase level
    - package license_detections and other_license_detections
    - matches for license_detections everywhere
  - Adds all rule-attribute-level info to codebase-level todo data, to assist in review. This includes length, text, notes, referenced_filenames, and the boolean attributes (like is_license_notice, is_license_intro, etc., as applicable).
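To make the 3.1.0 additions concrete, here is a minimal sketch that reads an abridged scan result and pairs each file-level detection's ScanCode expression with its license_expression_spdx, plus the from_file of each match. The sample JSON is hypothetical and heavily trimmed, and spdx_by_file is an illustrative helper, not part of ScanCode; real output carries many more fields.

```python
import json

# Hypothetical, heavily abridged scan output: only the 3.1.0 fields discussed
# above are shown. Real ScanCode JSON has many more attributes.
SAMPLE = json.loads("""
{
  "files": [
    {
      "path": "project/LICENSE",
      "license_detections": [
        {
          "license_expression": "mit",
          "license_expression_spdx": "MIT",
          "matches": [
            {
              "license_expression": "mit",
              "license_expression_spdx": "MIT",
              "from_file": "project/LICENSE"
            }
          ]
        }
      ]
    },
    {"path": "project/README", "license_detections": []}
  ]
}
""")


def spdx_by_file(scan):
    """Map each file path to (scancode_expression, spdx_expression, from_files)."""
    result = {}
    for resource in scan.get("files", []):
        rows = []
        for detection in resource.get("license_detections", []):
            # from_file lives on each match, not on the detection itself.
            sources = [m.get("from_file") for m in detection.get("matches", [])]
            rows.append((
                detection["license_expression"],
                detection.get("license_expression_spdx"),
                sources,
            ))
        if rows:  # skip files without any detection
            result[resource["path"]] = rows
    return result


print(spdx_by_file(SAMPLE))
```

Files with no detections are simply left out of the result.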
- New and updated licenses, including support for newly released SPDX license list versions:
  - SPDX License List 3.22: This release of the SPDX license list had 48 new licenses. Several of them we already had as licenses/rules, and these have been modified to be consistent with the SPDX list; the rest have been added as new licenses. For more details see #3554
  - SPDX License List 3.23: This release of the SPDX license list had 43 new licenses, and out of them 22 were already present as licenses and 10 as rules. There were 4 new license/exception texts added, and the rest were either texts with small variations, additions to texts, or several rule texts together. For more details see #3653
  - We also have lots of other miscellaneous new licenses and rules added to LicenseDB; see these PRs for more details: #3663, #3642, #3586, #3584, #3575, #3570, #3568, #3562
- Improve debian namespace detection based on clues and fix namespace and qualifier bugs for debian purls. For more details see nexB/scancode.io#899 and #3443. Also improve debian manifest parsing and purl parsing from filenames, in support of nexB/purldb#245. Bumps debian-inspector to v31.1.0
- Bump commoncode to v31.0.3
- Upgraded spdx-tools dependency to v0.8. See #3455
Support for Conan package parser:
- We now support the parsing of Conan manifest files, such as conanfile.py, as described here https://docs.conan.io/2.0/reference/conanfile.html. We also support source extraction from conandata.yml, as described here https://docs.conan.io/2/tutorial/creating_packages/handle_sources_in_packages.html#using-the-conandata-yml-file.
What's Changed
- fix: allow fedora based packages by @philcali in #3479
- Upgrade spdx-tools to v0.8.1 #3455 by @armintaenzertng in #3456
- Added docs server script, dark mode & copybutton for docs by @OmkarPh in #3549
- npm: support aliases in yarn lock v1 by @schischi in #3555
- Add license rules by @AyanSinhaMahapatra in #3562
- Fix failing tests by @AyanSinhaMahapatra in #3563
- Add more license rules by @pombredanne in #3567
- Add license detection rules by @AyanSinhaMahapatra in #3568
- More licenses by @AyanSinhaMahapatra in #3570
- Update to spdx 3.22 by @AyanSinhaMahapatra in #3554
- Add new license detection rules by @pombredanne in #3575
- TestRule.test_dump_rule_file: sort the rule file lists. by @licquia in #3582
- Fix reference to install section by @rettichschnidi in #3583
- Add new and updated licenses by @AyanSinhaMahapatra in #3586
- Fix-up an accidental use of SPDX's WITH operator in a rule by @fviernau in #3628
- Update home.rst by @machuii in #3627
- Fix SCTK doc build by @AyanSinhaMahapatra in #3636
- Yet more license rules by @AyanSinhaMahapatra in #3584
- Update license detections by @AyanSinhaMahapatra in #3620
- Support conan in packagedcode by @keshav-space in #3650
- Update LicenseDB by @AyanSinhaMahapatra in #3641
- Update debian package manifest parsing by @AyanSinhaMahapatra in #3647
- Fix debian source purl parsing in status by @AyanSinhaMahapatra in #3661
- Support SPDX License List 3.23 by @AyanSinhaMahapatra in #3653
- Add new licenses and license updates by @AyanSinhaMahapatra in #3663
- Update llgpl as a license exception by @AyanSinhaMahapatra in #3680
- Update license rules by @AyanSinhaMahapatra in #3642
- Add Misc updates by @pombredanne in #3662
- Update package handlers by @AyanSinhaMahapatra in #3682
- Support cargo workspaces by @AyanSinhaMahapatra in #3602
- Validate CLI inputs and paths #3596 by @pombredanne in #3609
- Support Python 3.12 by @AyanSinhaMahapatra in #3658
- Add a faster package scan with --package-only by @AyanSinhaMahapatra in #3689
- Refine referenced filenames #3547 by @AyanSinhaMahapatra in #3681
- Release prep v32.1.0 by @AyanSinhaMahapatra in #3701
New Contributors
- @philcali made their first contribution in #3479
- @schischi made their first contribution in #3555
- @licquia made their first contribution in #3582
- @rettichschnidi made their first contribution in #3583
- @machuii made their first contribution in #3627
Full Changelog: v32.0.8...v32.1.0
v32.0.8
What's Changed
- Fixed epoch parser failing for numeric values by @OmkarPh in #3520
- Update license rules and detections by @AyanSinhaMahapatra in #3519
- License rules update by @AyanSinhaMahapatra in #3545
- Bump version to v32.0.8 by @AyanSinhaMahapatra in #3548
New Contributors
Full Changelog: v32.0.7...v32.0.8
v32.0.7
This is a minor release with a lot of license detection improvements, with new and updated license detection rules and new licenses.
- 33 new licenses, 30 licenses updated
- 70 new and updated license rules
The main updates over the previous stable release are:
- Adds the is_license_clue and is_deprecated attributes to the license Rule class to support license clue detection and to always keep unique rule names consistent. Also fixes other license detection bugs related to license clues and a bug in setup.cfg license detection, and makes license detection identifiers python-safe. See #3462
- Update/Add new licenses and license rules. See #3470 #3513
- Bump commoncode to v31.0.3 fixing a VirtualCodebase creation issue when there is a directory under the root with the same name as the root directory itself. nexB/commoncode#57 #3495
What's Changed
- Edit check_rdf_scan so that SPDX rdf tests don't automatically pass #3448 by @armintaenzertng in #3451
- Update misc detections by @AyanSinhaMahapatra in #3462
- Bump commoncode to v31.0.3 by @JonoYang in #3495
- Update and add licenses by @AyanSinhaMahapatra in #3470
- Update licenses and rules by @AyanSinhaMahapatra in #3513
- Release prep 32.0.7 by @AyanSinhaMahapatra in #3527
New Contributors
- @armintaenzertng made their first contribution in #3451
Full Changelog: v32.0.6...v32.0.7
v32.0.6
This is a minor release with a lot of license and package detection improvements, especially for Maven packages. We also support the SPDX license list 3.21 now. The main updates over the previous stable release are:
- New and updated licenses, including support for newly released SPDX license list version 3.21. For more details see #3437
- Fixes in summary plugin for licenses, and top-level license detections. #3430
- Updated maven license and package detections, with fixes for various maven package manifest parsing, improved top-level package assembly, ecosystem specific package license detection, fixes in --todo plugin, updated license detection rules/heuristics and other misc changes. For more details see: #3447
- Improved Gemfile.lock parsing. For more details see #3444
- Auto-review plugin to get todo items for scan review, with the new --todo CLI option. For more details see: #3353
- Misc. license and copyright detection improvements at #3346
- Other misc. minor bugfixes detailed in all the previous release-candidates.
What's Changed
- Ambiguous Detections ToDo items by @AyanSinhaMahapatra in #3353
- License detection improvements and review by @pombredanne in #3346
- Fix maven pom resource assignment by @AyanSinhaMahapatra in #3427
- Bump version to v32.0.5rc1 by @AyanSinhaMahapatra in #3428
- Bump version to v32.0.5rc2 by @AyanSinhaMahapatra in #3433
- Release prep v32.0.5rc3 by @AyanSinhaMahapatra in #3436
- Update licenses and rules by @AyanSinhaMahapatra in #3437
- Fix licenses data in summary plugin by @AyanSinhaMahapatra in #3430
- Update proprietary-license_553.RULE by @pombredanne in #3441
- support parsing BUNDLED WITH by @akostadinov in #3444
- Update maven detections by @AyanSinhaMahapatra in #3447
- Release prep v32.0.6 by @AyanSinhaMahapatra in #3454
New Contributors
- @akostadinov made their first contribution in #3444
Full Changelog: v32.0.4...v32.0.6
v32.0.5rc3
Merge pull request #3436 from nexB/release-prep-v32.0.5rc3 Release prep v32.0.5rc3
v32.0.4
This is a minor bugfix release with the following updates:
- Fixes a performance issue arising out of license detection on files happening in a single-threaded process_codebase step when the license CLI option is disabled for a package scan. Reference: #3423
What's Changed
- Fix package scan only performance by @AyanSinhaMahapatra in #3423
Full Changelog: v32.0.3...v32.0.4
v32.0.3
This is a minor bugfix release with the following updates:
- We were missing scancode-toolkit-mini releases from v32.0.0rc2, and also the scancode-toolkit release wheels including and after v32.0.0rc2 were actually scancode-toolkit-mini releases. Reference: #3421
- Updated GitHub Actions; for more details see nexB/skeleton#75
What's Changed
- Fix scancode-toolkit-mini and release prep v32.0.3 #3421 by @AyanSinhaMahapatra in #3422
Full Changelog: v32.0.2...v32.0.3
v32.0.2
This is a minor license update release with:
- new and updated licenses in LicenseDB
- license-expression V30.1.1 with support for the new licenses
What's Changed
- Add new licenses to licenseDB by @AyanSinhaMahapatra in #3414
- Release Prep v32.0.2 by @AyanSinhaMahapatra in #3415
- Add doc redirects by @AyanSinhaMahapatra in #3413
Full Changelog: v32.0.1...v32.0.2
v32.0.1
This is a minor bugfix release.
There are fixes for two issues in this release:
- #3407: in typecode we had an improper import of ctypes.utils, and this is fixed in a new release v30.0.1 of typecode
- #3408: the setup.cfg and setup-mini.cfg were not aligned for plugin entrypoints.
What's Changed
- Release prep v32.0.1 by @AyanSinhaMahapatra in #3410
Full Changelog: v32.0.0...v32.0.1
v32.0.0
v32 of ScanCode is all about improved license detections!
We have more licenses and rules, and major updates on post-processing matches to license detections.
We also have major improvements in package license detections and unknown references, along with top level detection
summaries for licenses, and reference data for the licenses detected too. There are also a couple of API changes due to
model changes in license data.
See also https://github.com/nexB/scancode.io/ for a complete, customizable SCA solution using ScanCode and
https://github.com/nexB/scancode-workbench/releases for visualizing data generated by ScanCode Toolkit.
Important API changes:
This is a major release with major API and output format changes and significant
feature updates.
In particular the output format has changed for the licenses and packages, and
also for some of the command line options.
The output format version is now 3.0.0.
See https://github.com/nexB/scancode-toolkit/milestone/15 for more details on this release.
Visit https://github.com/nexB/scancode-toolkit/discussions/3406 to discuss this release.
Package detection:
- Update GemfileLockParser to track the gem which the Gemfile.lock is for, which we assign to the new GemfileLockParser.primary_gem field. Update GemfileLockHandler.parse() to handle the case where there is a primary gem detected from a gemfile.lock. If there is a primary gem, a single Package is created and the detected gem data within the gemfile.lock are assigned as dependencies. If there is no primary gem, then all of the dependencies are collected into a Package with no name and yielded.
- Fix issue where dependencies were not reported when scanning an extracted Python project by modifying BaseExtractedPythonLayout.assemble() to favor using package data from a PKG-INFO file from an egg-info directory. Package data from a PKG-INFO file from an egg-info directory contains the dependency information collected from the requirements.txt file alongside PKG-INFO.
- Fix issue where we were returning an incorrect purl package type for cocoapods. pods was being returned as the purl type for cocoapods; it should be cocoapods instead. https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#cocoapods
- Code for parsing a Maven POM, npm package.json, freebsd manifest and haxelib JSON has been separated into two functions: one that creates a PackageData object from the parsed Resource, and another that calls the previous function and yields the PackageData. This was done so that we can use the package manifest data parsing code outside of the scancode-toolkit context in other libraries.
- The PackageData model now includes a holder field, which is populated with holder data extracted from the copyright field if copyright data is present; otherwise it remains empty.
- DatafileHandlers now have a classmethod named get_top_level_resources(), which is supposed to yield the top-level Resources of a Package codebase, relative to a Package manifest file. maven.MavenPomXmlHandler is the first DatafileHandler that has this method implemented.
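The primary-gem branching described for GemfileLockHandler.parse() can be sketched as follows. This is not ScanCode's code: Gem, PackageData and parse_gemfile_lock here are simplified, hypothetical stand-ins, and leaving the primary gem out of its own dependency list is our assumption.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Gem:
    # Hypothetical stand-in for a gem entry parsed from a Gemfile.lock.
    name: str
    version: Optional[str] = None


@dataclass
class PackageData:
    # Hypothetical, much-reduced stand-in for a package data record.
    name: Optional[str]
    version: Optional[str]
    dependencies: List[Gem] = field(default_factory=list)


def parse_gemfile_lock(gems: List[Gem], primary_gem: Optional[Gem]) -> Iterator[PackageData]:
    """Yield a single package for a Gemfile.lock.

    With a primary gem, the package is named after it and the other gems
    become its dependencies. Without one, every gem is a dependency of an
    unnamed package.
    """
    if primary_gem is not None:
        # Assumption: the primary gem is not listed as its own dependency.
        deps = [g for g in gems if g.name != primary_gem.name]
        yield PackageData(primary_gem.name, primary_gem.version, deps)
    else:
        yield PackageData(None, None, list(gems))
```

For example, list(parse_gemfile_lock([Gem("rake", "13.0.6")], None)) yields one unnamed package whose only dependency is rake.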
License detection:
- The SPDX license list has been updated to the latest v3.20
- This is a major update to license detection where we now combine one or more license matches into a larger license detection. This approach improves the accuracy of license detection and removes more false positive or ambiguous license detections. See #2878 for details.
- There is a new license_detections codebase-level attribute with all the unique license detections in the whole scan, both in resources and packages. This has the 3 attributes also present in package/resource-level license detections: license_expression, identifier and detection_log (present optionally if the --license-diagnostics option is enabled), with an additional attribute, count: the number of times this unique license detection was encountered in the codebase.
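One way to picture how per-resource detections roll up into the codebase-level license_detections with a count is the sketch below. It assumes detections are deduplicated by their identifier and only looks at resources (not packages); unique_detections is a hypothetical helper name, not a ScanCode function.

```python
from collections import Counter
from typing import Dict, List


def unique_detections(resources: List[dict]) -> List[dict]:
    """Collapse per-resource detections into unique codebase-level entries.

    Assumption: two detections are "the same" when their identifier matches.
    """
    counts: Counter = Counter()
    first_seen: Dict[str, dict] = {}
    for resource in resources:
        for det in resource.get("license_detections", []):
            ident = det["identifier"]
            counts[ident] += 1
            first_seen.setdefault(ident, det)  # keep the first occurrence
    # Dicts preserve insertion order, so output follows first appearance.
    return [
        {
            "license_expression": first_seen[ident]["license_expression"],
            "identifier": ident,
            "count": counts[ident],
        }
        for ident in first_seen
    ]
```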
- The data structure of the JSON output has changed for licenses at file level:
  - The licenses attribute is deleted.
  - A new license_detections attribute contains license detections in that file. This object has three attributes: license_expression, identifier and matches. matches is a list of license matches and is roughly the same as licenses in the previous version, with additional structure changes detailed below. identifier is the detected license expression with a UUID generated from the content of matches, such that it is unique for unique detections. There is also another attribute, detection_log, with diagnostic information if the --license-diagnostics option is enabled.
  - A new attribute license_clues contains license matches with the same data structure as the matches attribute in license_detections. This contains license matches that are mere clues and were not considered to be a proper conclusive license detection.
  - The license_expressions list of license expressions is deleted and replaced by a single detected_license_expression expression. Similarly, spdx_license_expressions was removed and replaced by detected_license_expression_spdx.
  - See the license updates documentation (https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource) for examples and details.
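The identifier described above (a license expression plus a UUID derived from the content of the matches) could be built roughly like this. The uuid5-over-sorted-JSON hashing and the python_safe normalization (in the spirit of the python-safe identifiers mentioned in the v32.0.7 notes) are assumptions for illustration; ScanCode's actual scheme may differ, and both function names are hypothetical.

```python
import json
import re
import uuid


def python_safe(expression: str) -> str:
    # Assumed normalization: lowercase, then collapse every run of
    # non-alphanumeric characters into a single underscore.
    return re.sub(r"[^a-z0-9]+", "_", expression.lower()).strip("_")


def detection_identifier(license_expression: str, matches: list) -> str:
    # Assumption: a name-based UUID (uuid5) over a canonical JSON dump of the
    # matches, so identical match content always yields the same identifier.
    content = json.dumps(matches, sort_keys=True)
    content_uuid = uuid.uuid5(uuid.NAMESPACE_URL, content)
    return f"{python_safe(license_expression)}-{content_uuid}"
```

Because the UUID depends only on the serialized matches, two files with identical matches share one identifier, which is what lets detections be deduplicated across the codebase.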
- The data structure of license attributes in package_data and the codebase-level packages has been updated accordingly:
  - There is a new license_detections attribute for the primary, top-level declared licenses of a package and an other_license_detections attribute for the other secondary detections.
  - The license_expression is replaced by the declared_license_expression and other_license_expression attributes, with their SPDX counterparts declared_license_expression_spdx and other_license_expression_spdx. These expressions are parallel to detections.
  - The declared_license attribute is renamed extracted_license_statement and is now a YAML-encoded string, which can be parsed to recreate the original extracted license statement. Previously this used to be nested Python objects (lists/dicts/strings), but now this is always a YAML string. See the license updates documentation (https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package) for examples and details.
- The license matches structure has changed: we used to report one match for each license key of a matched license expression. We now instead report one single match for each matched license expression, and list the license keys as a licenses attribute. This avoids data duplication. Inside each match, we list each match and matched rule attribute directly, avoiding nesting. See the license updates doc (https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data) for examples and details.
- There are new codebase-level attributes with --license-references to report reference license metadata and texts once for each license matched across the scan; we now have two codebase-level attributes, license_references and license_rule_references, that list the unique detected licenses and license rules. This reference data is also removed from license matches at all levels, i.e. from codebase, package and resource level license detections and resource level license clues, irrespective of this CLI option being used, i.e. by default with --licenses. See the license updates documentation (https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references) for examples and details.
- We replaced the scancode --reindex-licenses command line option with a new separate command named scancode-reindex-licenses.
  - The --reindex-licenses-for-all-languages CLI option is also moved to the scancode-reindex-licenses command as an option, --all-languages.
- We can now detect licenses using custom license texts and license rules stored in a directory or packaged as a plugin for consistent reuse and deployment.
  - There is an --additional-directory option with the scancode-reindex-licenses command to add the licenses from a directory.
  - There is also an --only-builtin option to use only builtin licenses, ignoring any additional license plugins.
  - See #480 for more details.
- We combined the license data file and text file of each license into a single file with a .LICENSE extension. The .yml data file is now included at the top of each .LICENSE file as "YAML frontmatter". The same applies to license rules and their .RULE and .yml files. This halves the number of data files from about 60,000 to 30,000. Git line history is preserved for the combined text + yml files.
  - See #3049
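Since license data now lives in a single .LICENSE (or .RULE) file with YAML frontmatter, a reader only has to split the file at the frontmatter delimiters before handing the YAML part to a YAML parser. The sketch below assumes the conventional "---" delimiter lines; split_frontmatter is a hypothetical helper, and parsing the YAML itself (for example with PyYAML) is left out.

```python
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split text that starts with a '---'-delimited YAML block.

    Returns (yaml_text, body_text), or ('', text) if there is no
    well-formed frontmatter. Assumes '---' delimiter lines.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            yaml_text = "".join(lines[1:i])  # between the two delimiters
            body = "".join(lines[i + 1:])    # the license or rule text
            return yaml_text, body
    return "", text  # unterminated frontmatter: treat everything as text
```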
- There is a new console script scancode-license-data to export license data in JSON, YAML and HTML, with indexes and a static website for use in t...