Releases: nexB/scancode-toolkit

v32.1.0

22 Mar 18:47
cafcbcf

New CLI options:

  • A new CLI option --package-only has been added which performs
    a faster package scan by skipping the package assembly step and
    also skipping license/copyright detection on package metadata.
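As a rough illustration of how the new option might be driven from a script, the sketch below builds a scancode --package-only command line and runs it with subprocess. It assumes the scancode executable is installed and on PATH and that pretty-printed JSON (--json-pp) is the wanted output; the helper names build_package_only_command and run_package_only_scan and the example paths are hypothetical.

```python
import subprocess


def build_package_only_command(input_path, output_path):
    """Return an argv list for a package-only scan (hypothetical helper)."""
    return [
        "scancode",        # assumption: the scancode CLI is installed and on PATH
        "--package-only",  # new in v32.1.0: skips assembly and license/copyright detection
        "--json-pp",       # assumption: pretty-printed JSON is the wanted output
        output_path,
        input_path,
    ]


def run_package_only_scan(input_path, output_path):
    """Run the scan; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_package_only_command(input_path, output_path), check=True)


# Example call (not executed here, paths are placeholders):
# run_package_only_scan("samples/", "package-only-scan.json")
```

Keeping the command construction separate from the subprocess call makes the argument list easy to inspect or log before anything is run.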

Major API/other changes:

  • Output Format Version updated to 3.1.0 (minor version bump)
  • Drops Python 3.7 support and adds Python 3.12 support
  • New license match attributes:
    • from_file
    • matched_text_diagnostics is added for --license-text-diagnostics
  • In codebase-level license_detections we have a new attribute
    reference_matches
  • SPDX license expressions everywhere side-by-side with ScanCode
    license expressions.
  • All rule attribute level data is now provided in codebase level todo items.

Changes in Output Data Structure:

  • The data structure of the JSON output has changed for
    licenses at file level, and license detections at top-level.
    All the changes are additions to the JSON output, so this is
    only a minor version bump from 3.0.0 to 3.1.0:

    • There is a new attribute from_file in the matches of
      license_detections in:

      • File level license_detections
      • Codebase level license_detections
      • license_detections and other_license_detections in
        file-level package_data
      • license_detections and other_license_detections in
        codebase level packages
    • On using the CLI option --license-text-diagnostics there is
      now a new license match attribute matched_text_diagnostics
      with the matched text and highlighted diagnostics, instead of
      having this replace the plain matched_text.

    • A new reference_matches attribute is added to codebase-level
      license_detections which is the same as the matches attribute
      in other license detections.

    • We now have SPDX license expressions everywhere we have
      ScanCode license expressions for ease of use and adopting
      SPDX everywhere. A new attribute license_expression_spdx
      is added to:

      • license_detections in file and codebase level
      • in package license_detections and other_license_detections
      • matches for license_detections everywhere
    • Adds all rule attribute level info in codebase level todo
      data, to assist in review. This includes length, text, notes,
      referenced_filenames, and the boolean attributes (like
      is_license_notice, is_license_intro etc., as applicable).

  • New and updated licenses, including support for newly released
    SPDX license list versions:

    • SPDX License List 3.22:
      This release of the SPDX license list had 48 new licenses.
      We already had several of them as licenses/rules, and
      these have been modified to be consistent with the SPDX list.
      The rest have been added as new licenses.
      For more details see #3554

    • SPDX License List 3.23:
      This release of the SPDX license list had 43 new licenses,
      and out of them 22 were present as licenses and 10 were
      present as rules already. There were 4 new license/exception
      texts added, and the rest were either texts with small variations,
      additions to texts or several rule texts together.
      For more details see #3653

    • We also have lots of other misc new licenses and rules added to
      LicenseDB, see PRs below for more details:
      #3663
      #3642
      #3586
      #3584
      #3575
      #3570
      #3568
      #3562

  • Improve debian namespace detection based on clues and fix
    namespace and qualifier bugs for debian purls.
    For more details see nexB/scancode.io#899
    and #3443
    Also improve debian manifests parsing and purl parsing from
    filenames. Support for nexB/purldb#245
    Bumps debian-inspector to v31.1.0

  • Bump commoncode to v31.0.3

  • Upgraded spdx-tools dependency to v0.8.
    See #3455
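As a minimal sketch of how the additive 3.1.0 attributes described above might be consumed, the snippet below walks file-level license_detections and reads from_file and license_expression_spdx from each match. The sample data is a hand-written excerpt shaped after the description in these notes, not real scan output, and the helper name iter_file_matches is hypothetical.

```python
import json

# Hand-written, illustrative excerpt shaped like the 3.1.0 output described above.
SAMPLE_SCAN = json.loads("""
{
  "files": [
    {
      "path": "project/setup.py",
      "license_detections": [
        {
          "license_expression": "apache-2.0",
          "license_expression_spdx": "Apache-2.0",
          "matches": [
            {
              "license_expression": "apache-2.0",
              "license_expression_spdx": "Apache-2.0",
              "from_file": "project/LICENSE"
            }
          ]
        }
      ]
    }
  ]
}
""")


def iter_file_matches(scan):
    """Yield (path, from_file, spdx_expression) for each file-level match."""
    for resource in scan.get("files", []):
        for detection in resource.get("license_detections", []):
            for match in detection.get("matches", []):
                # .get() returns None for scans that lack the newer keys
                yield (
                    resource.get("path"),
                    match.get("from_file"),
                    match.get("license_expression_spdx"),
                )


rows = list(iter_file_matches(SAMPLE_SCAN))
```

Because the new keys are pure additions, reading them with .get() lets the same loop run over older 3.0.0 output, where the values simply come back as None.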

Support for Conan package parser:

Full Changelog: v32.0.8...v32.1.0

v32.0.8

16 Oct 19:38
26ace52

Full Changelog: v32.0.7...v32.0.8

v32.0.7

28 Sep 12:41
c157593

v32.0.6 - 2023-07-13

This is a minor release with a lot of license detection improvements, with new and updated license detection rules and new licenses.

  • 33 new licenses, 30 licenses updated
  • 70 new and updated license rules

The main updates over the previous stable release are:

  • Adds the is_license_clue and is_deprecated attributes to the license Rule class to support license clue detection and to always maintain consistency of unique rule names. Also fixes other license detection bugs related to license clues, fixes a bug in setup.cfg license detection, and makes license detection identifiers python-safe. See #3462
  • Update/Add new licenses and license rules. See #3470 #3513
  • Bump commoncode to v31.0.3, fixing a VirtualCodebase creation issue when there is a directory under the root with the same name as the root directory itself. See nexB/commoncode#57 and #3495

Full Changelog: v32.0.6...v32.0.7

v32.0.6

19 Jul 14:54
2c46c57

This is a minor release with a lot of license and package detection improvements, especially for maven packages. We also support the SPDX license list 3.21 now. The main updates over the previous stable release are:

  • New and updated licenses, including support for newly released SPDX license list version 3.21. For more details see #3437
  • Fixes in summary plugin for licenses, and top-level license detections. #3430
  • Updated maven license and package detections, with fixes for various maven package manifest parsing, improved top-level package assembly, ecosystem specific package license detection, fixes in --todo plugin, updated license detection rules/heuristics and other misc changes. For more details see: #3447
  • Improved Gemfile.lock parsing. For more details see #3444
  • Auto-review plugin to get todo items for scan review, with the new --todo CLI option. For more details see: #3353
  • Misc. license and copyright detection improvements at #3346
  • Other misc. minor bugfixes detailed in all the previous release-candidates.

Full Changelog: v32.0.4...v32.0.6

v32.0.5rc3

24 Jun 14:22
2da060a
Pre-release
Merge pull request #3436 from nexB/release-prep-v32.0.5rc3

Release prep v32.0.5rc3

v32.0.4

07 Jun 20:29
94d4fe6

This is a minor bugfix release with the following updates:

  • Fixes a performance issue arising out of license detection
    on files happening in a single-threaded process_codebase step when the
    license CLI option is disabled for a package scan.
    Reference: #3423

Full Changelog: v32.0.3...v32.0.4

v32.0.3

06 Jun 19:46
3282bc0

This is a minor bugfix release with the following updates:

  • The scancode-toolkit-mini releases were missing from v32.0.0rc2 onwards,
    and the scancode-toolkit release wheels from v32.0.0rc2 onwards were
    actually scancode-toolkit-mini releases.
    Reference: #3421

  • Updated github actions, for more details see nexB/skeleton#75

Full Changelog: v32.0.2...v32.0.3

v32.0.2

29 May 13:58
4ec7a92

This is a minor license update release with:

  • new and updated licenses in LicenseDB
  • license-expression v30.1.1 with support for the new licenses

Full Changelog: v32.0.1...v32.0.2

v32.0.1

23 May 20:59
e309963

This is a minor bugfix release.

There are fixes for two issues in this release:

  • #3407
    typecode had an improper import of ctypes.utils; this is
    fixed in the new typecode release v30.0.1
  • #3408
    the setup.cfg and setup-mini.cfg files were not aligned for plugin
    entrypoints.

Full Changelog: v32.0.0...v32.0.1

v32.0.0

22 May 22:04
f3086c5

v32 of ScanCode is all about improved license detections!

We have more licenses and rules, and major updates on post-processing matches to license detections.
We also have major improvements in package license detections and unknown references, along with top level detection
summaries for licenses, and reference data for the licenses detected too. There are also a couple of API changes due to
model changes in license data.

See also https://github.com/nexB/scancode.io/ for a complete, customizable SCA solution using ScanCode and
https://github.com/nexB/scancode-workbench/releases for visualizing data generated by ScanCode Toolkit.

Important API changes:

This is a major release with major API and output format changes and significant
feature updates.

In particular the output format has changed for the licenses and packages, and
also for some of the command line options.

The output format version is now 3.0.0.

See https://github.com/nexB/scancode-toolkit/milestone/15 for more details on this release.
Visit https://github.com/nexB/scancode-toolkit/discussions/3406 to discuss this release.

Package detection:

  • Update GemfileLockParser to track the gem which the Gemfile.lock is for,
    which we assign to the new GemfileLockParser.primary_gem field. Update
    GemfileLockHandler.parse() to handle the case where there is a primary gem
    detected from a Gemfile.lock. If there is a primary gem, a single Package
    is created and the detected gem data within the Gemfile.lock are assigned as
    dependencies. If there is no primary gem, then all of the dependencies are
    collected into a Package with no name and yielded.

    #3072

  • Fix issue where dependencies were not reported when scanning an extracted
    Python project by modifying BaseExtractedPythonLayout.assemble() to favor
    using package data from a PKG-INFO file from an egg-info directory. Package
    data from a PKG-INFO file from an egg-info directory contains the dependency
    information collected from the requirements.txt file alongside PKG-INFO.

    #3083

  • Fix issue where we were returning an incorrect purl package type for cocoapods.
    pods was being returned as the purl type for cocoapods; it should be
    cocoapods instead.
    https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#cocoapods

    #3081

  • Code for parsing a Maven POM, npm package.json, freebsd manifest and haxelib
    JSON has been separated into two functions: one that creates a PackageData
    object from the parsed Resource, and another that calls the previous function
    and yields the PackageData. This was done so that we can use the package
    manifest data parsing code outside of the scancode-toolkit context in other
    libraries.

  • The PackageData model now includes a holder field, which is populated with
    holder data extracted from the copyright field if copyright data is present,
    otherwise it remains empty.

    #3290

  • DatafileHandlers now have a classmethod named get_top_level_resources(),
    which is supposed to yield the top-level Resources of a Package codebase,
    relative to a Package manifest file. maven.MavenPomXmlHandler is the first
    DatafileHandler that has this method implemented.
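The Gemfile.lock rule from the first item in this list can be sketched as follows. This is only an illustration of the described behavior, not ScanCode's actual code: the Gem and Package classes and the build_package helper are hypothetical stand-ins, and leaving the primary gem out of its own dependency list is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gem:
    """Hypothetical stand-in for a gem parsed from a Gemfile.lock."""
    name: str
    version: Optional[str] = None


@dataclass
class Package:
    """Hypothetical stand-in for ScanCode's package data model."""
    name: Optional[str] = None
    version: Optional[str] = None
    dependencies: List[Gem] = field(default_factory=list)


def build_package(primary_gem: Optional[Gem], gems: List[Gem]) -> Package:
    """Build a single Package from the gems detected in one Gemfile.lock."""
    if primary_gem is not None:
        # The lockfile belongs to a known gem: it names the package and the
        # other detected gems become its dependencies.
        # Assumption: the primary gem is left out of its own dependencies.
        dependencies = [gem for gem in gems if gem is not primary_gem]
        return Package(primary_gem.name, primary_gem.version, dependencies)
    # No primary gem: collect every gem into a Package with no name.
    return Package(dependencies=list(gems))


rails = Gem("rails", "7.0.4")
rake = Gem("rake", "13.0.6")
with_primary = build_package(rails, [rails, rake])
without_primary = build_package(None, [rails, rake])
```

In the first call the resulting package is named after the primary gem; in the second the package has no name and simply carries both gems as dependencies.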

License detection:

  • The SPDX license list has been updated to the latest v3.20

  • This is a major update to license detection where we now combine one or more
    license matches in a larger license detection. This approach improves the
    accuracy of license detection and removes a larger number of false positive
    or ambiguous license detections. For details see
    #2878

  • There is a new license_detections codebase level attribute with all the
    unique license detections in the whole scan, both in resources and packages.
    This has the 3 attributes also present in package/resource level license
    detections: license_expression, identifier and detection_log
    (present optionally if the --license-diagnostics option is enabled) with
    an additional attribute:

    • count: Number of times in the codebase this unique license detection
      was encountered.
  • The data structure of the JSON output has changed for licenses at file level:

    • The licenses attribute is deleted.

    • A new license_detections attribute contains license detections in that file.
      This object has three attributes: license_expression, identifier
      and matches. matches is a list of license matches and is roughly
      the same as licenses in the previous version with additional structure
      changes detailed below. identifier is the detected license expression with a
      UUID generated from the content of matches such that this is unique for
      unique detections. We also have another attribute detection_log with
      diagnostics information if the --license-diagnostics option is enabled.

    • A new attribute license_clues contains license matches with the
      same data structure as the matches attribute in license_detections.
      This contains license matches that are mere clues and were not considered
      to be a proper conclusive license detection.

    • The license_expressions list of license expressions is deleted and
      replaced by a detected_license_expression single expression.
      Similarly spdx_license_expressions was removed and replaced by
      detected_license_expression_spdx.

    • See the license updates documentation at
      https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource
      for examples and details.

  • The data structure of license attributes in package_data and the codebase
    level packages has been updated accordingly:

    • There is a new license_detections attribute for the primary, top-level
      declared licenses of a package and an other_license_detections attribute
      for the other secondary detections.

    • The license_expression is replaced by the declared_license_expression
      and other_license_expression attributes with their SPDX counterparts
      declared_license_expression_spdx and other_license_expression_spdx.
      These expressions are parallel to detections.

    • The declared_license attribute is renamed extracted_license_statement
      and is now a YAML-encoded string, which can be parsed to recreate the
      original extracted license statement. Previously this used to be nested
      python objects lists/dicts/string, but now this is always a YAML string.

      See the license updates documentation at
      https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package
      for examples and details.

  • The license matches structure has changed: we used to report one match for each
    license key of a matched license expression. We now report instead one
    single match for each matched license expression, and list the license keys
    as a licenses attribute. This avoids data duplication.
    Inside each match, we list each match and matched rule attribute directly,
    avoiding nesting. See the license updates documentation at
    https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data
    for examples and details.

  • There are new codebase level attributes with --license-references to report
    reference license metadata and texts once for each license matched across the
    scan: license_references and license_rule_references, which list the unique
    detected licenses and license rules. This reference data is also removed from
    license matches at all levels, i.e. from codebase, package and resource level
    license detections and resource level license clues, irrespective of this CLI
    option being used, i.e. by default with --license.
    See the license updates documentation at
    https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references
    for examples and details.

  • We replaced the scancode --reindex-licenses command line option with a
    new separate command named scancode-reindex-licenses.

    • The --reindex-licenses-for-all-languages CLI option is also moved to
      the scancode-reindex-licenses command as an option --all-languages.

    • We can now detect licenses using custom license texts and license rules
      stored in a directory or packaged as a plugin for consistent reuse and deployment.

    • There is an --additional-directory option with the scancode-reindex-licenses
      command to add the licenses from a directory.

    • There is also an --only-builtin option to use only builtin licenses
      ignoring any additional license plugins.

    • See #480 for more details.

  • We combined the license data file and text file of each license in a single
    file with a .LICENSE extension. The .yml data file is now included at the
    top of each .LICENSE file as "YAML frontmatter". The same applies to license
    rules and their .RULE and .yml files. This halves the number of data files
    from about 60,000 to 30,000. Git line history is preserved for the combined
    text + yml files.

  • There is a new console script scancode-license-data to export
    license data in JSON, YAML and HTML, with indexes and a static website for use
    in t...
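To make the v32.0.0 file-level changes described above more concrete, here is a small sketch that reads the new attributes (license_detections, license_clues, detected_license_expression and its SPDX counterpart) from one file entry. The sample entry is hand-written from the attribute names in these notes and is not real scan output; the helper name summarize_file_licenses and the placeholder identifier value are hypothetical.

```python
# Hand-written sample shaped like one v32 file entry as described above;
# the values are illustrative, not real scan output.
SAMPLE_FILE = {
    "path": "src/util.c",
    "detected_license_expression": "mit",
    "detected_license_expression_spdx": "MIT",
    "license_detections": [
        {
            "license_expression": "mit",
            "identifier": "mit-<uuid>",  # placeholder: real values end with a UUID
            "matches": [{"license_expression": "mit"}],
        }
    ],
    "license_clues": [{"license_expression": "unknown-license-reference"}],
}


def summarize_file_licenses(resource):
    """Return a flat summary of the v32 license attributes of one file entry."""
    detections = resource.get("license_detections", [])
    return {
        "path": resource.get("path"),
        # Single expression that replaced the old license_expressions list.
        "expression": resource.get("detected_license_expression"),
        "expression_spdx": resource.get("detected_license_expression_spdx"),
        "detection_ids": [d.get("identifier") for d in detections],
        "match_count": sum(len(d.get("matches", [])) for d in detections),
        # Clues are matches that were not considered conclusive detections.
        "clue_count": len(resource.get("license_clues", [])),
    }


summary = summarize_file_licenses(SAMPLE_FILE)
```

Code written against the pre-v32 licenses and license_expressions attributes would need a similar adjustment, since those attributes are no longer present in the file-level output.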