Releases: nexB/scancode-toolkit

v31.1.1

02 Sep 13:15

This is a minor release with a bug fix.

  • Do not display tracing/debug outputs at runtime (reported by @soimkim)

Full Changelog: v31.1.0...v31.1.1

v31.1.0

29 Aug 13:20

v31.1.0 - 2022-08-29

This is a minor release with critical bug fixes and minor updates.

  • Fix a critical bug in license detection

Full Changelog: v31.0.2...v31.1.0

v31.0.2

25 Aug 20:40

This is a minor release with minor bug fixes and feature improvements.

Full Changelog: v31.0.1...v31.0.2

v31.0.1

18 Aug 06:36

This is a major release with important bug and security fixes, new and improved
features and API changes.

Note that we no longer support Python 3.6. Use Python 3.7+ instead.

Important API changes:

  • The data structure of the JSON output has changed for copyrights, authors
    and holders. We now use a proper name for attributes and not a generic "value".

  • The data structure of the JSON output has changed for packages. We now
    return "package_data" package information at the manifest file-level
    rather than "packages". This has all the data attributes of a "package_data"
    field plus others: "package_uuid", "package_data_files" and "files".

    • There is a new top-level "packages" attribute that contains package
      instances that can aggregate data from multiple manifests.

    • There is a new top-level "dependencies" attribute that contains each
      dependency instance. These can be standalone or related to a package,
      and they contain a new "extra_data" object.

    • There is a new resource-level attribute "for_packages" which refers to
      packages through package_uuids (pURL + uuid string).

  • The data structure for HTML output has been changed to include emails and
    urls under the "infos" object. The HTML template displays output for holders,
    authors, emails, and urls into separate tables like "licenses" and "copyrights".

  • The data structure for CSV output has been changed to rename the Resource
    column to "path". "copyright_holder" has been renamed to "holder". The CSV
    output is deprecated and will be replaced in the future by an improved tabular
    format.

  • The license clarity scoring plugin has been overhauled to show new license
    clarity criteria. More details of the new scoring criteria are provided below.

  • The functionality of the summary plugin has been improved to provide declared
    origin and license information for the codebase being scanned. The previous
    summary plugin functionality has been preserved in the new tallies plugin.
    More details are provided below.

  • ScanCode has adopted the new code skeleton from
    https://github.com/nexB/skeleton. The key change is the location of the
    virtual environment. It used to be created at the root of the
    scancode-toolkit directory. It is now created under the venv subdirectory.
    You must be aware of this if you use ScanCode from a git clone.

  • DatafileHandler.assemble(), DatafileHandler.assemble_from_many(), and
    the .assemble() methods of the other Package handlers in packagedcode
    have been updated to yield Package items before Dependency or Resource
    items. This is particularly important when calling the assemble() method
    outside of the scancode-toolkit context, where we need to ensure that a
    Package exists before we associate a Resource or Dependency with it.
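
The ordering guarantee in the last item matters mostly to callers outside of
scancode-toolkit. The following is a minimal sketch of a consumer that relies
on it. The Package, Dependency and Resource classes, their fields, and the
fake_assemble() generator are hypothetical stand-ins written for this
illustration; they are not the real packagedcode classes or API.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration only. These are NOT the real
# packagedcode models, and their fields are assumptions.
@dataclass
class Package:
    package_uid: str
    dependencies: list = field(default_factory=list)
    resources: list = field(default_factory=list)


@dataclass
class Dependency:
    for_package_uid: str
    purl: str


@dataclass
class Resource:
    path: str
    for_package_uid: str


def collect(items):
    """Index Packages by uid and attach later Dependency/Resource items."""
    packages = {}
    for item in items:
        if isinstance(item, Package):
            packages[item.package_uid] = item
        elif isinstance(item, Dependency):
            # Raises KeyError if the owning Package was not yielded first.
            packages[item.for_package_uid].dependencies.append(item.purl)
        elif isinstance(item, Resource):
            packages[item.for_package_uid].resources.append(item.path)
    return packages


UID = "pkg:pypi/demo@1.0?uuid=hypothetical"


def fake_assemble():
    # Mimics the documented order: the Package first, then everything else.
    yield Package(package_uid=UID)
    yield Dependency(for_package_uid=UID, purl="pkg:pypi/attrs")
    yield Resource(path="demo/setup.py", for_package_uid=UID)


result = collect(fake_assemble())
```

With Package items yielded first, a single pass is enough; if a Dependency
arrived before its Package, this consumer would fail with a KeyError.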

Copyright detection:

  • The data structure in the JSON is now using consistently named attributes as
    opposed to plain values.
  • Several copyright detection bugs have been fixed.
  • French and German copyright detection is improved.
  • Some spurious trailing dots in holders are now stripped.
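
Code that reads both older and newer scans may need to handle the two shapes.
The sketch below assumes that the new named attributes are "copyright",
"holder" and "author", and that entries also carry "start_line" and
"end_line"; these key names are assumptions that are not spelled out in these
notes, so check them against an actual scan.

```python
def detection_text(entry, kind):
    """Return the detected text from a copyright, holder or author entry.

    `kind` is "copyright", "holder" or "author". These named keys are an
    assumption about the new format; "value" is the older generic key.
    """
    if kind in entry:
        return entry[kind]
    return entry.get("value")


# Illustrative entries only, not taken from a real scan.
old_entry = {"value": "Copyright (c) Example Corp", "start_line": 1, "end_line": 1}
new_entry = {"copyright": "Copyright (c) Example Corp", "start_line": 1, "end_line": 1}

old_text = detection_text(old_entry, "copyright")
new_text = detection_text(new_entry, "copyright")
```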

License detection:

  • There have been significant license detection rules and licenses updates:

    • 107 new licenses have been added (total is now 1954)
    • 6780 new license detection rules have been added (total is now 32259)
    • 6753 existing false positive license rules have been removed (see below).
    • The SPDX license list has been updated to the latest v3.17
  • The rule attribute "only_known_words" has been renamed to "is_continuous" and its
    meaning has been updated and expanded. A rule tagged as "is_continuous" can only
    be matched if there are no gaps between matched words, be they stopwords, extra
    unknown or known words. This improves several false positive license detections.
    The processing for "is_continuous" has been merged into the "key phrases"
    processing below.

  • Key phrases can now be defined in a RULE text by surrounding one or more words
    with double curly braces {{ and }}. When defined, a RULE will only match
    when the key phrases match exactly. When all the text of a rule is a
    "key phrase", this is the same as being "is_continuous".

  • The "--unknown-licenses" option now also detects unknown licenses using a
    simple and effective ngrams-based matching in areas that are not matched or
    weakly matched. This helps detect things that look like a license but are not
    yet known as licenses.

  • False positive detection of "license lists" like the lists seen in license and
    package management tools has been entirely reworked. Rather than using
    thousands of small false positive rules, there is a new filter to detect a
    long run of license references and tags that is typical of license lists.
    As a result, thousands of rules have been replaced by a simpler filter, and
    the license detection is more accurate, faster and has fewer false
    positives.

  • The new license flag "is_generic" tags licenses that are "generic" licenses
    such as "other-permissive" or "other-copyleft". This is not yet
    returned in the JSON API.

  • When scanning binary files, the detection of single word rules is filtered when
    surrounded by gibberish or mixed case. For instance $#%$GpL$ is a false
    positive and is no longer reported.

  • Several rules were incorrectly tagged as is_license_notice but were actually
    references; they have been requalified as is_license_reference. All rules
    made of a single word have been requalified as is_license_reference if they
    were not qualified this way.

  • Matches to small license rules (with small defined as under 15 words)
    that are scattered over too many lines are now filtered as false matches.

  • Small two-word matches that overlap the previous or next match by
    the word "license" and similar words are now filtered as false matches.

  • The new --licenses-reference option adds a new "licenses_reference" top
    level attribute to a scan when using the JSON and YAML outputs. This contains
    all the details and the full text of every license seen in a file or
    package license expression of a scan. This can be added after the fact
    using the --from-json option.

  • New experimental support for non-English licenses. Use the command
    ./scancode --reindex-licenses-for-all-languages to index all known non-English
    licenses and rules. From that point on, they will be detected. Because of this
    some licenses that were not tagged with their languages are now correctly
    tagged and they may not be detected unless you activate this new indexing
    feature.
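
To make the key phrase markup described above concrete, here is a minimal
sketch that extracts the phrases wrapped in {{ and }} from a rule text. It is
an illustration only: the function names are hypothetical, the sample rule
texts are made up, and this is not how ScanCode parses or matches rules.

```python
import re

# Non-greedy match of anything between "{{" and "}}", across line breaks.
KEY_PHRASE = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def key_phrases(rule_text):
    """Return the key phrases of a rule text with whitespace normalized."""
    return [" ".join(phrase.split()) for phrase in KEY_PHRASE.findall(rule_text)]


def strip_markers(rule_text):
    """Return the rule text without the {{ and }} markers."""
    return rule_text.replace("{{", "").replace("}}", "")


def is_fully_key_phrase(rule_text):
    """True when every word of the rule sits inside a single key phrase."""
    phrases = key_phrases(rule_text)
    return (
        len(phrases) == 1
        and phrases[0].split() == strip_markers(rule_text).split()
    )


# Made-up rule texts for illustration.
partial_rule = "This software is {{licensed under the MIT license}}."
full_rule = "{{MIT License}}"

partial_phrases = key_phrases(partial_rule)
full_is_continuous_like = is_fully_key_phrase(full_rule)
```

In this sketch, the second rule is the case the notes describe as equivalent
to "is_continuous": all of its text is one key phrase.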

Package detection:

  • There are major changes in package detection and reporting: a codebase-level
    attribute packages is now reported, with one or more package_data and the
    files for each package. The specific changes made are:

    • The resource level attribute packages has been renamed to package_data,
      as these are really package data that are being detected, such as manifests,
      lockfiles or other package data. This has the data attributes of a package_data
      field plus others: package_uuid, package_data_files and files.

    • A new top-level attribute packages has been added which contains package
      instances created from package_data detected in the codebase.

    • A new codebase level attribute dependencies has been added which contains dependency
      instances created from lockfiles detected in the codebase.

    • The package attribute root_path has been deleted from package_data in favour
      of the new format where there is no root conceptually, just a list of files for each
      package.

    • There is a new resource-level attribute for_packages which refers to
      packages through package_uids (pURL + uuid string). A package_adder
      function is now used to associate a Package to a Resource that is part of
      it. This gives us the flexibility to use the packagedcode Package handlers
      in other contexts where for_packages on Resource is not implemented in the
      same way as scancode-toolkit.

    • The package_data attribute dependencies (which is a list of DependentPackages),
      now has a new attribute resolved_package with a package data mapping.
      Also the requirement attribute is renamed to extracted_requirement.
      There is a new extra_data to collect extra data as needed.

  • For PyPI packages, python_requires is treated as a package dependency.
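
As a rough sketch of how the for_packages references described above could be
used, the following groups resource paths by the package they belong to. It
assumes a JSON scan loaded as a dict with a top-level "files" list whose
entries have "path" and "for_packages" keys, and that package identifiers are
stored under "package_uid"; the sample data and identifier values are made up.

```python
from collections import defaultdict


def files_by_package(scan):
    """Map each package identifier to the paths of the resources it owns."""
    grouped = defaultdict(list)
    for resource in scan.get("files", []):
        for package_uid in resource.get("for_packages", []):
            grouped[package_uid].append(resource["path"])
    return dict(grouped)


# Made-up scan data; key names and identifier format are assumptions.
DEMO_UID = "pkg:npm/demo@1.0.0?uuid=hypothetical"
scan = {
    "packages": [{"package_uid": DEMO_UID}],
    "files": [
        {"path": "demo/package.json", "for_packages": [DEMO_UID]},
        {"path": "demo/index.js", "for_packages": [DEMO_UID]},
        {"path": "README", "for_packages": []},
    ],
}

grouped = files_by_package(scan)
```

Resources that belong to no package, like README here, simply do not appear
in the result, which matches the idea of a flat list of files per package
with no root path.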

License Clarity Scoring Update:

  • We are moving away from the original license clarity scoring designed for
    ClearlyDefined in the license clarity score plugin. The previous license
    clarity scoring logic produced a score that was misleading when it would
    return a low score due to the stringent scoring criteria. We are now using
    more general criteria to get a sense of what provenance information has been
    provided and whether or not there is a conflict in licensing between what
    licenses were declared at the top-level key files and what licenses have been
    detected in the files under the top-level.

  • The license clarity score is a value from 0-100 calculated by combining the
    weighted values determined for each of the scoring elements:

    • Declared license:

      • When true, indicates that the software package licensing is documented at
        top-level or well-known locations in the software project, typically in a
        package manifest, NOTICE, LICENSE, COPYING or README file.
      • Scoring Weight = 40
    • Identification precision:

      • Indicates how well the license statement(s) of the software identify known
        licenses that can be designated by precise keys (identifiers) as provided in
        a...

v31.0.0rc5

02 Aug 17:21

This is one of the last release candidates for the upcoming 31 release.

v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.

Several bugs have been fixed since 31.0.0rc3, in particular in the ability to properly report licenses in system package scans.

See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc5/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!

Full Changelog: v31.0.0rc3...v31.0.0rc5

v31.0.0rc3

28 Jul 15:31
04d67da

This is the penultimate release candidate for the upcoming 31 release.

v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.

Several bugs have been fixed when compared with 31.0.0rc2.

See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc3/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!

Full Changelog: v31.0.0rc2...v31.0.0rc3

v31.0.0rc2

16 Jun 19:40
7394e79

This is a release candidate for the upcoming 31 release.

v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.

Several bugs have been fixed when compared with 31.0.0rc1.

See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc2/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!

Full Changelog: v31.0.0rc1...v31.0.0rc2

v31.0.0rc1

13 Jun 22:56

This is a release candidate for the upcoming 31 release.

v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.

Several bugs have been fixed when compared with 31.0.0b5.

See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc1/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!

Full Changelog: v31.0.0b5...v31.0.0rc1

v31.0.0b5

17 May 23:45

This is a beta release for the upcoming 31 release.

v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.

Several bugs have been fixed when compared with b4.

See https://github.com/nexB/scancode-toolkit/blob/v31.0.0b5/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!

Full Changelog: v31.0.0b4...v31.0.0b5

v31.0.0b4

10 May 18:46
ca7fb71

This is a beta release for the upcoming 31 release.

v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.

Several bugs have been fixed when compared with b3.

See https://github.com/nexB/scancode-toolkit/blob/v31.0.0b4/CHANGELOG.rst for an overview of the changes.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!

Full Changelog: v31.0.0b3...v31.0.0b4