feat: Add complete License-Text to cyclonedx bom #570

Open
andife opened this issue Aug 28, 2023 · 9 comments · May be fixed by #674
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

andife (Contributor) commented Aug 28, 2023

We have a requirement to include the complete license information/text provided in the CycloneDX JSON.

My idea would be to take this directly from the wheel, which contains this information in its *.dist-info directory as a LICENSE file.

This information can also be accessed via "pip-licenses --with-license-file --format=json".

It would be nice if the designated field in the CycloneDX format (https://cyclonedx.org/docs/1.4/json/#components_items_licenses_items_license_text_content) could be filled with the content of the license file.
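A minimal sketch of reading the license files straight from a wheel's *.dist-info directory with the standard library's importlib.metadata. It assumes the package declares its license files as License-File entries in its core metadata (build backends such as setuptools and hatchling typically write these); depending on the backend the files sit either directly in *.dist-info/ or under *.dist-info/licenses/, so both locations are tried. The helper name find_license_texts is hypothetical, and undeclared license files are not picked up.

```python
from importlib.metadata import Distribution, PathDistribution
from pathlib import Path
import tempfile


def find_license_texts(dist: Distribution) -> dict[str, str]:
    """Map each declared license file (path relative to *.dist-info) to its text.

    Hypothetical helper: only files listed as License-File in the core
    metadata are considered; undeclared LICENSE files are not scanned.
    """
    texts: dict[str, str] = {}
    for name in dist.metadata.get_all("License-File") or []:
        # Some backends (e.g. hatchling) store license files under licenses/,
        # others put them directly into *.dist-info/; try both locations.
        for candidate in (f"licenses/{name}", name):
            content = dist.read_text(candidate)  # None if the file is missing
            if content is not None:
                texts[candidate] = content
                break
    return texts


if __name__ == "__main__":
    # Build a throwaway *.dist-info directory to demonstrate the lookup.
    with tempfile.TemporaryDirectory() as tmp:
        info = Path(tmp, "demo-1.0.dist-info")
        (info / "licenses").mkdir(parents=True)
        (info / "METADATA").write_text(
            "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\nLicense-File: LICENSE\n\n",
            encoding="utf-8",
        )
        (info / "licenses" / "LICENSE").write_text("MIT License\n", encoding="utf-8")
        print(find_license_texts(PathDistribution(info)))
```

For an installed package, importlib.metadata.distribution("mkdocs") can be passed instead of the temporary PathDistribution.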

jkowalleck added the enhancement (New feature or request) label Oct 24, 2023
nejch commented Nov 27, 2023

I've implemented something like this, although I store the license texts in ComponentEvidence, since they are the result of analysis rather than a guaranteed license. Some packages contain multiple license files, for example. I'll try to upstream this at some point if it's accepted.

I used pip-licenses as well in the past but moved to this approach for exactly the same reason as stated above :)
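A sketch of the per-file layout described above: one evidence.licenses[] entry per detected license file, using the file path as the license name and storing the text unencoded as text/plain. The function name evidence_from_license_files is hypothetical, and the input is assumed to be a mapping of file path to text (for example, the output of a dist-info scan).

```python
import json


def evidence_from_license_files(texts: dict[str, str]) -> dict:
    """Build a CycloneDX component.evidence object, one licenses[] entry per file.

    Each entry wraps a named license whose name is the file path; the text
    is stored as-is with contentType text/plain (no encoding).
    """
    return {
        "licenses": [
            {
                "license": {
                    "name": path,
                    "text": {"contentType": "text/plain", "content": content},
                }
            }
            for path, content in sorted(texts.items())  # stable output order
        ]
    }


if __name__ == "__main__":
    files = {
        "demo-1.0.dist-info/licenses/LICENSE": "MIT License\n",
        "demo-1.0.dist-info/licenses/NOTICE": "Third-party notices\n",
    }
    print(json.dumps({"evidence": evidence_from_license_files(files)}, indent=2))
```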

jkowalleck (Member)

@nejch Better wait with an implementation until the following have been properly merged to master:

jkowalleck changed the title from "Feature Request: Add complete License-Text to cyclonedx bom" to "feat: Add complete License-Text to cyclonedx bom" Jan 6, 2024
jkowalleck (Member)

See #567.

jkowalleck added the help wanted (Extra attention is needed) label Jan 6, 2024
jkowalleck (Member) commented Feb 2, 2024

Since v4 has been released, feel free to contribute this feature.

As explained, the target of the "detected" license texts shall be component.evidence.licenses[] (https://cyclonedx.org/docs/1.5/json/#metadata_component_evidence_licenses).

Example outcome:

{
  // ...
  "evidence": {
    "licenses": [
      {
        "license": {
          "name": "detected license text from file XZY",
          "text": {
            "contentType": "text/markdown",
            "encoding": "base64",
            "content": "IyBNSVQgTm8gQXR0cmlidXRpb24KCkNvcHlyaWdodCAyMDI0IEphbmUgRG9lCgpQZXJtaXNzaW9uIGlzIGhlcmVieSBncmFudGVkLCBbLi4uXQ=="
          }
        }
      }
    ]
  }
}
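A minimal sketch of producing and reading such a base64-encoded text object with Python's base64 module, assuming the license files are UTF-8. The helper names make_text_attachment and read_text_attachment are hypothetical.

```python
import base64


def make_text_attachment(text: str, content_type: str = "text/plain") -> dict:
    """Build the license "text" object, base64-encoding the UTF-8 bytes."""
    return {
        "contentType": content_type,
        "encoding": "base64",
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
    }


def read_text_attachment(attachment: dict) -> str:
    """Recover the license text; unencoded content is returned unchanged."""
    content = attachment["content"]
    if attachment.get("encoding") == "base64":
        return base64.b64decode(content).decode("utf-8")
    return content


if __name__ == "__main__":
    example = {
        "contentType": "text/markdown",
        "encoding": "base64",
        "content": "IyBNSVQgTm8gQXR0cmlidXRpb24KCkNvcHlyaWdodCAyMDI0IEphbmUgRG9lCgpQZXJtaXNzaW9uIGlzIGhlcmVieSBncmFudGVkLCBbLi4uXQ==",
    }
    print(read_text_attachment(example))
```

Decoding the content from the example outcome yields "# MIT No Attribution", a blank line, "Copyright 2024 Jane Doe", a blank line, and "Permission is hereby granted, [...]".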

jkowalleck added the good first issue (Good for newcomers) label Feb 2, 2024
nejch commented Feb 2, 2024

Thanks a lot, that's almost exactly what I have now, although so far I haven't encoded it:

            "evidence": {
                "licenses": [
                    {
                        "license": {
                            "name": "mkdocs-1.5.3.dist-info/licenses/LICENSE",
                            "text": {
                                "contentType": "text/plain",
                                "content": "Copyright \u00a9 2014-present, Tom Christie. All rights reserved.\n\nRedistribution and use in source and binary forms, with or\nwithout modification, are permitted provided that the following\nconditions are met:\n\nRedistributions of source code must retain the above copyright\nnotice, this list of conditions and the following disclaimer.\nRedistributions in binary form must reproduce the above copyright\nnotice, this list of conditions and the following disclaimer in\nthe documentation and/or other materials provided with the\ndistribution.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND\nCONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES,\nINCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF\nMERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\nDISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR\nCONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\nSPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\nLIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF\nUSE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED\nAND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\nLIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN\nANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE\nPOSSIBILITY OF SUCH DAMAGE.\n"
                            }
                        }
                    }
                ]
            }

@jkowalleck Just wondering: does the current implementation have a mechanism to try to guess the license (SPDX ID or generic name) from a text? If so, I'd potentially reuse it; it might be useful to put into the name, even if it's not really guaranteed.

nejch commented Feb 2, 2024

Also, note to self: I just noticed this package no longer has a public API, so I guess this functionality should actually go into https://github.com/CycloneDX/cyclonedx-python-lib if we as users want to use it programmatically.

Edit: maybe not, as that only has the models etc. 😅 Hmm.

jkowalleck (Member) commented Feb 2, 2024

re #570 (comment)
@nejch

[...] although so far I didn't encode it

Encoding the license text is not optional; it is mandatory, AFAIK.
You should get your implementation fixed.

PS/Edit: I need to dig into the SBOM guide and check whether it is actually required. At least that is what I thought, and I still encourage encoding the content to prevent issues when embedding the text in the transport medium (XML/JSON/Protobuf).
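One concrete example of such a transport issue, as a sketch: JSON strings can escape any character, but XML 1.0 forbids most C0 control characters (for example form feed, U+000C) outright, even as character references, so a license text containing one cannot be embedded verbatim in an XML BOM. The check below tests a text against the XML 1.0 "Char" production; the name needs_base64 is hypothetical, and it only shows when verbatim embedding would break, whereas the comment above recommends encoding unconditionally.

```python
import re

# Complement of the XML 1.0 "Char" production. Characters matched here (most
# C0 controls such as form feed U+000C, lone surrogates, U+FFFE/U+FFFF) cannot
# appear in an XML 1.0 document at all, not even as character references.
_XML10_INVALID_CHAR = re.compile(
    r"[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def needs_base64(text: str) -> bool:
    """Return True if text cannot be embedded verbatim in an XML 1.0 BOM."""
    return _XML10_INVALID_CHAR.search(text) is not None


if __name__ == "__main__":
    print(needs_base64("MIT License\n\nCopyright (c) 2024"))
    print(needs_base64("Section 1\x0cSection 2"))  # contains a form feed
```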

[...] a mechanism to try and guess the license (spdx or generic name) from a text?

Nope, not exactly. Guessing license identifiers from text snippets is not planned for any Python implementation (it would bloat the library or depend on external services). However, there are tools that can already do this, e.g. https://github.com/CycloneDX/license-scanner.
What does exist is detection of license names: see cyclonedx.factory.license.LicenseFactory.make_from_string() in the library (https://cyclonedx-python-library.readthedocs.io/en/latest/autoapi/cyclonedx/factory/license/index.html#cyclonedx.factory.license.LicenseFactory.make_from_string).

nejch commented Feb 2, 2024

@jkowalleck Sure, this was mostly for internal use to display the SBOM angular-style; I'll make sure it's encoded before going upstream!

Thanks for the hint there. Since some license texts start with the license name as their first line, I'll see if that could be of some use, though it won't be 100% reliable.
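A rough sketch of that first-line idea, assuming the title (if any) is the first non-blank line, possibly written as a Markdown heading. The function name guess_name_from_first_line is hypothetical; a text that opens with a copyright notice (like the mkdocs LICENSE above) deliberately yields no guess. A resulting candidate could then be handed to LicenseFactory.make_from_string() for name detection, but that step is not shown here.

```python
from typing import Optional


def guess_name_from_first_line(text: str) -> Optional[str]:
    """Return the first non-blank line as a license-name candidate, or None.

    Heuristic only: many license files open with their title (e.g.
    "MIT License"), but others start directly with a copyright notice.
    """
    for line in text.splitlines():
        candidate = line.strip().lstrip("#").strip()  # drop Markdown heading marks
        if not candidate:
            continue  # skip blank lines and bare "#" lines
        if candidate.lower().startswith(("copyright", "(c)")):
            return None  # text opens with a notice, not a license title
        return candidate
    return None


if __name__ == "__main__":
    print(guess_name_from_first_line("# MIT No Attribution\n\nCopyright 2024 Jane Doe"))
```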

jkowalleck (Member) commented Mar 13, 2024

Acceptance criteria
