Error thrown when Invalid license key character provided #76

Open
rnjudge opened this issue Dec 15, 2022 · 3 comments

rnjudge commented Dec 15, 2022

Tern uses license-expression to validate SPDX licenses. When an invalid license key is provided (i.e. one containing invalid characters such as / or ,), license-expression raises an unhandled exception instead of reporting the key as invalid.

>>> import license_expression
>>> from license_expression import get_spdx_licensing
>>> licensing = get_spdx_licensing()
>>> license_data = "MIT/X11"
>>> licensing.validate(license_data).errors == []

Traceback (most recent call last):
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 777, in validate
    parsed_expression = self.parse(expression, strict=strict)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 539, in parse
    tokens = list(self.tokenize(
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 603, in tokenize
    for token in tokens:
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 996, in replace_with_subexpression_by_license_symbol
    for token_group in token_groups:
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 935, in build_token_groups_for_with_subexpression
    tokens = list(tokens)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 597, in <genexpr>
    tokens = (t for t in tokens if t.string and t.string.strip())
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 921, in build_symbols_from_unknown_tokens
    for symtok in build_token_with_symbol():
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 901, in build_token_with_symbol
    toksym = LicenseSymbol(string)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 1213, in __init__
    raise ExpressionError(
license_expression.ExpressionError: Invalid license key: the valid characters are: letters and numbers, underscore, dot, colon or hyphen signs and spaces: 'MIT/X11'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 780, in validate
    expression_info.invalid_symbols.append(e.token_string)
AttributeError: 'ExpressionError' object has no attribute 'token_string'
>>> license_data = "MIT,X11"
>>> licensing.validate(license_data).errors == []
Traceback (most recent call last):
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 777, in validate
    parsed_expression = self.parse(expression, strict=strict)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 539, in parse
    tokens = list(self.tokenize(
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 603, in tokenize
    for token in tokens:
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 996, in replace_with_subexpression_by_license_symbol
    for token_group in token_groups:
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 935, in build_token_groups_for_with_subexpression
    tokens = list(tokens)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 597, in <genexpr>
    tokens = (t for t in tokens if t.string and t.string.strip())
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 921, in build_symbols_from_unknown_tokens
    for symtok in build_token_with_symbol():
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 901, in build_token_with_symbol
    toksym = LicenseSymbol(string)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 1213, in __init__
    raise ExpressionError(
license_expression.ExpressionError: Invalid license key: the valid characters are: letters and numbers, underscore, dot, colon or hyphen signs and spaces: 'MIT,X11'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 780, in validate
    expression_info.invalid_symbols.append(e.token_string)
AttributeError: 'ExpressionError' object has no attribute 'token_string'

When the license key contains only permitted characters (i.e. no unexpected characters), the library returns as expected, without raising, even though the key here is not a known SPDX identifier:

>>> license_data = "MIT-X11"
>>> licensing.validate(license_data).errors == []
False

I would expect the library to handle unexpected characters and mark expressions with unexpected characters as an invalid license.
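
Until the library handles this case itself, a caller can guard the call. The following is a minimal sketch, assuming that any exception raised during validation should be treated as "invalid expression". `is_valid_expression` is a hypothetical helper, not part of license-expression or Tern, and it takes the validate callable as a parameter so that it does not depend on a particular library version.

```python
from typing import Any, Callable


def is_valid_expression(expression: str, validate: Callable[[str], Any]) -> bool:
    """Return True only if validate() completes and reports no errors.

    Hypothetical helper: `validate` is expected to behave like
    `get_spdx_licensing().validate`, i.e. return an object with an
    `errors` list.
    """
    try:
        info = validate(expression)
    except Exception:
        # Assumption: a crash inside the validator (such as the
        # AttributeError shown above) means the expression is invalid.
        return False
    # An empty errors list means the expression validated cleanly.
    return not info.errors
```

With the library installed, this could be called as `is_valid_expression("MIT/X11", get_spdx_licensing().validate)`. Catching `Exception` is deliberately broad for a workaround; it could be narrowed once the library reports invalid characters through `errors`.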

rnjudge added a commit to rnjudge/tern that referenced this issue Dec 15, 2022
When a license is reported with invalid license keys (i.e. anything
besides letters and numbers, underscore, dot, colon or hyphen
signs and spaces) the `is_spdx_license_expression()` function fails
because the license-expression library does not properly handle the
unknown characters. This commit is a workaround until the issue
opened in the license-expression library[1] is resolved.

Resolves tern-tools#1199

[1] aboutcode-org/license-expression#76

Signed-off-by: Rose Judge <[email protected]>
rnjudge added a commit to tern-tools/tern that referenced this issue Dec 15, 2022
When a license is reported with invalid license keys (i.e. anything
besides letters and numbers, underscore, dot, colon or hyphen
signs and spaces) the `is_spdx_license_expression()` function fails
because the license-expression library does not properly handle the
unknown characters. This commit is a workaround until the issue
opened in the license-expression library[1] is resolved.

Resolves #1199

[1] aboutcode-org/license-expression#76

Signed-off-by: Rose Judge <[email protected]>

rnjudge commented Jan 5, 2023

@pombredanne any thoughts on this?

@pombredanne
Member

@rnjudge

"MIT/X11" is not a valid license key: not an SPDX one and it further contains characters typically not supported in the SPDX spec.

There are multiple tokenizers to handle an expression: a simple one and one based on an automaton. The latter accepts arbitrary strings. A simple way to handle this is to create multiple aliases for a given license symbol:

>>> from license_expression import LicenseSymbol, Licensing
>>> symbol = LicenseSymbol(key="MIT", aliases=["MIT/X11", "MIT,X11"])
>>> l = Licensing(symbols=[symbol])
>>> l.parse("MIT/X11", simple=False)
LicenseSymbol('MIT', aliases=('MIT/X11', 'MIT,X11'), is_exception=False)

Here simple=False forces the use of the advanced automaton-based tokenizer, which can recognize almost any alias string, even one that contains spaces or is not syntactically correct.

You would need to know all the supported aliases ahead of time and build your own licensing for this.
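
A library-independent variant of the same idea is to normalize known aliases to their canonical keys before handing the string to any validator. This is a different technique from registering aliases on a LicenseSymbol, shown only as a rough sketch. `KEY_ALIASES` and `normalize_aliases` are hypothetical names, and the table entries are just the two examples from this thread.

```python
from typing import Mapping

# Hypothetical alias table: alias string -> canonical license key.
KEY_ALIASES = {
    "MIT/X11": "MIT",
    "MIT,X11": "MIT",
}


def normalize_aliases(expression: str, aliases: Mapping[str, str] = KEY_ALIASES) -> str:
    """Replace each known alias found in expression with its canonical key."""
    # Longest aliases first, so a short alias never rewrites part of a longer one.
    for alias in sorted(aliases, key=len, reverse=True):
        expression = expression.replace(alias, aliases[alias])
    return expression
```

Plain substring replacement is case-sensitive and knows nothing about expression syntax, so it only suits a small, curated alias list.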

Alternatively, if you have a list of these, we could also add these aliases as a standard "key_aliases" in each license record in https://github.com/nexB/scancode-toolkit/blob/cc14890e1bb6264b01ddb96975cac54466bd6a64/src/licensedcode/models.py#L272 and then update the code here to also treat "key_aliases" as LicenseSymbol aliases in https://github.com/nexB/license-expression/blob/15481270d1080d18e94ad5c5e9618f07e07eb933/src/license_expression/__init__.py#L868

Note also that using scancode-toolkit will always be better for this:

>>> from licensedcode.cache import get_index
>>> idx = get_index()
>>> idx.match(query_string="MIT/X11", as_expression=True)
[LicenseMatch: 'mit', lines=(1, 1), matcher='1-hash', rid=mit_366.RULE, sc=99.0, cov=100.0, len=2, hilen=1, rlen=2, qreg=(0, 1), ireg=(0, 1)]

But in practice, each package type/ecosystem has its own specialized way to provide license information, so this approach will not always work. The packagedcode module already handles this for each package manifest and format: https://github.com/nexB/scancode-toolkit/search?q=populate_license_fields&type=code

Even the standard code that works mostly across package types does much more than just using the license_expression library: https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/licensing.py

@pombredanne
Member

See also #70 by @ivanayov

AyanSinhaMahapatra added a commit that referenced this issue Nov 22, 2023
Added PDF and ePub download option for RTD documentation as requested in aboutcode-org/aboutcode#127