
Allow scan storage providers to express that they only work for packages (not projects) #6460

Open
schvvarzekatze opened this issue Feb 8, 2023 · 11 comments
Labels
enhancement (Issues that are considered to be enhancements), scanner (About the scanner tool)

Comments

@schvvarzekatze

I just retried to use ClearlyDefined to curate missing copyrights. This worked perfectly for all gradle packages, but still not for yarn.

I used the config as described in the readme:

ort:
  scanner:
    storages:
      clearlyDefined:
        serverUrl: "https://api.clearlydefined.io"

    storageReaders: ["clearlyDefined"]

I only found this warning in the logs:

10:54:09.568 [main] WARN  org.ossreviewtoolkit.scanner.storages.ClearlyDefinedStorage - Could not obtain ClearlyDefined coordinates for package 'Yarn::package.json:'.
10:54:09.572 [main] INFO  org.ossreviewtoolkit.scanner.ScanResultsStorage - Read 0 scan result(s) for 'Yarn::package.json:' from ClearlyDefinedStorage in 12.518556ms.

It seems that ClearlyDefined is not applied to the NPM libraries behind the package.json of the project.

Can this be curated with an ORT configuration other than the one mentioned above?

Thank you very much.

@sschuberth
Member

> This worked perfectly for all gradle packages, but still not for yarn.

Note that there is no such thing as "Yarn packages"; Yarn is an alternative package manager for NPM packages.

> Could not obtain ClearlyDefined coordinates for package 'Yarn::package.json:'.

That's more or less expected. What happens here is that ORT looks for stored scan results at ClearlyDefined for the Yarn project you're analyzing / scanning. But as ClearlyDefined only has results for packages that have already been published, it cannot have results for the source code of your unpublished project.

For Yarn projects, the ORT id's type is "Yarn". But as soon as you would publish the package for the project to an NPM registry, ORT would consider that package id's type to be "NPM".
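To illustrate why the lookup fails for the project but works for published packages, here is a minimal Kotlin sketch. It is not ORT's actual code: `Identifier`, `clearlyDefinedTypes`, and `toCoordinates` are hypothetical names, and the mapping table is an assumption that covers only two package types. The coordinate layout (type/provider/namespace/name/revision, with "-" for an empty namespace) follows ClearlyDefined's scheme.

```kotlin
// Hypothetical, simplified stand-in for an ORT identifier ("type:namespace:name:version").
data class Identifier(val type: String, val namespace: String, val name: String, val version: String)

// Assumed mapping from package types to ClearlyDefined "type/provider" pairs; illustrative only.
val clearlyDefinedTypes = mapOf(
    "NPM" to "npm/npmjs",
    "Maven" to "maven/mavencentral"
)

// Return ClearlyDefined-style coordinates, or null if the type has no mapping,
// which is the case for project types such as "Yarn".
fun toCoordinates(id: Identifier): String? {
    val typeAndProvider = clearlyDefinedTypes[id.type] ?: return null
    val namespace = id.namespace.ifEmpty { "-" }
    return "$typeAndProvider/$namespace/${id.name}/${id.version}"
}
```

With this sketch, `toCoordinates(Identifier("Yarn", "", "package.json", ""))` returns `null`, which corresponds to the warning in the log, while an identifier of type "NPM" maps to coordinates that can be looked up.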

@sschuberth
Member

> What happens here is that ORT looks for stored scan results at ClearlyDefined for the Yarn project you're analyzing / scanning.

@mnonnenmacher do you have a good idea how we could limit certain scan storage implementations to only package entities?

@sschuberth added the "scanner" label on Feb 8, 2023
@schvvarzekatze
Author

> For Yarn projects, the ORT id's type is "Yarn". But as soon as you would publish the package for the project to an NPM registry, ORT would consider that package id's type to be "NPM".

Thank you very much for your explanation. So for the cases with missing copyrights, the only thing that would help is to add them via curations, as this feature branch suggests.

@mnonnenmacher
Member

> do you have a good idea how we could limit certain scan storage implementations to only package entities?

@sschuberth I think best would be if the storage implementations had a property for that, then the scanner could take this into account when fetching scan results.

@schvvarzekatze Please also note that you have configured ClearlyDefined as a scan results storage above. To get curations from ClearlyDefined, you need to configure it as a package curation provider, see:

# Providers are listed from highest to lower priority. Technically, they are applied in reverse order: The provider
# with the highest priority is applied last, so it can overwrite any previously applied curations.
packageCurationProviders:
  - type: DefaultFile
  - type: DefaultDir
  - type: File
    id: SomeCurationsFile
    config:
      path: '/some-path/curations.yml'
      mustExist: true
  - type: File
    id: SomeCurationsDir
    config:
      path: '/some-path/curations-dir'
      mustExist: false
  - type: OrtConfig
    enabled: '${USE_ORT_CONFIG_CURATIONS:-true}'
  - type: ClearlyDefined
    config:
      serverUrl: 'https://api.clearlydefined.io'
      minTotalLicenseScore: 80
  - type: SW360
    config:
      restUrl: 'https://your-sw360-rest-url'
      authUrl: 'https://your-authentication-url'
      username: username
      password: password
      clientId: clientId
      clientPassword: clientPassword
      token: token

@sschuberth
Member

> I think best would be if the storage implementations had a property for that, then the scanner could take this into account when fetching scan results.

You mean like simply also passing the ScanContext to readStoredResults or so?
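One way to read this suggestion is sketched below in Kotlin. The names `ScanContext` and `readStoredResults` are taken from the discussion, but their shapes here are assumptions and do not reflect ORT's real signatures; `EntityKind`, `ScanResult`, `ScanStorage`, and `PackageOnlyStorage` are hypothetical types made up for illustration.

```kotlin
// Hypothetical types; names follow the discussion, shapes are assumptions.
enum class EntityKind { PROJECT, PACKAGE }

data class ScanContext(val kind: EntityKind)

data class ScanResult(val id: String, val summary: String)

interface ScanStorage {
    // The context tells the storage whether a project or a package is being looked up.
    fun readStoredResults(id: String, context: ScanContext): List<ScanResult>
}

// A storage that only knows published packages and therefore short-circuits for projects.
class PackageOnlyStorage(private val results: Map<String, List<ScanResult>>) : ScanStorage {
    // Counts how often the backing data was actually consulted.
    var lookups = 0
        private set

    override fun readStoredResults(id: String, context: ScanContext): List<ScanResult> {
        if (context.kind == EntityKind.PROJECT) return emptyList()
        lookups++
        return results[id].orEmpty()
    }
}
```

In this variant each storage decides for itself what to do with the context, so a package-only storage can return early for a project without attempting a lookup.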

@sschuberth changed the title from "Could not obtain ClearlyDefined coordinates for package 'Yarn::package.json:'. - ClearlyDefined not applicable for yarn?" to "Allow scan storage providers to express that they only work for packages (not projects)" on Apr 27, 2023
@sschuberth added the "enhancement" label on Apr 27, 2023
@sschuberth
Member

> I think best would be if the storage implementations had a property for that, then the scanner could take this into account when fetching scan results.

> You mean like simply also passing the ScanContext to readStoredResults or so?

Ping @mnonnenmacher.

@sschuberth
Member

This should probably be implemented as part of #6603.

@mnonnenmacher
Member

> I think best would be if the storage implementations had a property for that, then the scanner could take this into account when fetching scan results.

> You mean like simply also passing the ScanContext to readStoredResults or so?

Yes, for example.

> This should probably be implemented as part of #6603.

To me, these are two independent tasks. I also wonder if we still need this ticket as the description is mixing curation providers with scan storages.

@sschuberth
Member

> I also wonder if we still need this ticket as the description is mixing curation providers with scan storages.

Is it? I don't see how. Anyway, I'd like to keep this open as a reminder that we should have something like "capabilities" for a scan storage provider to express whether it makes sense to query it for project scan results in the first place.
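The "capabilities" idea could look roughly like the following Kotlin sketch, in which the caller filters storages before querying them. All names (`Entity`, `StorageReader`, `supportedEntities`, `InMemoryReader`, `readFromCapableStorages`) are hypothetical and do not exist in ORT as far as this thread shows.

```kotlin
enum class Entity { PROJECT, PACKAGE }

interface StorageReader {
    val name: String

    // Declared capabilities: the kinds of entities this storage can hold results for.
    val supportedEntities: Set<Entity>

    fun read(id: String): List<String>
}

// Simple in-memory reader used only to demonstrate the filtering.
class InMemoryReader(
    override val name: String,
    override val supportedEntities: Set<Entity>,
    private val data: Map<String, List<String>>
) : StorageReader {
    override fun read(id: String): List<String> = data[id].orEmpty()
}

// Query only readers whose capabilities include the entity kind.
// Returns the names of the readers that were asked, paired with the combined results.
fun readFromCapableStorages(
    readers: List<StorageReader>,
    id: String,
    entity: Entity
): Pair<List<String>, List<String>> {
    val capable = readers.filter { entity in it.supportedEntities }
    return capable.map { it.name } to capable.flatMap { it.read(id) }
}
```

Compared to passing a context into each read call, this keeps the decision in one place: a storage that declares only `Entity.PACKAGE` is never contacted for a project, so no warning about missing coordinates would be logged for it.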

@mnonnenmacher
Member

> I also wonder if we still need this ticket as the description is mixing curation providers with scan storages.

> Is it? I don't see how.

The description starts with "I just retried to use ClearlyDefined to curate missing copyrights".

@sschuberth
Member

> The description starts with "I just retried to use ClearlyDefined to curate missing copyrights".

Ah, right. The quoted log, though, comes from the scan storages. I believe the OP simply did not distinguish between the ClearlyDefined curation provider and the ClearlyDefined scan storage, assuming that enabling one would also enable the other.

Anyway, probably no public scan storage would ever contain results for projects, but only for packages, so querying these for projects could be avoided.


3 participants