-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(scanner): detection of files without curation #9487
base: main
Are you sure you want to change the base?
feat(scanner): detection of files without curation #9487
Conversation
I believe we should first create a proposal how this could be done, before working on an actual implementation and decide for a specific approach. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9487 +/- ##
============================================
- Coverage 68.03% 68.02% -0.01%
- Complexity 1287 1288 +1
============================================
Files 249 249
Lines 8826 8830 +4
Branches 920 921 +1
============================================
+ Hits 6005 6007 +2
- Misses 2432 2433 +1
- Partials 389 390 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Question1: Should I create a proposal as a comment inside of #9435? In any case, the proposals that I have thought of are written in this PR:
Question3 Should I try to detail more the proposals written here? (I am new to the project and do not yet understand all possible outcomes/impact from the approaches outlined in this PR) |
I'd still have a preference for not including unlicensed files / files without findings to the scan results to keep it smaller. Instead, as outlined elsewhere, I'd look into changing the license finding curation matcher logic to accept a In a way, what you want is the opposite of "deleting" a finding by setting the Somehow these invented findings then need to make it into the license info resolver. I have not thought about how to do that yet. |
I good way would be to join our weekly community meeting to discuss such ideas. |
Oh, that would great! I would like to see what the community is doing, the roadmap, and learn. I will drop by for sure next Thursday! |
248cfa0
to
fad230f
Compare
As discussed in the weekly meeting last week: Contribution
Summary Missing
|
58b863b
to
3d012e2
Compare
3d012e2
to
29f56ba
Compare
29f56ba
to
a9fd947
Compare
I have addressed detected problems: fixed the implicits and the commit lint :) |
It was a real large effort to introduce file lists (complete listing of all the files) in a non-redundant way. |
@fviernau thanks for the feedback. Could you provide some guidance to improve this PR with a more appropriate implementation? |
3a0497b
to
bdb6401
Compare
Add flag `includeUnlicensed` to the scanner configuration. Its default is `false`. When set to `true`, the scanner add to a `ScanResult` files without license as LicenseFindings with license set to `NONE`. This contribution makes possible to the scanner to display all files as license findings. The ultimate goal is that any file without license is catched by the scanner, so that curation mechanism can override files without licenses in cases where a license applies to a whole folder. Signed-off-by: Kiko Fernandez-Reyes <kiko@erlang.org>
bdb6401
to
3fc2797
Compare
/** | ||
* A flag to indicate whether the scanner should add files without license to the scanner results. | ||
*/ | ||
val includeUnlicensed: Boolean = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my taste, this sounds too much like https://spdx.org/licenses/Unlicense.html. To avoid any confusion, I propose to rename this to includeFilesWithoutFindings
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, property-ordering-wise I don't like this to go between the two "skip..." properties. Between skipExcluded
and archive
makes more sense to me.
@@ -207,6 +207,40 @@ class Scanner( | |||
} | |||
} | |||
|
|||
val filteredScanResults = filterScanResultsByVcsPaths(controller.getAllScanResults(), vcsPathsForProvenances) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, a few random starting thoughts / comments:
-
This code gets complex and deserves a few more code comments that explain the algorithm / intention.
-
For the default case ideally no unnecessary copy should be done (like is done now in line 217). To archive that, I'd leave
val filteredScanResults
as-is in original line 195, and instead introduce a new variable likeval scanResults = if (scannerConfig.includeUnlicensed) { // Do the work. } else { filteredScanResults } // .... return ScannerRun.EMPTY.copy( config = scannerConfig, provenances = provenances, scanResults = scanResults, files = files, scanners = scanners )
// Adds files without license to the scanned results | ||
val scanSummary = | ||
controller.getAllFileLists()[scanResult.provenance]?.files | ||
.orEmpty().asSequence().mapNotNull { fileEntry -> | ||
if (fileEntry.path in licenseFiles) { | ||
null | ||
} else { | ||
fileEntry.path | ||
} | ||
}.toSet().let { fileEntryLicenses -> | ||
(fileEntryLicenses subtract licenseFiles).mapTo(mutableSetOf()) { newFinding -> | ||
LicenseFinding(license = "NONE", location = TextLocation(newFinding, 1)) | ||
}.let { | ||
val allFindings = scanResult.summary.licenseFindings union it | ||
scanResult.summary.copy(licenseFindings = allFindings) | ||
} | ||
} | ||
|
||
scanResult.copy( | ||
provenance = scanResult.provenance.alignRevisions(), | ||
summary = scanSummary | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce nesting and make the code more readable, I propose something like this (completely untested; this goes to // Do the work.
from my comment above):
filteredScanResults.mapTo(mutableSetOf()) { scanResult ->
val allPaths = controller.getAllFileLists()[scanResult.provenance]?.files?.map { it.path }.orEmpty()
val pathsWithFindings = scanResult.summary.licenseFindings.map { it.location.path }
val pathsWithoutFindings = allPaths - pathsWithFindings
val findingsThatAreNone = pathsWithoutFindings.map {
LicenseFinding(SpdxConstants.NONE, TextLocation(it, UNKNOWN_LINE))
}
scanResult.copy(
summary = scanResult.summary.copy(
licenseFindings = scanResult.summary.licenseFindings + findingsThatAreNone
)
)
}
Due to the introduced variables, this maybe is even clear enough to not require any code comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I'm using the SpdxConstants.NONE
constant instead of "NONE"
here, and deliberately an unknown / invalid line range. I have not checked whether this might cause any problems with some later sanity checks or so.
Based on #9435, it was stated that we could simply apply curations to files not listed in the license findings. The current PR is one such design but breaks backwards compatibility. I would like some feedback on how it would be preferred to deal with this.
Below I state some of the approaches I have thought of. Any feedback is welcomed.
PR Approach
NONE
when created as part of aLicenseFinding
.Design Option
Approach
I have tried to look for ways to changes to the
FindingCurationMatcher.kt
(recommended by @sschuberth ). There are a bunch of methods in this file, but none of them have available the files needed fromScanResult
s. I do not mind to "patch" all call sites that passfindings
:FindingCurationMatcher().matches(finding, curation)
FindingCurationMatcher().apply(finding, curation)
FindingCurationMatcher().applyAll(finding, curation)
and add to the
findings
all files without licenses asLicenseFinding(license="NONE"), location=...)
. In this way, a curation matcher with a glob pattern on a folder anddetected_license: NONE
can apply the curation.Questions/Comments
We apply curations to files not listed in the scanner results.
To me, this may seem counter-intuitive, since I do not think ORT has ever dealt with files not listed in the scanner. The idea is to apply curations to unlicensed files, but I think it is more uniform if unlicensed files are listed in the scanner. Thoughts?
Design Option 2
Add to the
.ort.yml
an option (or to the cli) to state that unlicensed files are part of the scanner and should be shown there.Use case
evaluator
.From my point of view, we need to analyse Erlang/OTP and would like warnings/errors for any files without license. Design option 2 makes explicit this option, and the scanner shows the truth result of scanned files.