Allow to match Copyright garbage by Regex patterns #6592

Merged: @sschuberth merged 7 commits into main from cr-garbage-prefix on Mar 8, 2023

@sschuberth (Member) commented Mar 1, 2023

Please have a look at the individual commit messages for the details.

@sschuberth changed the title "Cr garbage prefix" to "Allow Copyright garbage to be just prefixes" Mar 1, 2023
@sschuberth changed the title "Allow Copyright garbage to be just prefixes" to "Treat all Copyright garbage as prefixes" Mar 1, 2023
@sschuberth changed the title "Treat all Copyright garbage as prefixes" to "Allow to match Copyright garbage by Regex patterns" Mar 7, 2023
@sschuberth requested a review from fviernau March 7, 2023 16:01
@sschuberth marked this pull request as ready for review March 7, 2023 16:02
@sschuberth requested review from a team as code owners March 7, 2023 16:02
@sschuberth force-pushed the cr-garbage-prefix branch 2 times, most recently from 8d8a324 to 8d75b58 March 7, 2023 17:19
Three review threads on model/src/main/kotlin/config/CopyrightGarbage.kt (outdated, resolved)
See #6235 for context. Here, the use of `StringSortedSetConverter` is
not required, as `ImportCopyrightGarbageCommand` is the only place that
serializes the class, and that uses its own `Collator` for custom
sorting.

Signed-off-by: Sebastian Schuberth <[email protected]>
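The commit message above mentions sorting with a `Collator`. As a minimal sketch of how that differs from plain string sorting, the following Kotlin snippet sorts entries with `java.text.Collator`. The function name `sortGarbageEntries` is hypothetical, and using `Locale.ENGLISH` with the default collator strength is an assumption; the actual settings in `ImportCopyrightGarbageCommand` are not shown here.

```kotlin
import java.text.Collator
import java.util.Locale

/**
 * Hypothetical helper: sorts copyright garbage entries with a locale-aware
 * Collator instead of String's natural (UTF-16 code unit) order.
 * Locale.ENGLISH is an assumption, not necessarily what ORT uses.
 */
fun sortGarbageEntries(entries: Collection<String>): List<String> {
    val collator = Collator.getInstance(Locale.ENGLISH)
    // Collator is a Comparator<Object>, so it can be passed to sortedWith directly.
    return entries.sortedWith(collator)
}
```

For the input `"B"`, `"a"`, natural `String` ordering keeps `"B"` first because uppercase code units sort before lowercase ones, whereas the collator yields the alphabetical order `"a"`, `"B"`.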
Adjust the docs accordingly and reword them a bit along the way.

Resolves #6591.

Signed-off-by: Sebastian Schuberth <[email protected]>
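To illustrate the idea behind this PR, here is a minimal Kotlin sketch of matching copyright statements against garbage entries that are either literal strings or regular expressions. `CopyrightGarbageMatcher` and its members are hypothetical and do not reflect the actual `CopyrightGarbage` class in model/src/main/kotlin/config/CopyrightGarbage.kt; in particular, requiring a pattern to match the whole statement (rather than just part of it) is an assumption.

```kotlin
/**
 * Hypothetical matcher for copyright garbage (not ORT's actual API).
 * An entry in [literals] must equal a statement exactly, while an entry in
 * [patterns] is a regular expression that must match the whole statement.
 */
class CopyrightGarbageMatcher(
    private val literals: Set<String> = emptySet(),
    patterns: Set<String> = emptySet()
) {
    // Compile each pattern once up front rather than on every lookup.
    private val regexes: List<Regex> = patterns.map { Regex(it) }

    /** Returns true if [statement] is a literal entry or fully matches a pattern. */
    fun isGarbage(statement: String): Boolean =
        statement in literals || regexes.any { it.matches(statement) }

    /** Drops all garbage from [statements], keeping the remaining ones in order. */
    fun removeGarbage(statements: List<String>): List<String> =
        statements.filterNot { isGarbage(it) }
}
```

Under this full-match assumption, the prefix-based approach from the PR's earlier titles can still be expressed as a pattern ending in `.*`, for example the hypothetical `Copyright \(c\) \d{4} Foo.*`, while literal entries only match the exact statement.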
@sschuberth merged commit 4984b10 into main Mar 8, 2023
@sschuberth deleted the cr-garbage-prefix branch March 8, 2023 07:41
@sschuberth added the "release notes" label (Changes that should be mentioned in release notes) Mar 9, 2023
@pombredanne (Contributor)

It would be super useful, and likely more efficient, if these garbage copyrights could be reported upstream as bugs to ScanCode Toolkit, where they would be fixed forever and for everyone ;)

@sschuberth (Member, Author)

> It would be super useful, and likely more efficient, if these garbage copyrights could be reported upstream as bugs to ScanCode Toolkit, where they would be fixed forever and for everyone ;)

We already had this discussion internally, @pombredanne, and I actually believe the copyright garbage feature in ORT is not really needed all that much anymore with recent versions of ScanCode. If we see the need to update or create a garbage file, we'll report those findings against ScanCode, rest assured 😉

@fviernau (Member) commented Mar 17, 2023

> We already had this discussion internally, @pombredanne, and I actually believe the copyright garbage feature in ORT is not really needed all that much anymore with recent versions of ScanCode. If we see the need to update or create a garbage file, we'll report those findings against ScanCode, rest assured 😉

I agree that it is a good thing to fix things upstream, but that is not on the critical path of any compliance scan. To me, this implies that ORT must provide means to make short-term fixes on that critical path so that issues can be fixed in a timely manner, while contributing back can be done when time permits. So I disagree that this feature can be dropped: it provides a means to zap out copyright statements, which is still needed even though its use becomes less and less frequent.

Regarding contributing back, @bennati, would you consider contributing your org's garbage entries?

@bennati (Contributor) commented Mar 17, 2023

I agree, we cannot wait for ScanCode to fix the garbage issues, unless ScanCode fetches the latest rules from the online repo at each scan, @pombredanne?

Our garbage file was filled with findings from older ScanCode versions. Would these help when used with newer ScanCode versions?

@sschuberth (Member, Author)

> Would these help when used with newer ScanCode versions?

Very likely not. Only those garbage findings that are still valid for the current ScanCode version would be helpful. Actually, to remove unneeded garbage entries, I personally think it makes sense to start from scratch with an empty file and add entries anew as needed.

@fviernau (Member) commented Mar 17, 2023

> Our garbage file was filled with findings from older ScanCode versions. Would these help when used with newer ScanCode versions?

@pombredanne, I believe this one is for you. I presume you could just put all garbage entries into a plain text file, run the latest ScanCode against it, and see which ones are still detected as copyright statements. Those would then form your list of bugs. Does this make sense?
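The workflow proposed here could be sketched in Kotlin roughly as follows. Only writing the file and comparing the results are shown; running ScanCode itself (for example with `scancode --copyright --json-pp result.json garbage.txt`) and extracting the detected statements from its JSON output are left out to keep the sketch independent of ScanCode's output format. Both function names are hypothetical.

```kotlin
import java.io.File

/**
 * Hypothetical step 1: write all garbage entries to a plain text file.
 * Entries are separated by a blank line, assuming this makes it less likely
 * that ScanCode merges adjacent entries into a single statement.
 */
fun writeGarbageFile(entries: Collection<String>, target: File) {
    target.writeText(entries.joinToString(separator = "\n\n", postfix = "\n"))
}

/**
 * Hypothetical final step: given the statements the latest ScanCode still
 * reports for that file, return the entries that are still detected as
 * copyright statements, i.e. the candidates to report upstream as bugs.
 */
fun stillDetected(entries: Collection<String>, detected: Set<String>): List<String> =
    entries.filter { it in detected }
```

Exact string comparison is a simplification: ScanCode may normalize the statements it reports, in which case a looser comparison would be needed.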

@fviernau (Member)

> Very likely not. Only those garbage findings that are still valid for the current ScanCode version would be helpful. Actually, to remove unneeded garbage entries, I personally think it makes sense to start from scratch with an empty file and add entries anew as needed.

Maybe we shouldn't assume that all users are on the latest ScanCode version. As a compromise, how about limiting it to a minimum ScanCode version, e.g. adding all garbage entries for ScanCode 3.x and later, but not those for 2.x?

@sschuberth (Member, Author)

> Maybe we shouldn't assume that all users are on the latest ScanCode version.

That's not what I'm assuming, but I am assuming that @pombredanne is only interested in issue reports against the latest ScanCode version; otherwise he'd first need to verify for each report if there's still something to fix with the latest ScanCode version.

@fviernau (Member) commented Mar 17, 2023

> That's not what I'm assuming, but I am assuming that @pombredanne is only interested in issue reports against the latest ScanCode version;

I was proposing to contribute the copyright garbage to ort-config so that any ORT user can use it to filter out copyright garbage. Apart from that, as I outlined above [1], it may be possible to filter the entries for the latest ScanCode version, so that the whole list can indeed be useful as input for @pombredanne, if the approach works out.

[1] #6592 (comment)

@sschuberth (Member, Author)

OK guys, this discussion is now drifting too far from the topic of the original PR. Please consider continuing it in our discussions forum.

@oss-review-toolkit locked as off-topic and limited conversation to collaborators Mar 17, 2023
5 participants