Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gitlint is showing all files as changed in github CI #338

Open
jorisroovers opened this issue Sep 22, 2022 · 12 comments
Open

gitlint is showing all files as changed in github CI #338

jorisroovers opened this issue Sep 22, 2022 · 12 comments
Assignees
Labels
bug User-facing bugs

Comments

@jorisroovers
Copy link
Owner

Gitlint is incorrectly listing all files as changed in CI, not only those that have changed:
https://github.com/jorisroovers/gitlint/actions/runs/3014300199/jobs/4935936697#step:14:104

Is this perhaps due to a misuse of refspec? Should we be passing a -1 argument like we do with log?

@jorisroovers jorisroovers changed the title BUG: gitlint is showing all files as changed in github CI gitlint is showing all files as changed in github CI Sep 22, 2022
@jorisroovers jorisroovers added this to the 0.18.0 milestone Sep 22, 2022
@jorisroovers jorisroovers added the bug User-facing bugs label Sep 22, 2022
@jorisroovers
Copy link
Owner Author

Found the issue.

By default the checkout action only checks out the commit being modified.
This behavior can be modified using the fetch-depth option (setting it to 0 checks out the entire history).

In gitlint, we basically invoke git diff-tree <some flags> --root <sha>:

"diff-tree", "--no-commit-id", "--numstat", "-r", "--root", self.sha, _cwd=self.context.repository_path

It's the --root flag that is the culprit:

When --root is specified the initial commit will be shown as a big creation event. This is equivalent to a diff against the NULL tree.

We added the --root flag in 32d8cc2, in order to be able to detect the changed files of an initial commit on a repo. This works as intended when the entire git history is checked out, but when only checking out a single commit (as happens in CI), this will cause git to return all files as having changed.

While this is easily fixed in CI by setting fetch-depth:0 as part of the checkout action (example fix), I think gitlint's behavior is actually incorrect here. Gitlint should only show files changed in the commit itself, not including any files from the initial commit.

Need to brainstorm and research a bit whether that means we shouldn't show any changed files when linting the initial commit (i.e. remove --root from the diff-tree invocation), or whether we should treat the initial commit as a special case and only include --root then.

@jorisroovers
Copy link
Owner Author

So when a single commit is checked out from git (a shallow clone), its parent information is lost.

Typically, the way to determine whether a commit is one of the initial commits on a repo, is by checking whether the commit has parents using a variation of:

> git rev-list --max-parents=0 HEAD
765a4427f80ccdda61bc78e2680bda14787b2a4e # this is the sha of the initial commit

We could have used the output of this to figure out when to include the --root flag in the our diff-tree command: only include it when the current commit is an initial commit, i.e. it has no parents. But so it turns out this won’t work because that parent information is also lost in a shallow commit.

I did find out that such shallow commits that are not initial commits have a so-call grafted annotation, to indicate they weren’t really an initial commit within their commit history.

While technically it might be possible to use this grafted annotation to figure out whether any given commit without parents is an initial commit or not, and depending on that use the --root flag in the git diff-tree command, I think I’m starting to go down a rabbit hole of complexity here for an edge case that might not even be used by anyone.

In addition, it’s worth calling out that gitlint actually doesn’t adhere to the same semantics as git does by default, i.e. git by default won’t show you files that were part of the initial commit as being changed files.

As such, I’m inclined to revert gitlint behavior on this to align with default git semantics. In other words, gitlint won’t list files that were added in the initial commit as part of its changed_files list. Ideally, we might want add a flag to toggle this behavior, but I’m not sure whether that’s a level of implementation and maintenance complexity I want to add right now. Will let this rest for a bit…

@jorisroovers jorisroovers self-assigned this Sep 28, 2022
@jorisroovers jorisroovers removed this from the 0.18.0 milestone Oct 24, 2022
@chbndrhnns
Copy link

The same behaviour applies to Gitlab CI.

@chbndrhnns
Copy link

As a side note, I am stuck on 0.15 now because of that issue and this means I cannot run a recent version of black which depends on click>=8 while gitlint 0.15 pins click to version 7.

@webknjaz
Copy link
Contributor

@jorisroovers FYI here's an example of using GitHub API to figure out how many commits a PR has (up to a certain limit, w/o pagination) cherrypy/cheroot#579. That PR is only implemented for pull_request event triggers, but maybe there's enough details in the context for retrieving this information for the push events too?

@chbndrhnns
Copy link

FYI here's an example of using GitHub API

I would very much want this to work agnostic of the Git provider as I am using GitLab in a corporate environment.

@webknjaz
Copy link
Contributor

@chbndrhnns this won't be possible for as long as you want to only lint commits from PRs/MRs. Each of the providers needs to be handled separately. Figuring out what to lint may be embedded into gitlint, technically, but this would only work with envs for which it'll be implemented, any new/exotic cases would need to be handled in the job definitions outside the call to gitlint.

@webknjaz
Copy link
Contributor

FWIW, I don't use GitLab actively these days, but I'm happy to consult on/review the GHA-related things @jorisroovers.

@chbndrhnns
Copy link

this won't be possible for as long as you want to only lint commits from PRs/MRs

I understand it used to work up until gitlint 0.15 so "won't be possible" seems a bit too strong.

Also, one comment above outlines a solution that sounds compatible to me:

In other words, gitlint won’t list files that were added in the initial commit as part of its changed_files list. Ideally, we might want add a flag to toggle this behavior, but I’m not sure whether that’s a level of implementation and maintenance complexity I want to add right now.

@webknjaz
Copy link
Contributor

webknjaz commented Nov 21, 2022

I guess, what works and what doesn't, depends on the PoV. Of course, you could track down the common ancestor for two branches, but usually CIs don't check out the whole Git tree. They get the history up to certain depths. For GitHub this defaults to 1, for Travis CI it used to be like 200 commits back IIRC. So there are some bits that won't work w/o certain pre-setup. I suppose that gitlint won't want to be intrusive, so automatically fetching more history is not an option in this case and would require some pre-setup from the users.

@chbndrhnns
Copy link

So there are some bits that won't work w/o certain pre-setup.

I get that now thanks to your explanation, so thanks!

@jorisroovers
Copy link
Owner Author

jorisroovers commented Nov 25, 2022

I realize I didn't mention earlier that there’s really 2 parts to this.

Initial Commits and the git diff-tree --root flag

An important detail worth repeating is that git itself doesn’t consider files of the initial commit as “changed files” when you do git diff-tree, i.e. for the initial commit, git won’t show you the added files unless you specifically tell it to with the --root flag.

I had added the --root flag to the git diff-tree command that gitlint calls to deal with this, but hadn’t consider the shallow clone scenario.

Since I’ve now discovered that it’s not possible for gitlint to (easily + accurately) determine whether a commit is root commit, I believe gitlint should align with git’s behavior and not show files for initial/root commits unless you explicitly tell it to, with a config option like:

[general]
# consider files of root/initial commits as changed files
# (option name subject to change) 
diff-tree-root=False # disabled by default

You don’t actually have to use a .gitlint file for this, rather you can use the -c commandline flag to specify the same: gitlint -c general.diff-tree-root=True. Gitlint will still read the .gitlint file, but apply this config on top.

By using this construct, you can have an if/else statement in CI that deals with the initial commit separately if you really want it to. I think most users won’t actually care for the initial commit. We could documenting and/or provide templates on how to do this in various CI systems.

Fetch depth

However, this diff-tree-root option still doesn’t correctly solve the fetch-depth=1 challenge in CI, it just provides and all-or-nothing alternative to it. Setting diff-tree-root=True will end up showing all files, setting diff-tree-root=False will show none if fetch-depth is only 1.

The only way to resolve this AFAICT, is to increase the fetch-depth, which is CI specific. I don’t want to include CI specific customizations in gitlint itself, but perhaps we can also make this better using documentation and/or templates.

To reproduce this locally:

# Fetch depth of 1
$ git clone --depth 1 https://github.com/jorisroovers/gitlint.git /tmp/gitlint
$ cd /tmp/gitlint
$ git diff-tree --no-commit-id --numstat -r 01f2fa0
# No output :(
$ git diff-tree --no-commit-id --numstat -r --root 01f2fa0
# All files are listed :(
$ rm -rf /tmp/gitlint

# Fetch depth to 2(or more): fixes this issue
# Whether you use --root or not, only matters for the initial commit (not shown here)
$ git clone --depth 2 https://github.com/jorisroovers/gitlint.git /tmp/gitlint
$ cd /tmp/gitlint
$ git diff-tree --no-commit-id --numstat -r 01f2fa0
1	1	gitlint-core/setup.py
$ git diff-tree --no-commit-id --numstat -r --root 01f2fa0
1	1	gitlint-core/setup.py
$ rm -rf /tmp/gitlint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug User-facing bugs
Projects
Status: No status
Development

No branches or pull requests

3 participants