Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[testing PR] test why e2e tests are failing #13002

Closed
wants to merge 1 commit into from

Conversation

isubasinghe
Copy link
Member

Fixes #TODO

Motivation

Modifications

Verification

@agilgur5
Copy link
Member

agilgur5 commented May 4, 2024

Rebased the branch to an earlier version of main before #12736 was merged to see if that's the cause.
See also Slack thread

@agilgur5
Copy link
Member

agilgur5 commented May 4, 2024

CI is still failing with the same errors, so my next guess would be #12741 , although that one's pretty small & targeted

@agilgur5
Copy link
Member

agilgur5 commented May 4, 2024

And it's still failing with the same error on a version of main prior to #12741 being merged. I'm even more confused now....

@isubasinghe
Copy link
Member Author

Hmm I thought it was #12736 as well due to a git blame I came upon.

I suppose the quickest way to find out what the first offending commit is to git bisect.

@agilgur5
Copy link
Member

agilgur5 commented May 5, 2024

That uh assumes knowing a "good" commit, and I'm not sure which one is good 😬

Alan and I were thinking on Slack that it could potentially be an unpinned dep somewhere, though I couldn't pinpoint one that added up in the manifests. I was looking at the debug logs and wondering if the tests might be pulling argocli:latest from the web instead of the locally built one

@agilgur5
Copy link
Member

agilgur5 commented May 5, 2024

The last rebase I tried was on top of / after #12972, and it's still failing with the same errors

@agilgur5
Copy link
Member

agilgur5 commented May 5, 2024

Ok rebased once more to after #12917 annnd still same error. I chose that one as it was what #12926 was based on, which just merged (i.e. passed E2E on its own branch).

So pretty sure it's not due to a commit at this point

@agilgur5
Copy link
Member

agilgur5 commented May 6, 2024

Noting here that the failing API test does result in a nil pointer dereference error in the Server logs:

server: time="2024-05-05T02:31:39.703Z" level=info msg="Get artifact file" artifactName=local-artifact namespace=argo nodeId=wf-stopped-8j2gw-1075147760 uid=3b15ead1-612c-4820-8d62-e72018069605
server: 2024/05/05 02:31:39 http: panic serving [::1]:54866: runtime error: invalid memory address or nil pointer dereference
server: goroutine 295 [running]:
server: net/http.(*conn).serve.func1()
server: 	/opt/hostedtoolcache/go/1.21.9/x64/src/net/http/server.go:1868 +0xb9
server: panic({0x2994660?, 0x4b2cd40?})
server: 	/opt/hostedtoolcache/go/1.21.9/x64/src/runtime/panic.go:920 +0x270
server: github.com/argoproj/argo-workflows/v3/server/artifacts.(*ArtifactServer).GetArtifactFile(0xc000716960, {0x3303b00, 0xc0002d80e0}, 0xc000c3c000)
server: 	/home/runner/work/argo-workflows/argo-workflows/server/artifacts/artifact_server.go:141 +0x8b8

@Joibel
Copy link
Member

Joibel commented May 7, 2024

I've managed to produce ErrImageNeverPull: Container image "argoproj/argocli:latest" is not present with pull policy of Never locally by adding a Never image pull policy for artgc-dag-workflow-stopper which is used in artgc-dag-wf-self-delete, one of the failing tests.

I can't see where the argocli image gets built and pushed during e2e, so I'll see if I can find a satisfactory fix for this.

@Joibel
Copy link
Member

Joibel commented May 8, 2024

This is fixed in #13018

@Joibel Joibel closed this May 8, 2024
@agilgur5
Copy link
Member

agilgur5 commented May 8, 2024

Alan and I were thinking on Slack that it could potentially be an unpinned dep somewhere, though I couldn't pinpoint one that added up in the manifests. I was looking at the debug logs and wondering if the tests might be pulling argocli:latest from the web instead of the locally built one

I've managed to produce ErrImageNeverPull: Container image "argoproj/argocli:latest" is not present with pull policy of Never locally by adding a Never image pull policy for artgc-dag-workflow-stopper which is used in artgc-dag-wf-self-delete, one of the failing tests.

This is fixed in #13018

Thanks Alan for confirming my hypothesis and fixing the imagePullPolicy so that we don't run into that issue again!

We should still pin all the other images in the test manifests too, to prevent a similar type of bug (or supply chain attack via tests)

@agilgur5 agilgur5 added the area/build Build or GithubAction/CI issues label May 8, 2024
@agilgur5 agilgur5 changed the title fix: test if e2e tests are failing due to flakiness [testing PR] test why e2e tests are failing May 8, 2024
@agilgur5 agilgur5 added the solution/superseded This PR or issue has been superseded by another one (slightly different from a duplicate) label May 8, 2024
@argoproj argoproj locked as resolved and limited conversation to collaborators May 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/build Build or GithubAction/CI issues solution/superseded This PR or issue has been superseded by another one (slightly different from a duplicate)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants