GH actions long delay between finishing build job and starting success job #1376

Open
n2ygk opened this issue Dec 19, 2023 · 8 comments


n2ygk commented Dec 19, 2023

Describe the bug

While watching multiple PRs after I've approved them, it appears to take a long time for the success job to start after the last step of the build job has finished. See #1219, where the separate success job was added so the matrix is easier to update and the required check only ever depends on build finishing for tests to succeed.

To Reproduce

Cause a PR to run tests.

Expected behavior

I wasn't expecting anything in particular, but I was hoping the wait before the success job starts wouldn't happen.

Version

current master branch

  • I have tested with the latest published release and it's still a problem.
  • I have tested with the master branch and it's still a problem.

Additional context

@dopry I'm guessing that GH allocates a runner for each job, so after the build job finishes we wait for another runner to become available for the success job, and that takes a while. See the timestamps below: a second job that depends on the first has to wait for a new runner to be allocated. Sometimes correlation is indicative of causation.

Mon, 18 Dec 2023 17:59:18 GMT last matrix step of build job finished
Mon, 18 Dec 2023 18:31:45 GMT success job starts

While watching the PR, the success job's status shows it is waiting on a runner. Here's the raw log showing the roughly 30-minute wait for a runner:

2023-12-18T17:59:50.5029369Z Requested labels: ubuntu-latest
2023-12-18T17:59:50.5029714Z Job defined at: jazzband/django-oauth-toolkit/.github/workflows/test.yml@refs/heads/pre-commit-ci-update-config
2023-12-18T17:59:50.5029846Z Waiting for a runner to pick up this job...
2023-12-18T18:31:40.3399714Z Job is waiting for a hosted runner to come online.
2023-12-18T18:31:42.6767234Z Job is about to start running on the hosted runner: GitHub Actions 7 (hosted)
...
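For reference, the shape of the workflow being discussed is roughly the following (job layout from the thread; matrix values and step contents are illustrative, not the exact contents of test.yml). The success job declares needs: build, so it can't even be queued until every matrix leg has finished, and then it still has to wait for its own hosted runner.

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # illustrative axes and values only
        python-version: ["3.8", "3.12"]
        django-version: ["3.2", "5.0"]
    steps:
      - uses: actions/checkout@v4
      - name: Run the test suite
        run: tox          # placeholder for the real test step

  success:
    needs: build          # queued only after every matrix leg of build completes
    runs-on: ubuntu-latest
    steps:
      - name: Mark the build as successful
        run: echo "all matrix jobs passed"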

dopry commented Dec 19, 2023

You are correct in how you describe the behavior. We are probably also throttled a bit since we have such an intense job run. A runner is allocated for every build in the matrix. Maybe explicitly selecting a different runner class for the success job would get it allocated more quickly.


n2ygk commented Dec 19, 2023

Yeah presumably these runners are all counted against the Jazzband org. Can we try this without having to bug @jezdez?


dopry commented Dec 21, 2023

The reason we added the success job to the build process was so we wouldn't need @jezdez to intercede to change the success criteria of our builds, since we don't have settings access. We should be able to select the machine class by changing runs-on for the success job. Maybe we can get away without specifying it? I'm not sure what the default is...
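A sketch of where that change would go, assuming the job is defined as in the workflow above (as far as I can tell, runs-on is required for jobs that run on GitHub-hosted runners, so the label would be swapped rather than omitted):

  success:
    needs: build
    # swap this label for a different runner class if one is less contended;
    # ubuntu-latest is the current choice, any alternative here is a guess
    runs-on: ubuntu-latest
    steps:
      - name: Mark the build as successful
        run: echo "all matrix jobs passed"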


dopry commented Dec 21, 2023

I think this is something we could maybe raise with GitHub support?


dopry commented Dec 21, 2023

I assume we're waiting on the backlog of Jazzband jobs and it's being slowed down by the concurrent job limit: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration


dopry commented Dec 21, 2023

Another option may be to go ahead and reduce our matrix by dropping Django 4.0 and 4.1, since they're no longer supported upstream. That should reduce our matrix by 10 jobs. Success still won't be enqueued until the remaining jobs are complete...
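A sketch of the kind of trim being suggested, assuming the matrix lists Django versions explicitly (the version lists below are illustrative, not the current file contents):

    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
        # 4.0 and 4.1 removed: both are past upstream end of support
        django-version: ["3.2", "4.2", "5.0"]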


dopry commented Dec 21, 2023

Alternatively, if @jezdez would give you (@n2ygk) or someone else on the team settings access to this repo, then we could manage the branch protections ourselves and wouldn't need the success job, since we could update the required checks when needed.


dopry commented Dec 21, 2023

@jezdez @n2ygk I fired off a request to GH support to increase the concurrent build limit for the jazzband organization.
