Issue with completed tasks hanging and showing as in-progress #147007

ricardoamador · 2024-04-18T20:42:01Z

Type of Request

bug

Infrastructure Environment

Cocoon Scheduler and Cocoon Dashboard

What is happening?

In the packages repository, a particular task has run multiple times even though it had previously passed. The task Mac_arm64 ios_platform_tests_shard_5 stable (https://ci.chromium.org/ui/p/flutter/builders/luci.flutter.prod/Mac_arm64%20ios_platform_tests_shard_5%20stable) for commit 0e3809d995b66af0b54b91d7e2412cf413b8717b shows many runs, including passing ones, yet the task kept getting rerun. See below:

Something similar happened a couple of days ago, where a task ran 46 times, well beyond the retry limit.
To follow up on this: while recovering tasks, I noticed that commit d39830e40c07f71c7086128b45cffeb66be19488 had the test Mac_arm64 ios_platform_tests_shard_2 master run 43 times against it. I thought we limited reruns to 3?

Thread link here: https://chat.google.com/room/AAAAaqs_Mg0/HZmppnXFtP8/HZmppnXFtP8?cls=10

Steps to reproduce

Step 1:
Step 2:
..
Step n:

Expected results

I expect to see X when Y is finished.

ricardoamador · 2024-04-18T20:51:18Z

A bit more information from when I was recovering the task: Firestore only has the task for that commit with attempt 3 appended to the task entry in the datastore:

Nothing appended beyond that:

And in datastore, the last run was tracked as the third failed attempt but was left as "In progress" (the screenshot shows succeeded because I had recovered it based on the newest run):

But you can see in the datastore that it is tracking the current number of attempts, which shows 15.
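The mismatch described above (attempt 3 in one store, 15 in the other) is the kind of drift a small consistency check could surface. The following is only a sketch in Python, not Cocoon code: the function name, the dictionary shapes, and the task key are all hypothetical, and the real records live in Firestore and Datastore rather than in plain dicts.

```python
from typing import Dict, Optional, Tuple


def find_attempt_mismatches(
    firestore_attempts: Dict[str, int],
    datastore_attempts: Dict[str, int],
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return tasks whose recorded attempt counts differ between the two stores.

    Both arguments map a hypothetical task key to the attempt number that
    store has recorded. A task missing from one store is reported with None.
    """
    mismatches = {}
    for key in sorted(firestore_attempts.keys() | datastore_attempts.keys()):
        in_firestore = firestore_attempts.get(key)
        in_datastore = datastore_attempts.get(key)
        if in_firestore != in_datastore:
            mismatches[key] = (in_firestore, in_datastore)
    return mismatches


# Hypothetical key; the 3 and 15 are the values reported in this comment.
task = "0e3809d_Mac_arm64 ios_platform_tests_shard_5 stable"
report = find_attempt_mismatches({task: 3}, {task: 15})
```

Here `report` contains a single entry for the task, pairing the Firestore value with the datastore value.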

keyonghan · 2024-04-19T00:14:14Z

This seems to be due to missing logic to handle the current_attempt tag when a post-submit check run is rerun via Re-run in the GitHub UI: https://github.com/flutter/cocoon/blob/main/app_dart/lib/src/service/luci_build_service.dart#L413

When rerunning from the GitHub UI, current_attempt is reset to 1: https://github.com/flutter/cocoon/blob/main/app_dart/lib/src/service/luci_build_service.dart#L616, which causes confusion on the Firestore side.
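To illustrate why a reset counter matters, here is a minimal sketch in Python (not Cocoon's Dart). Every name in it is hypothetical, the cap of 3 is taken from the "limited reruns to 3" remark earlier in this thread, and it shows only one plausible way a reset current_attempt could defeat a retry cap. The thread does not confirm this is the exact mechanism.

```python
from typing import Callable, Dict

MAX_ATTEMPTS = 3  # assumed retry cap, per the discussion in this thread


def next_attempt_reset(previous_tags: Dict[str, str]) -> int:
    """Models a rerun path that ignores the previous build's tags."""
    return 1


def next_attempt_carried(previous_tags: Dict[str, str]) -> int:
    """Models a rerun path that increments the previous current_attempt tag."""
    return int(previous_tags.get("current_attempt", "0")) + 1


def count_reruns(next_attempt: Callable[[Dict[str, str]], int], limit: int = 50) -> int:
    """Count how many reruns are scheduled before the cap (or `limit`) stops them."""
    tags = {"current_attempt": "1"}  # tags of the initial build
    reruns = 0
    while reruns < limit:
        attempt = next_attempt(tags)
        if attempt > MAX_ATTEMPTS:
            break  # the cap is only reached if the counter actually grows
        tags = {"current_attempt": str(attempt)}
        reruns += 1
    return reruns


reset_reruns = count_reruns(next_attempt_reset)
carried_reruns = count_reruns(next_attempt_carried)
```

With the counter carried forward, the loop stops after attempts 2 and 3. With the counter reset to 1 on every rerun, the cap is never reached and only the safety `limit` ends the loop.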

ricardoamador · 2024-04-19T02:30:08Z

This happened again here:

stuartmorgan · 2024-04-19T10:51:55Z

FWIW, the GitHub UI was the only UI to manage tasks in flutter/packages for many years, so trying to retrain everyone to never use that UI—especially when we still have the release task that can only be run/re-run from there—is going to be non-trivial. If we could make it work instead, that would be helpful.

ricardoamador · 2024-04-19T17:02:53Z

@stuartmorgan nah I don't think this is a matter of retraining but just a bug on an edge case caused by a migration to a new datastore.

> especially when we still have the release task that can only be run/re-run from there

This is peculiar, are you saying you are running release tasks from the github UI during presubmit? Can you add more context here?

stuartmorgan · 2024-04-22T12:03:27Z

> I don't think this is a matter of retraining

I was referring to this comment in an issue dup'd to this one.

> especially when we still have the release task that can only be run/re-run from there
>
> This is peculiar, are you saying you are running release tasks from the github UI during presubmit? Can you add more context here?

No, I'm saying that the post-submit GitHub Actions task called release, which is responsible for actually publishing all of the packages in flutter/packages and is thus a critical part of our CI and gardening responsibilities, is only visible—and thus re-runnable—in the GitHub UI, not in the Flutter dashboard.

ricardoamador · 2024-04-22T14:13:54Z

@stuartmorgan okay, thanks for clarifying.

keyonghan · 2024-04-22T15:44:12Z

> No, I'm saying that the post-submit GitHub Actions task called release, which is responsible for actually publishing all of the packages in flutter/packages and is thus a critical part of our CI and gardening responsibilities, is only visible, and thus re-runnable, in the GitHub UI, not in the Flutter dashboard.

The rerun I referred to in #147033 (comment) was for LUCI (postsubmit) check run only. The GitHub action will not be affected and can be rerun as usual.

To be clear, the LUCI check run (rerun) connects to the cocoon backend to reschedule/update new builds, and is experiencing some issues. These issues can be worked around by calling the cocoon reset-prod-task API directly.

Anyway, I will give flutter/cocoon#3675 a high priority this week for a fix.

stuartmorgan · 2024-04-22T15:56:20Z

> The rerun I referred to in #147033 (comment) was for LUCI (postsubmit) check run only. The GitHub action will not be affected and can be rerun as usual.

I understand that, but what I was saying is that everyone on the ecosystem gardener rotation:

- has muscle memory to retry failing post-submits from the GitHub UI, because it's how everything worked in that tree for many years, and
- still has to interact with that UI when there are failures (because failures in any post-submit test will cause release to fail, by design, and so release has to be re-run with other tests).

The combination of those two things makes it harder in practice to stop using it for LUCI tests than it is in theory.

ricardoamador added the team-infra (Owned by Infrastructure team) and P1 (High-priority issues at the top of the work list) labels Apr 18, 2024

yusuf-goog assigned ricardoamador and keyonghan Apr 18, 2024

yusuf-goog added the triaged-infra (Triaged by Infrastructure team) label Apr 18, 2024

This was referenced Apr 19, 2024

Fix postsubmit rerun based on checkrun flutter/cocoon#3675

Open

Dashboard shows test as in process when it has already passed #147033

Closed

stuartmorgan assigned stuartmorgan and unassigned ricardoamador and keyonghan Apr 30, 2024
