WX-927 GCP Batch: LogsPolicy.PATH now streams the logs to GCS #7529

Open, wants to merge 2 commits into base: develop

AlexITC
Collaborator

@AlexITC AlexITC commented Sep 5, 2024

Description

Instead of pushing the logs file after the job completes, the logs are now streamed to GCS.

This is how it works:

  • Mount the main GCS bucket as a disk in the VM filesystem.
  • Configure Batch to store the logs in the mounted disk.
  • The log file is written to the same path used by the task files.
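
The three steps above might map onto a Batch job configuration roughly as follows. This is a minimal sketch, not the code in this PR: it builds a fragment of the job body in the shape used by the Batch REST API (a `gcs` volume under `taskSpec.volumes` and a `logsPolicy` with destination `PATH`). The bucket name, task directory, and log file name are hypothetical placeholders.

```python
import json


def build_job_fragment(bucket: str, task_dir: str, log_name: str = "task.log") -> dict:
    """Return the parts of a Batch job body that relate to log streaming.

    bucket, task_dir and log_name are hypothetical placeholders.
    """
    mount_path = f"/mnt/disks/{bucket}"
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    # Step 1: mount the GCS bucket as a disk in the VM filesystem.
                    "volumes": [
                        {"gcs": {"remotePath": bucket}, "mountPath": mount_path}
                    ]
                }
            }
        ],
        # Steps 2 and 3: have Batch write its log under the mounted disk,
        # in the same directory that holds the task files.
        "logsPolicy": {
            "destination": "PATH",
            "logsPath": f"{mount_path}/{task_dir}/{log_name}",
        },
    }


fragment = build_job_fragment("my-bucket", "workflow/call-task")
print(json.dumps(fragment, indent=2))
```

Keeping the log path under the mount point is what lets Batch write the log through the bucket mount rather than uploading it in a separate step after the job ends.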

Notes:

  • I haven't found a way to tail the GCS file, but running cat repeatedly displays the new logs.
  • I'm not sure whether the logs are streamed live or only when the runnable completes; if needed, I can evaluate this.
  • I used some tricks to get this done, and I'm open to suggestions for improving the approach.
  • Follows up on WX-927 GCP Batch: LogsPolicy is now configurable #7491
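
As an alternative to re-running cat on the whole file, a poller can remember how many bytes it has already read and fetch only what was appended. The sketch below is an assumption-laden illustration, not part of this PR: it reads from a local path (for example a copy of the log, or a path under a bucket mount) and assumes appended content becomes visible on each read, which is exactly the open question in the note above. The helper name `read_new` is hypothetical.

```python
import os
import tempfile
from typing import Tuple


def read_new(path: str, offset: int) -> Tuple[str, int]:
    """Return the text appended to path since offset, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)


# Demonstration against a temporary local file standing in for the task log.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "task.log")

    with open(log_path, "wb") as f:
        f.write(b"line 1\n")
    first, offset = read_new(log_path, 0)

    # Simulate the job appending more output between polls.
    with open(log_path, "ab") as f:
        f.write(b"line 2\n")
    second, offset = read_new(log_path, offset)

    print(first, second, sep="", end="")
```

In practice the two reads would sit inside a loop with a sleep between polls.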

Release Notes Confirmation

CHANGELOG.md

  • I updated CHANGELOG.md in this PR
  • I assert that this change shouldn't be included in CHANGELOG.md because it doesn't impact community users

Terra Release Notes

  • I added a suggested release notes entry in this Jira ticket
  • I assert that this change doesn't need Jira release notes because it doesn't impact Terra users

@AlexITC AlexITC requested a review from a team as a code owner September 5, 2024 18:18
@aednichols aednichols changed the title GCP Batch: LogsPolicy.PATH now streams the logs to GCS WX-927 GCP Batch: LogsPolicy.PATH now streams the logs to GCS Sep 5, 2024
@mcovarr
Contributor

mcovarr commented Sep 9, 2024

Hi Alex, in our sync meetings we discussed an async "sidecar" type solution for log file syncing, much like PAPI v2 has now. We were concerned that mounting a GCS filesystem that includes potentially large task files might consume a lot of bandwidth as these files are written. Streaming of files other than task logs is not necessary for our use cases.

@dspeck1
Collaborator

dspeck1 commented Sep 9, 2024

The mounted GCS bucket is only for logs. The outputs still go through delocalization. There is no option to run a sidecar within GCP Batch; Google would have to enable it within the product. We have asked, and the response was to do it this way.

@mcovarr
Contributor

mcovarr commented Sep 9, 2024

OK thank you for the clarification, we'll try it this way then. 😄

@mcovarr
Contributor

mcovarr commented Sep 12, 2024

Unfortunately this approach seems to have some issues:

  • The task log is not actually streamed as the WDL task runs. The "streaming" appears to happen at the level of GCP Batch runnables. While there is a lot of output in the task log from the runnables preceding the WDL task, there is no output uploaded from the WDL task itself until the task completes.
  • The contents of the task log contain log formatting that is not part of the "raw" stdout or stderr.
  • The contents of the task log also appear to contain GCP Batch agent log output intermingled with the task log output.
  • There are extra files being uploaded with names like stdout-job-ab8e9cd9-2dc3-6e005a9d-a38f-441d00-group0-0.log and stderr-job-ab8e9cd9-2dc3-6e005a9d-a38f-441d00-group0-0.log that appear to be Batch agent logs.

In the course of evaluating this I discovered that the stdout and stderr files actually are being streamed correctly, and furthermore that this is implemented via "sidecar" runnables created from Cromwell. Since the task log should conceptually be the interleaved stdout and stderr, it seems this approach should also work for the task logs.
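
To illustrate what "interleaved stdout and stderr" means in isolation, here is a minimal sketch that is unrelated to Cromwell's actual sidecar runnables: a child process writes to both streams, and the parent merges stderr into the stdout pipe so that both arrive in one log in the order they were written. The child command is a hypothetical stand-in for a task.

```python
import subprocess
import sys

# Hypothetical stand-in for a task command: writes one line to each stream,
# flushing after each write so the ordering is deterministic.
child_code = (
    "import sys\n"
    "print('to stdout')\n"
    "sys.stdout.flush()\n"
    "print('to stderr', file=sys.stderr)\n"
    "sys.stderr.flush()\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # merge stderr into the same pipe as stdout
    text=True,
    check=True,
)

# The combined output plays the role of the task log.
task_log = result.stdout
print(task_log, end="")
```

A log captured this way contains only what the task wrote, with no agent output or added formatting mixed in, which is the property the first list above found missing.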

@mcovarr
Contributor

mcovarr commented Sep 13, 2024

Sketch of stdout/stderr-like sidecar-ish "streaming" solution here, definitely not fully baked yet.

@AlexITC
Collaborator Author

AlexITC commented Sep 16, 2024

Thanks for sharing, do let me know if you need anything from me.

@AlexITC AlexITC closed this Sep 16, 2024
@mcovarr
Contributor

mcovarr commented Sep 16, 2024

Thanks Alex. I just realized today that the approach in my PR does not capture output from any of the setup / localization / delocalization runnables. I'm not sure yet how much this matters, but it may turn out that the logging story isn't completely resolved yet. 🙂

@dspeck1 dspeck1 reopened this Nov 25, 2024
@dspeck1 dspeck1 requested a review from a team as a code owner November 25, 2024 15:52
3 participants