dask: capture live logs from running Dask scheduler and worker pods
Current behaviour

When the administrator enables the "live logs" feature on the REANA deployment, and a user then runs a Dask workflow, the "live logs" do not show any progress of the Dask jobs; the user sees only:
```console
$ rc logs -w dask --follow -i 10
2024-11-13 09:26:46,169 | root | MainThread | INFO | Publishing step:0, cmd: python analysis.py, total steps 1 to MQ
```
In other words, we are not capturing logs from the running Dask scheduler and the worker pods, such as:
```console
$ k logs reana-run-dask-6f93ffcf-7e23-42de-a269-e2ce629f80f7-schedud7wfs
/usr/local/lib/python3.10/site-packages/distributed/cli/dask_scheduler.py:140: FutureWarning: dask-scheduler is deprecated and will be removed in a future release; use `dask scheduler` instead
  warnings.warn(
2024-11-13 09:26:36,664 - distributed.scheduler - INFO - -----------------------------------------------
2024-11-13 09:26:37,462 - distributed.scheduler - INFO - State start
2024-11-13 09:26:37,464 - distributed.scheduler - INFO - -----------------------------------------------
2024-11-13 09:26:37,465 - distributed.scheduler - INFO - Scheduler at: tcp://10.244.0.85:8786
2024-11-13 09:26:37,465 - distributed.scheduler - INFO - dashboard at: :8787
2024-11-13 09:26:48,243 - distributed.scheduler - INFO - Receive client connection: Client-6839cd01-a1a1-11ef-800d-6ad72d89ef66
2024-11-13 09:26:48,244 - distributed.core - INFO - Starting established connection to tcp://10.244.0.86:39330
2024-11-13 09:26:52,455 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://10.244.0.87:46297', name: reana-run-dask-6f93ffcf-7e23-42de-a269-e2ce629f80f7-default-worker-239985c006, status: init, memory: 0, processing: 0>
2024-11-13 09:26:52,455 - distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.87:46297
2024-11-13 09:26:52,455 - distributed.core - INFO - Starting established connection to tcp://10.244.0.87:37246
2024-11-13 09:27:07,521 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://10.244.0.88:39151', name: reana-run-dask-6f93ffcf-7e23-42de-a269-e2ce629f80f7-default-worker-4e117f4fec, status: init, memory: 0, processing: 0>
2024-11-13 09:27:07,522 - distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.88:39151
2024-11-13 09:27:07,522 - distributed.core - INFO - Starting established connection to tcp://10.244.0.88:37438
```
```console
$ k logs reana-run-dask-6f93ffcf-7e23-42de-a269-e2ce629f80f7-defauljdnbt
/usr/local/lib/python3.10/site-packages/coffea/nanoevents/schemas/nanoaod.py:201: RuntimeWarning: Missing cross-reference index for Jet_muonIdx2 => Muon
  warnings.warn(
/usr/local/lib/python3.10/site-packages/coffea/nanoevents/schemas/nanoaod.py:201: RuntimeWarning: Missing cross-reference index for Muon_fsrPhotonIdx => FsrPhoton
  warnings.warn(
/usr/local/lib/python3.10/site-packages/coffea/nanoevents/schemas/nanoaod.py:201: RuntimeWarning: Missing cross-reference index for Photon_electronIdx => Electron
  warnings.warn(
2024-11-13 09:27:04,324 - distributed.utils_perf - INFO - full garbage collection released 39.17 MiB from 104785 reference cycles (threshold: 9.54 MiB)
2024-11-13 09:27:14,928 - distributed.utils_perf - INFO - full garbage collection released 39.54 MiB from 104785 reference cycles (threshold: 9.54 MiB)
2024-11-13 09:27:27,675 - distributed.utils_perf - INFO - full garbage collection released 32.21 MiB from 86289 reference cycles (threshold: 9.54 MiB)
2024-11-13 09:27:40,523 - distributed.utils_perf - INFO - full garbage collection released 25.70 MiB from 67810 reference cycles (threshold: 9.54 MiB)
2024-11-13 09:27:55,872 - distributed.utils_perf - INFO - full garbage collection released 41.95 MiB from 110946 reference cycles (threshold: 9.54 MiB)
```
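A possible direction for capturing these pods would be to locate all Dask pods belonging to a workflow by label and tag their log lines with the pod name before merging them into the live-log stream. The sketch below is illustrative only: the `dask.org/cluster-name` label follows dask-kubernetes conventions and is an assumption here, not a confirmed detail of the REANA deployment.

```python
def dask_pod_selector(cluster_name: str) -> str:
    """Build a Kubernetes label selector matching all pods of one Dask cluster.

    NOTE: the ``dask.org/cluster-name`` label name follows dask-kubernetes
    conventions and is an assumption; the labels actually set on REANA's
    Dask pods may differ.
    """
    return f"dask.org/cluster-name={cluster_name}"


def tag_log_lines(pod_name: str, raw_log: str) -> list[str]:
    """Prefix every non-empty log line with its pod name, so that scheduler
    and worker output can be interleaved into a single live-log stream."""
    return [f"{pod_name} | {line}" for line in raw_log.splitlines() if line]
```

With the official `kubernetes` Python client, pods matching such a selector could then be listed via `CoreV1Api.list_namespaced_pod(namespace, label_selector=...)` and each streamed with `read_namespaced_pod_log(..., follow=True)`.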
Expected behaviour
The user would expect to see live logs from the running Dask scheduler and workers when following the workflow execution live.
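Conceptually, `--follow` would then have to interleave several concurrent log sources (the job, the Dask scheduler, each worker) into a single stream. A minimal sketch of such a merge, using in-memory lists in place of the blocking pod-log iterators a real implementation would consume:

```python
import queue
import threading
from typing import Iterable, Iterator


def merge_log_streams(streams: dict[str, Iterable[str]]) -> Iterator[str]:
    """Interleave several named log streams into one follow-style stream,
    labelling each line with its source name.

    A minimal sketch: in production each stream would be a blocking
    iterator over a followed pod log, not an in-memory list.
    """
    q: "queue.Queue[str | None]" = queue.Queue()

    def pump(name: str, stream: Iterable[str]) -> None:
        for line in stream:
            q.put(f"[{name}] {line}")
        q.put(None)  # sentinel: this stream is exhausted

    threads = [
        threading.Thread(target=pump, args=(name, s), daemon=True)
        for name, s in streams.items()
    ]
    for t in threads:
        t.start()

    finished = 0
    while finished < len(threads):
        item = q.get()
        if item is None:
            finished += 1
        else:
            yield item
```

The same pattern would let the follower keep printing job output while scheduler and worker lines arrive asynchronously.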
Note
When the workflow successfully finishes, this is what we capture in the database:
```console
$ rc logs -w dask
==> Workflow engine logs
2024-11-13 09:26:46,169 | root | MainThread | INFO | Publishing step:0, cmd: python analysis.py, total steps 1 to MQ
2024-11-13 09:31:43,425 | root | MainThread | INFO | Workflow 6f93ffcf-7e23-42de-a269-e2ce629f80f7 finished. Files available at /var/reana/users/00000000-0000-0000-0000-000000000000/workflows/6f93ffcf-7e23-42de-a269-e2ce629f80f7.
==> Job logs
==> Step: process
==> Workflow ID: 6f93ffcf-7e23-42de-a269-e2ce629f80f7
==> Compute backend: Kubernetes
==> Job ID: reana-run-job-e476c924-95ee-4c11-9207-084afa3c7843
==> Docker image: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
==> Command: python analysis.py
==> Status: finished
==> Started: 2024-11-13T09:26:46
==> Finished: 2024-11-13T09:31:39
==> Logs:
Matplotlib created a temporary cache directory at /tmp/matplotlib-fkzl98_7 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
[##                                      ] | 5% Completed | 19.4s
[#####                                   ] | 13% Completed | 44.3s
[########                                ] | 21% Completed | 1min 8.3s
[############                            ] | 30% Completed | 1min 30.9s
[###############                         ] | 38% Completed | 1min 53.6s
[##################                      ] | 45% Completed | 2min 16.2s
[#####################                   ] | 54% Completed | 2min 38.9s
[########################                ] | 62% Completed | 3min 1.5s
[###########################             ] | 69% Completed | 3min 24.1s
[###############################         ] | 77% Completed | 3min 46.9s
[##################################      ] | 85% Completed | 4min 9.5s
[#####################################   ] | 93% Completed | 4min 32.2s
all events 53446198
number of chunks 534
```
And this is what we capture in OpenSearch "live logs":
```console
$ rc logs -w dask --follow
2024-11-13 09:26:46,169 | root | MainThread | INFO | Publishing step:0, cmd: python analysis.py, total steps 1 to MQ
2024-11-13 09:31:43,425 | root | MainThread | INFO | Workflow 6f93ffcf-7e23-42de-a269-e2ce629f80f7 finished. Files available at /var/reana/users/00000000-0000-0000-0000-000000000000/workflows/6f93ffcf-7e23-42de-a269-e2ce629f80f7.
==> Workflow has completed, you might want to rerun the command without the --follow flag.
```
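For the captured Dask lines to become visible to the live-logs follower, each line would presumably need to be indexed in OpenSearch as its own document, carrying enough metadata to filter by workflow and source. A hypothetical document shape (the field names are illustrative; the actual index mapping used by REANA's live-logs feature may differ):

```python
from datetime import datetime, timezone


def make_live_log_doc(workflow_id: str, source: str, message: str) -> dict:
    """Build one OpenSearch document for a captured log line.

    NOTE: field names are illustrative assumptions, not the actual
    mapping used by the REANA live-logs index.
    """
    return {
        "workflow_id": workflow_id,
        "source": source,  # e.g. "job", "dask-scheduler", "dask-worker-0"
        "message": message,
        "@timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

A follower could then query by `workflow_id`, sorted on `@timestamp`, and render scheduler and worker lines alongside the workflow engine and job logs shown above.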