Running the following in my environment:
Python 3.11.5
rq==1.15.1
redis==5.0.1
I need to be able to share database connections between my jobs on each worker. Initially I was using the default Worker but was seeing lots of unexplainable rq.timeouts.JobTimeoutException errors. I suspected this was due to all of my jobs using the same database engine/connections combined with Worker forking a work horse for each job.
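For context, a minimal sketch of the shared-engine setup I mean, assuming a SQLAlchemy-style engine (the URL and query are placeholders); the dispose() call is SQLAlchemy's documented post-fork mitigation:

```python
from sqlalchemy import create_engine, text

# Module-level engine, shared by every job this worker runs.
engine = create_engine("postgresql://user:pass@localhost/app")  # placeholder URL

def my_task():
    # Under the default Worker, each job runs in a forked work horse that
    # inherits the parent's pooled connections; sharing those sockets
    # across processes is what I suspect stalls the queries.
    # SQLAlchemy's documented post-fork mitigation (needs 1.4.33+):
    engine.dispose(close=False)
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))  # placeholder query
```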
In trying to isolate this further I switched to SimpleWorker. While I still see some unexplainable timeouts (300 seconds for a query that normally runs in 5 seconds), I am now getting a lot of AbandonedJobErrors. The worker appears to be dying, but I cannot seem to debug this further.
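The switch itself is just the worker class; a sketch of how the worker is started (queue name illustrative):

```python
from redis import Redis
from rq import Queue, SimpleWorker

conn = Redis()
queue = Queue("default", connection=conn)  # illustrative queue name

# SimpleWorker executes jobs in the worker process itself (no fork),
# so the module-level engine really is reused across consecutive jobs.
SimpleWorker([queue], connection=conn).work()
```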
Here is an example of a job that raised an AbandonedJobError. What's interesting is that the amount of time between the job starting and the exception being raised is significantly more than the job_timeout of 300.
I also occasionally see rq info complain about my on_failure callback when using SimpleWorker. This is perplexing, as the worker and job modules reside in the same folder.
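For completeness, roughly how the callback is wired, reusing the queue and task from the sketches above (module and function names here are illustrative); rq persists the callback as a dotted path, so any process that deserializes the job needs to be able to import that module:

```python
# myapp/callbacks.py  (illustrative layout)
def report_failure(job, connection, type, value, traceback):
    # rq 1.x failure-callback signature
    print(f"job {job.id} failed: {value}")

# elsewhere, at enqueue time:
from myapp.callbacks import report_failure

queue.enqueue(my_task, job_timeout=300, on_failure=report_failure)
# rq stores the callback as the string "myapp.callbacks.report_failure",
# so the worker (and anything else reading the job) must be able to
# import myapp.callbacks.
```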
The worker definitely seems to be dying: it disappears from rqinfo and doesn't restart until I manually restart my systemd daemon. The code/infrastructure/jobs are all the same between Worker and SimpleWorker. Any advice/direction would be appreciated.
Replies: 1 comment

A couple of other things I have observed since posting this: occasionally rqinfo will look like the following (None None None). I have never observed this with Worker:

Here is an example of a job that hit the job_timeout but appears to still have finished after the fact:
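For anyone who wants to poke at one of these jobs themselves, a minimal inspection sketch (the job ID is a placeholder, e.g. taken from rqinfo):

```python
from redis import Redis
from rq.job import Job

job = Job.fetch("a1b2c3d4", connection=Redis())  # placeholder job ID
# Status plus start/end stamps show a job that "finished" long after
# its job_timeout should have killed it.
print(job.get_status(), job.timeout, job.started_at, job.ended_at)
```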