I'm trying to better understand the quetz code base. In order to do that, I'd like to start by documenting some of the internals and design choices, starting with the topic of background job scheduling. I have compiled below what I understand so far based on docs and code.
@wolfv @btel @janjagusch (and whoever else knows more): I would be super grateful if you could help me out with corrections, additions, or any additional pointers. I'd be happy to contribute the resulting understanding into developer docs.
My current understanding of jobs/tasks/actions in quetz
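There are two main ways for quetz to handle background tasks:
Mode 1: the starlette BackgroundTask API is used. This mode has nothing to do with the "worker" settings in the quetz config.
Scheduling:
Route functions in main.py depend on fastapi.BackgroundTasks.
In order to schedule a task for execution, it is appended to the list of background tasks.
AFAICT, this mode is only used to run indexing.update_indexes in a number of route functions: delete_package, delete_package_version, post_file_to_package, post_upload, post_file_to_channel.
Mode 2: quetz's own job machinery (Task and Supervisor) is used. This is the mode that the "worker" settings in the quetz config apply to.
Scheduling:
Route functions depend on get_tasks_worker, which returns a Task instance, which exposes execute_channel_action.
execute_channel_action allows scheduling one-offs as well as repeating scheduled jobs.
execute_channel_action does not execute anything, but writes the job definition to the DB.
Execution:
The Supervisor is started from cli.run / cli.start.
The Supervisor uses the worker section of the configuration in order to decide how to execute the job.
If worker=="thread" (the default), the job is executed in a thread pool.
If worker=="subprocess", the job is executed as a subprocess of the supervisor process.
If worker=="redis", then the job is not executed by the supervisor directly, but is sent to the redis queue.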
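To make the mode 2 flow concrete, here is a toy sketch of how I picture it: scheduling only records a job, and a supervisor later decides how to run it based on the worker setting. This is not quetz code. ToyJob, ToySupervisor, ACTIONS, and the in-memory lists standing in for the DB and the redis queue are all made up for illustration, and the "subprocess" branch is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


def add(a: int, b: int) -> int:
    """Hypothetical action, standing in for a real channel action."""
    return a + b


# Hypothetical registry mapping action names to callables.
ACTIONS: Dict[str, Callable[..., Any]] = {"add": add}


@dataclass
class ToyJob:
    action: str
    args: Tuple[Any, ...]
    status: str = "pending"
    result: Any = None


class ToySupervisor:
    def __init__(self, worker: str = "thread") -> None:
        self.worker = worker
        self.db: List[ToyJob] = []  # stand-in for the jobs table
        self.redis_queue: List[Tuple[str, Tuple[Any, ...]]] = []  # stand-in for redis

    def schedule(self, action: str, *args: Any) -> ToyJob:
        """Only records the job definition; nothing is executed here."""
        job = ToyJob(action, args)
        self.db.append(job)
        return job

    def run_pending(self) -> None:
        """Dispatch every pending job according to the worker setting."""
        with ThreadPoolExecutor(max_workers=2) as pool:
            for job in self.db:
                if job.status != "pending":
                    continue
                if self.worker == "thread":
                    future = pool.submit(ACTIONS[job.action], *job.args)
                    job.result = future.result()
                    job.status = "done"
                elif self.worker == "redis":
                    # Not executed here: handed over to an external queue.
                    self.redis_queue.append((job.action, job.args))
                    job.status = "queued"
                else:
                    raise NotImplementedError(f"worker={self.worker!r} not sketched")
```

With worker="thread" the result ends up on the job itself; with worker="redis" the supervisor never runs the callable, so whatever consumes the queue needs its own way to load the action and its configuration.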
Some questions
Is the above summary correct?
Why does mode 1 exist?
AFAICT, it is only used to run indexing.update_indexes , which is also available for running in mode 2. Since mode 2 is more flexible in configuring how tasks should be executed, it seems sensible to use mode 2 everywhere.
Is anyone currently successfully using mode 2 with worker=="redis" in a real-life setup?
When I try to set it up, I cannot seem to make it work because quetz.config.Config is not (un-)pickleable.
Why is quetz.Config designed the way it is? There are a few things going on here:
The design uses a singleton-ish pattern implemented through __new__. What is the motivation for this approach?
There also seems to be some support for searching for config files and possibly dynamically combining multiple config sources. What requirements make this necessary? Are there requirements that would prevent us from replacing the current Config with something super straightforward like pydantic.BaseSettings?
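To illustrate why I am wary of combining a `__new__`-based singleton with pickling, here is a toy sketch. ToyConfig is made up and is not the quetz implementation, and the failure I actually see with quetz.config.Config may have a different cause. The sketch only shows one general pitfall: unpickling calls `__new__` without arguments, so it gets back the cached default instance and then overwrites that instance's state.

```python
import pickle


class ToyConfig:
    """Hypothetical singleton-per-path config, cached through __new__."""

    _instances = {}

    def __new__(cls, path=None):
        if path not in cls._instances:
            inst = super().__new__(cls)
            inst.path = path
            inst.values = {"source": path or "<defaults>"}
            cls._instances[path] = inst
        return cls._instances[path]


default = ToyConfig()               # cached under the key None
custom = ToyConfig("custom.toml")   # cached under the key "custom.toml"

# Unpickling calls ToyConfig.__new__(ToyConfig) with no path, which returns
# the cached default instance, and then restores the pickled state into it.
clone = pickle.loads(pickle.dumps(custom))

print(clone is default)   # the "copy" is the default singleton
print(default.path)       # the default instance now carries the custom path
```

In a fresh worker process the cache would start out empty, so the behaviour there would differ again, which is part of why I find this pattern hard to reason about.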