Understanding internals: Jobs / tasks / actions #625

AndreasAlbertQC · 2023-04-19T21:35:16Z

Context

Hi!

I'm trying to better understand the quetz code base. In order to do that, I'd like to start by documenting some of the internals and design choices, starting with the topic of background job scheduling. I have compiled below what I understand so far based on docs and code.

@wolfv @btel @janjagusch (and whoever else knows more): I would be super grateful if you could help me out with corrections, additions, or any additional pointers. I'd be happy to contribute the resulting understanding into developer docs.

My current understanding of jobs/tasks/actions in quetz

There are two main ways for quetz to handle background tasks:

Mode 1: the starlette BackgroundTask API is used. This mode has nothing to do with the "worker" settings in the quetz config.
- Scheduling:
  - Route functions in main.py declare a fastapi.BackgroundTasks parameter as a dependency.
  - To schedule a task for execution, the route function adds it to that BackgroundTasks instance (add_task), which appends it to an internal list.
- Execution:
  - Execution is implemented in starlette
  - Execution happens in the same process as the http server
  - The "worker" settings and API are not used at all.
- Where is it used:
  - This method is used to execute indexing.update_indexes in a number of route functions:
    - delete_package
    - delete_package_version
    - post_file_to_package
    - post_upload
    - post_file_to_channel
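
To make the Mode 1 flow concrete, here is a minimal, stdlib-only sketch of the "collect during the request, run after the response, in the same process" pattern. It is a toy stand-in, not Starlette's or quetz's actual code: `DeferredTasks`, `handle_request`, and the `update_indexes` placeholder are hypothetical names chosen for illustration, and only the `add_task` method name mirrors the real fastapi.BackgroundTasks API.

```python
from typing import Any, Callable, Dict, List, Tuple


class DeferredTasks:
    """Toy stand-in for the 'collect now, run after the response' pattern."""

    def __init__(self) -> None:
        self._tasks: List[Tuple[Callable[..., Any], tuple, dict]] = []

    def add_task(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        # Scheduling only appends to a list; nothing runs yet.
        self._tasks.append((func, args, kwargs))

    def run_all(self) -> None:
        # Runs in the same process that served the request.
        for func, args, kwargs in self._tasks:
            func(*args, **kwargs)


log: List[str] = []


def update_indexes(channel_name: str) -> None:
    # Hypothetical placeholder for indexing.update_indexes.
    log.append(f"indexed {channel_name}")


def post_file_to_channel(channel_name: str, tasks: DeferredTasks) -> Dict[str, str]:
    # The route function schedules the work and returns immediately.
    tasks.add_task(update_indexes, channel_name)
    log.append("response ready")
    return {"status": "ok"}


def handle_request(channel_name: str) -> Dict[str, str]:
    # Hypothetical server loop: send the response first, then run the tasks.
    tasks = DeferredTasks()
    response = post_file_to_channel(channel_name, tasks)
    log.append("response sent")
    tasks.run_all()
    return response
```

The point of the sketch is the ordering: the index update only runs after the response has been produced, and no worker configuration is involved anywhere.
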
Mode 2: "channel actions". This mode uses the "worker" settings in the quetz config.
- Scheduling:
  - Route functions depend on get_tasks_worker, which returns a Task instance. The Task instance exposes execute_channel_action.
  - execute_channel_action can schedule one-off jobs as well as recurring jobs.
  - execute_channel_action does not execute anything itself; it only writes the job definition to the DB.
- Execution:
  - Execution is managed by the Supervisor process.
  - By default, the Supervisor is started together with the server through cli.run / cli.start.
  - Alternatively, the supervisor process may also be started separately through the cli without starting the server. This requires you to pass in the deployment directory.
  - The Supervisor uses the worker section of the configuration in order to decide how to execute the job.
    - If worker=="thread" (the default), the job is executed in a thread pool
    - If worker=="subprocess", the job is executed as a subprocess of the supervisor process
    - If worker=="redis", then the job is not executed by the supervisor directly, but is sent to the redis queue.
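
The Mode 2 flow as I understand it can be sketched as follows. This is a simplified, stdlib-only illustration and not the quetz implementation: the `Job` dataclass, `JOB_TABLE` (standing in for the DB), `REDIS_STAND_IN` (standing in for the redis queue), `ACTIONS`, and `supervise_once` are all hypothetical. Only the name `execute_channel_action` and the three `worker` values come from the description above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from queue import Queue
from typing import Dict, Optional


@dataclass
class Job:
    id: int
    action: str
    channel: str
    status: str = "pending"
    result: Optional[str] = None


JOB_TABLE: Dict[int, Job] = {}  # hypothetical stand-in for the jobs table in the DB
REDIS_STAND_IN: Queue = Queue()  # hypothetical stand-in for the redis queue


def reindex(channel: str) -> str:
    return f"reindexed {channel}"


ACTIONS = {"reindex": reindex}


def execute_channel_action(action: str, channel: str) -> Job:
    # Scheduling only records the job definition; nothing is executed here.
    job = Job(id=len(JOB_TABLE) + 1, action=action, channel=channel)
    JOB_TABLE[job.id] = job
    return job


def supervise_once(worker: str = "thread") -> None:
    # One pass of a hypothetical supervisor: pick up pending jobs and dispatch them.
    pending = [job for job in JOB_TABLE.values() if job.status == "pending"]
    if worker == "thread":
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = {j.id: pool.submit(ACTIONS[j.action], j.channel) for j in pending}
        for job in pending:
            job.result = futures[job.id].result()
            job.status = "done"
    elif worker == "subprocess":
        script = "import sys; print('reindexed ' + sys.argv[1])"
        for job in pending:
            proc = subprocess.run(
                [sys.executable, "-c", script, job.channel],
                capture_output=True, text=True, check=True,
            )
            job.result = proc.stdout.strip()
            job.status = "done"
    elif worker == "redis":
        # The supervisor does not run the job; it only hands it to the queue.
        for job in pending:
            REDIS_STAND_IN.put(job.id)
            job.status = "queued"
    else:
        raise ValueError(f"unknown worker type: {worker!r}")
```

In this sketch the route function would only ever call `execute_channel_action`; whether the job then runs in a thread, in a subprocess, or on an external queue is decided entirely by the supervisor and the `worker` setting.
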

Some questions

Is the above summary correct?
Why does mode 1 exist?
- AFAICT, it is only used to run indexing.update_indexes, which can also be run in mode 2. Since mode 2 offers more flexibility in configuring how tasks are executed, it seems sensible to use mode 2 everywhere.
Is anyone currently successfully using mode 2 with worker=="redis" in a real-life setup?
- When I try to set it up, I cannot seem to make it work because quetz.config.Config is not (un-)pickleable.
Why is quetz.config.Config designed the way it is? There are a few things going on here:
- The design uses a singleton-ish pattern implemented through __new__. What is the motivation for this approach?
- There also seems to be some support for searching for config files and possibly dynamically combining multiple config sources. What requirements make this necessary? Are there requirements that would prevent us from replacing the current Config with something super straightforward like pydantic.BaseSettings?
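
To illustrate what I mean by the two styles, here is a small stdlib-only sketch. It is not the actual quetz code and not a diagnosis of the pickling problem: `SingletonishConfig` is a hypothetical example of a singleton-ish class built on `__new__`, and `PlainSettings` (with its made-up fields `worker` and `database_url`) is a dataclass standing in for a "super straightforward" settings object such as pydantic.BaseSettings. The payload helpers show one way plain settings could be handed to a worker as data rather than as a pickled object.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


class SingletonishConfig:
    """Hypothetical singleton-ish config: one shared instance per config path."""

    _instances: Dict[str, "SingletonishConfig"] = {}

    def __new__(cls, path: str = "config.toml") -> "SingletonishConfig":
        # Reuse the existing instance for this path instead of creating a new one.
        if path not in cls._instances:
            instance = super().__new__(cls)
            instance.path = path
            cls._instances[path] = instance
        return cls._instances[path]


@dataclass(frozen=True)
class PlainSettings:
    """Hypothetical plain settings object holding only simple values."""

    worker: str = "thread"
    database_url: str = "sqlite:///./example.sqlite"


def to_payload(settings: PlainSettings) -> str:
    # Serialize the settings as plain data that any worker process can read.
    return json.dumps(asdict(settings))


def from_payload(payload: str) -> PlainSettings:
    # Rebuild an equal settings object on the receiving side.
    return PlainSettings(**json.loads(payload))
```

With the first class, constructing the config twice with the same path yields the same object, which is convenient for shared state but couples construction to hidden global state. With the second, the object is just data, so round-tripping it to another process is trivial.
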
