
Worker refactor #933

Open
ewjoachim opened this issue Feb 9, 2024 · 0 comments
Labels

  • Issue appropriate for: People up for a challenge 🤨 (this issue probably will be challenging to tackle)
  • Issue contains: Exploration & Design decisions 🤯 (we don't know how this will be implemented yet)
  • Issue contains: Some Python 🐍 (this issue involves writing some Python code)
  • Issue type: Refactor 🔄 (this issue involves rewriting some code in a better way)

Comments

@ewjoachim
Member

ewjoachim commented Feb 9, 2024

Current situation

When the procrastinate worker starts, it will launch n+2 coroutines, n being the concurrency setting.

  • n coroutines are "subworkers" that loop for as long as the worker isn't requested to stop. Because the stop request may happen at any time, the loop itself is a bit complicated so as to ensure we stop as soon as possible. On each iteration, a subworker waits for a listen/notify event, requests a job, and if it receives one, runs it, then loops.
  • 1 coroutine is the listen/notify coroutine that will use an asyncio.Event to communicate that new tasks are available.
  • 1 periodic defer coroutine will defer periodic jobs.
  • All those coroutines are handled by a horribly complicated construct (utils.run_tasks)
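The n-subworker layout above can be sketched roughly as follows. This is a minimal, self-contained illustration, not procrastinate's real API: `subworker`, the `jobs` deque, and the event names are hypothetical stand-ins, and popping from a deque stands in for the SQL fetch.

```python
import asyncio
from collections import deque

# Hypothetical stand-in for the job queue in the database.
jobs = deque(["job-1", "job-2", "job-3"])
done: list[str] = []

async def subworker(new_job: asyncio.Event, stop: asyncio.Event) -> None:
    # Each subworker issues its own "fetch" on every iteration,
    # which is why n subworkers mean n times the DB traffic.
    while not stop.is_set():
        if jobs:
            done.append(jobs.popleft())  # stand-in for running the job
        else:
            # Wait for a notify OR a stop request, whichever comes first:
            # this is the "a bit complicated" part of the loop.
            waiters = [asyncio.ensure_future(new_job.wait()),
                       asyncio.ensure_future(stop.wait())]
            await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
            for w in waiters:
                w.cancel()
            new_job.clear()

async def main(concurrency: int = 2) -> None:
    new_job, stop = asyncio.Event(), asyncio.Event()
    workers = [asyncio.create_task(subworker(new_job, stop))
               for _ in range(concurrency)]
    await asyncio.sleep(0.05)  # let the subworkers drain the queue
    stop.set()
    new_job.set()              # wake any waiting subworker so it can exit
    await asyncio.gather(*workers)

asyncio.run(main())
print(sorted(done))  # each job ran exactly once
```

The point to notice is that each subworker fetches on its own, so the number of fetch queries scales with the concurrency setting even when most subworkers find nothing to do.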

Issues with the current situation

Each subworker requests new tasks from the DB. That's far too many SQL queries. Also, utils.run_tasks is a bit disgusting.

Possible solutions

Instead of having n looping subworkers, we should have a single job spawner that figures out when a new job is available and then launches job coroutines within asyncio tasks.
We should still respect concurrency, probably using asyncio.Semaphore.
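A minimal sketch of that single-spawner idea, assuming a plain list stands in for the job source (the names `spawner` and `run_job` are illustrative, not existing procrastinate functions):

```python
import asyncio

results: list[str] = []

async def run_job(job: str, sem: asyncio.Semaphore) -> None:
    try:
        await asyncio.sleep(0)  # stand-in for actual job execution
        results.append(job)
    finally:
        sem.release()           # free a slot for the spawner

async def spawner(jobs: list[str], concurrency: int = 2) -> None:
    # Only one coroutine fetches jobs; concurrency is enforced
    # by the semaphore rather than by n parallel fetch loops.
    sem = asyncio.Semaphore(concurrency)
    tasks = []
    for job in jobs:
        await sem.acquire()     # blocks while `concurrency` jobs are running
        tasks.append(asyncio.create_task(run_job(job, sem)))
    await asyncio.gather(*tasks)  # wait for in-flight jobs to finish

asyncio.run(spawner(["a", "b", "c", "d"]))
print(sorted(results))
```

One nice property of this shape is that the fetch happens exactly once per job, and the semaphore naturally throttles the spawner when all slots are busy.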

At this point I'm not 100% sure how graceful and especially ungraceful shutdown should look. I guess the standard "shutdown signal" for a coroutine is task.cancel(), so the first shutdown request, which triggers a graceful shutdown, should use this. Note that this is already how App.run_worker_async() is implemented, but maybe it should be part of the worker itself. Then, I guess a second .cancel() should shut everything down, while still ensuring that cancelled jobs get a special state. I think "failed" is the closest existing state, pending proper handling of withdrawn & interrupted tasks (in another ticket).
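One possible shape for that two-stage cancellation, as a hedged sketch (none of these names come from procrastinate; `asyncio.shield` is used here so the first `.cancel()` doesn't immediately propagate to the running jobs):

```python
import asyncio

states: dict[str, str] = {}

async def run_job(name: str, duration: float) -> None:
    states[name] = "doing"
    try:
        await asyncio.sleep(duration)  # stand-in for actual work
        states[name] = "succeeded"
    except asyncio.CancelledError:
        states[name] = "failed"  # cancelled jobs still get a terminal state
        raise

async def worker() -> None:
    jobs = [asyncio.create_task(run_job("short", 0.01)),
            asyncio.create_task(run_job("long", 10.0))]
    try:
        await asyncio.shield(asyncio.gather(*jobs))
    except asyncio.CancelledError:
        # First cancel: graceful — stop fetching, wait for running jobs.
        try:
            await asyncio.shield(asyncio.gather(*jobs))
        except asyncio.CancelledError:
            # Second cancel: ungraceful — cancel the jobs themselves,
            # making sure each one records a terminal state.
            for job in jobs:
                job.cancel()
            await asyncio.gather(*jobs, return_exceptions=True)
            raise

async def main() -> None:
    task = asyncio.create_task(worker())
    await asyncio.sleep(0.05)  # "short" finishes, "long" is still running
    task.cancel()              # first request: graceful shutdown
    await asyncio.sleep(0.01)
    task.cancel()              # second request: force shutdown
    await asyncio.gather(task, return_exceptions=True)

asyncio.run(main())
print(states)
```

Under this sketch, the short job completes normally while the long job ends up in the "failed" terminal state after the second cancel, matching the behavior described above.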

For now, this is all up to design discussion.
