
queue names limited to 43 characters #1067

Open
ashleyheath opened this issue May 21, 2024 · 7 comments
Labels
  • Issue appropriate for: newcomers 🤩 (this issue can be safely tackled by people who never worked on this project before)
  • Issue contains: Some Python 🐍 (this issue involves writing some Python code)
  • Issue contains: Some SQL 🐘 (this feature requires changing the SQL model)
  • Issue type: Feature ⭐️ (add a new feature that didn't exist before)

Comments

@ashleyheath
Contributor

It seems PostgreSQL is compiled by default with an identifier size limit of 63 bytes (NAMEDATALEN - 1).

procrastinate_notify_queue() performs

PERFORM pg_notify('procrastinate_queue#' || NEW.queue_name, NEW.task_name);

That means there's a practical limit of 43 ASCII characters on the queue_name (63 bytes minus the 20-character 'procrastinate_queue#' prefix).
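A quick back-of-the-envelope check of that limit, using the prefix from the pg_notify call above:

```python
# PostgreSQL truncates identifiers (including NOTIFY channel names)
# to 63 bytes by default (NAMEDATALEN - 1).
IDENTIFIER_LIMIT = 63
PREFIX = "procrastinate_queue#"

# Bytes left over for the queue name itself.
max_queue_name_length = IDENTIFIER_LIMIT - len(PREFIX)
print(max_queue_name_length)  # 43
```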

While technically someone could build their PostgreSQL instance with a different identifier size, in practice I expect this won't be the case, so it feels like it would be sensible to either:

  • enforce that queue names are <= 43 characters in length
  • skip the use of pg_notify where queue names are too long (and allow workers to pick up jobs with longer queue names exclusively through polling)

Given a UUID is 36 characters, 43 is pretty limiting if I want to create 'thing'-specific queues (e.g. email-customer/00000000-0000-0000-0000-000000000000), so it certainly feels useful to allow the use of longer queue names, though I appreciate that would be a breaking change from the existing behaviour.

Keen to get thoughts from people.

@ewjoachim
Member

ewjoachim commented May 21, 2024

Alternatively, we could use a hash instead of the actual queue name. That would leave a lot of room for various values.

@ashleyheath
Contributor Author

That's certainly an interesting idea. I'd be worried that changing the queue name in the jobs table would hinder observability and debugging, but if we could use a hash only in the pg_notify publish and subscribe calls, that might do the job nicely.

@ewjoachim
Member

ewjoachim commented May 22, 2024

Ah yes, of course, I'm only talking about the "notification" name.

Would you like to make a PR for that ?

>>> import base64, hashlib
>>> base64.b32encode(hashlib.sha1(b"my_queue").digest()).decode().lower()
'756gnpdwk3lna3zna4toqq4eh4dhu4r6'
>>> len(_)
32

(this gives 32^32 = 2^160 possibilities, roughly 10^48, meaning you'd have about the same probability of collision as you and a friend each independently picking a random atom on the planet Earth and both picking the same one)

And we should log the mapping between the queue name and the hash either at the start of the worker, or when we listen (level debug).
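A minimal sketch of what the listen-side hashing and debug logging could look like; the function name and channel prefix here are illustrative, not Procrastinate's actual API:

```python
import base64
import hashlib
import logging

logger = logging.getLogger(__name__)

PREFIX = "procrastinate_queue#"


def channel_for_queue(queue_name: str) -> str:
    # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters
    # (no padding), so prefix + hash is 52 bytes, well under the
    # 63-byte identifier limit regardless of queue name length.
    digest = hashlib.sha1(queue_name.encode()).digest()
    hashed = base64.b32encode(digest).decode().lower()
    channel = PREFIX + hashed
    # Log the queue-name -> channel mapping so operators can correlate
    # NOTIFY channels back to human-readable queue names.
    logger.debug("Listening for queue %r on channel %r", queue_name, channel)
    return channel
```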

@ewjoachim added labels Issue appropriate for: newcomers 🤩, Issue contains: Some SQL 🐘, Issue contains: Some Python 🐍, and Issue type: Feature ⭐️ on May 22, 2024
@ashleyheath
Contributor Author

>>> base64.b32encode(hashlib.sha1(b"my_queue").digest()).decode().lower()
'756gnpdwk3lna3zna4toqq4eh4dhu4r6'
>>> len(_)
32

@ewjoachim did you envisage doing the hash from the python code then (as opposed to from within the SQL function)? And if so, adding a new column to procrastinate_jobs to hold the hashed queue name?

@ewjoachim
Member

Ah, I was thinking about the listen part, where it's equivalent to do it in Python or in SQL. But yes, of course, we need an SQL implementation for the notify part, so let's do it in SQL.

@ashleyheath
Contributor Author

Doing a little digging, it seems the digest() function in PostgreSQL is actually part of the separate pgcrypto extension.

So the options seem to be:

  • Require people to install the pgcrypto extension, and hash the queue name on LISTEN (in Python) and when calling pg_notify
  • Hash the queue name in Python, and add a column to record the hashed queue name

I don't have strong preferences, but I can imagine that some users may prefer the second due to restrictions on what they can install. @ewjoachim any thoughts?

@ewjoachim
Member

Looks like the hashtext function is available without the need for extensions. It's apparently not guaranteed to be stable across PG versions, but we don't need it to be extremely stable: nothing will break if listen/notify doesn't work during a PG migration.
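For illustration, the notify trigger could then hash the channel suffix along these lines; this is a sketch, not the actual Procrastinate migration, and note that hashtext() returns a signed integer, so it's cast to text here:

```sql
-- Sketch: hash the queue name so the channel always fits in 63 bytes,
-- using the built-in (but version-dependent) hashtext() function.
PERFORM pg_notify(
    'procrastinate_queue#' || hashtext(NEW.queue_name)::text,
    NEW.task_name
);
```

The listening side would need to apply the same hash (via a SQL query, since hashtext is not reproducible from Python) when building its LISTEN channel names.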
