
POC - batched queue processing #99

Draft · gregplaysguitar wants to merge 12 commits into main
Conversation

@gregplaysguitar (Contributor) commented Mar 16, 2020

@benjie this PR updates the job retrieval and update logic so it can handle batching. The batch size is hardcoded in the get_jobs function for now (limit 1); once the concept is proven and accepted, this would become a parameter to get_jobs and be passed through from the calling code as needed.

All existing tests pass with the batch size set to 1. With batch size > 1, all pass except "runs jobs in parallel", which I guess makes some implicit assumptions about the worker picking up a single job at a time.

Performance-wise, running the included perf test I'm seeing job throughput increase with larger batches, as expected, but the baseline performance with batch size 1 is quite a bit slower than before (~600 vs ~1800 jobs/s). My assumption is that non-batched performance needs to stay pretty much the same, so I'm keen for any feedback on what might be causing this. I'll keep playing around with it.

Performance impacts still need to be quantified for a real workload, which I'm planning to do in our app soon.

Let me know what you think, and in particular whether this is something you're keen to support in worker. If not, that's ok; we may end up maintaining a fork if we decide we definitely need batching.


TODO

Master branch is getting ~2150 jobs/s for me

The original batched get_jobs was performing really badly at ~680 jobs/s. Using a composite type instead of json internally increased that to ~840

Replacing the select array … into with a CTE got it right back up to ~1840 jobs/s at batch size 1. With batch size 4 that jumps to ~4400 (the general shape of this is sketched below).
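For context, a minimal sketch of the CTE pattern referred to above (my illustration, not the PR's exact SQL): lock a limited set of candidate rows with FOR UPDATE SKIP LOCKED, then update and return them in a single statement. Queue locking, task-identifier filtering, and flag checks are omitted, and worker_id is assumed to be a function parameter.

-- Illustrative only: check out a batch of jobs in one statement.
with candidate as (
  select id
  from graphile_worker.jobs
  where locked_at is null
    and run_at <= now()
    and attempts < max_attempts
  order by priority asc, run_at asc, id asc
  limit 4 -- the batch size; the PR would make this a get_jobs parameter
  for update skip locked
)
update graphile_worker.jobs
set attempts = jobs.attempts + 1,
    locked_by = worker_id,
    locked_at = now()
from candidate
where jobs.id = candidate.id
returning jobs.*;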
@gregplaysguitar (Contributor, Author) commented:

Update: I made some big perf improvements, see details in 863184c

@benjie (Member) commented Mar 18, 2020

Definitely interesting; I'm going to want to pull this down myself and have a play with it. In the meantime, could you look at the changes in #73 and consider (but don't necessarily do it) putting them into this? In particular, I think releasing the jobs in batches (e.g. once per second rather than every time a job completes) could make a difference, although the performance testing I did on that before did not show this to be the case (I think I did something wrong in the perf tests). Releasing jobs in a batch should be independent from the jobs that were checked out (i.e. different batches).

Excellent work, and well done for noticing the issue with json_build_object being evaluated for the rows before filtering; this is an issue we suffer from in PostGraphile from time to time as well.

@gregplaysguitar (Contributor, Author) commented:

Thanks @benjie - I've had a look at #73 and made a couple of comments there.

Batched job completion does seem like it would help. In my observation, though, it's the get_job query that seems to be causing the bulk of the load when the system is under strain, which is why I looked into batching that. But combining both ideas sounds like a good plan.

I think we would want to consolidate some APIs; there are a few different ways of doing things floating around now (e.g. complete_batch vs complete_jobs & fail_jobs). I'm keen to take a steer from you on that, as I have less context around the different use-cases. If you do want to go ahead with this, perhaps it would be worth documenting the interfaces before we do any further work? I'd be happy to review them.

For my part, I will be doing some deeper perf analysis (i.e. production-scale load testing) at some point soon and can include these changes, as well as #73, in that.

@benjie (Member) commented Mar 20, 2020

Can you prove complete_batch is significantly faster than calling the complete_job / fail_job functions? If not, I’d drop it (it’s why I never merged #73).

@benjie (Member) commented Mar 20, 2020

As for get_jobs... what if we were to drop the DB function and inline it into the JS as a prepared statement? I don’t think we need get_job either. TBH only add_job is needed - it’s the only part of our public interface, everything else is internal. Perhaps prepared statements rather than function calls would perform better.

@gregplaysguitar (Contributor, Author) commented:

Can you prove complete_batch is significantly faster than calling the complete_job / fail_job functions?

Sure, I'll try that next time I'm working on this and let you know. I'm not sure it will be any different though, and I reckon having them separate is probably more efficient, given that failing jobs should be a rare case. Also, having them separate means we don't need to use plpgsql. I'd tend towards your implementation at this stage.

Perhaps prepared statements rather than function calls would perform better.

Yes, this might be true, but could we achieve the same thing by just making it a sql function instead of plpgsql? If we remove the worker_id check at the top, then get_jobs becomes a single SQL statement with CTEs.

@benjie (Member) commented Mar 20, 2020

If we remove the worker_id check at the top then get_jobs becomes a single SQL statement with CTEs

Yeah, I considered this; I think for this to be viable we should make it STRICT, and if we do so, the default for task_identifiers should be an empty array rather than NULL, and the checks should be updated so that an empty array counts as "all identifiers". I'm not super comfortable with this, though.

Actually, having given it more thought, let's just change it to SQL, drop that check, and add a CHECK constraint to the tables asserting that (locked_by is null) = (locked_at is null). I don't think the CHECK will add significant performance cost; and besides, it'd be a write cost rather than a read cost.
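For reference, a sketch of the constraint described above (not code from this PR; the constraint names are made up, and applying the same invariant to job_queues is my assumption):

-- Enforce that a row is locked_by someone if and only if it has a locked_at.
alter table graphile_worker.jobs
  add constraint jobs_locked_by_locked_at_consistent
  check ((locked_by is null) = (locked_at is null));

alter table graphile_worker.job_queues
  add constraint job_queues_locked_by_locked_at_consistent
  check ((locked_by is null) = (locked_at is null));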

@benjie (Member) commented Apr 6, 2020

Hey @gregplaysguitar just checking in; how's your experimentation going?

@gregplaysguitar (Contributor, Author) commented:

@benjie sorry about the lack of action here. This is still something I want to look at, but priorities have shifted a bit, so I haven't got time right now. If you're happy to leave this PR open, I'll come back to it when I can?

@benjie (Member) commented Apr 28, 2020

Sure 👍

@benjie changed the base branch from master to main on June 24, 2020 at 11:32
@benjie (Member) commented Aug 4, 2020

(Gentle nudge @gregplaysguitar)

@gregplaysguitar (Contributor, Author) commented:

@benjie I've finally been able to run some proper tests on this, and I'm seeing some good improvements with batching. I'm keen to invest the time to finish this implementation - do you have any suggestions for how to go about it, or should I just follow my nose?

@benjie (Member) left a comment:

I'm so excited you're working on this again Greg! Really love to squeeze every ounce of performance out of Postgres that we can!

I'm not happy with the current flushing of jobs. What seems to be happening right now is that you pull down a batch of jobs (let's say 20 jobs); then you execute these jobs, then you report back.

One of these jobs might take 5 minutes; all the rest might take 10ms. So if the process is killed 2.5 minutes in, none of the 20 jobs will be released, even though 19 of them completed successfully a couple of minutes ago. It also means the queue will carry more locked jobs, which makes finding new jobs more expensive, giving a performance impact.

Let's solve it instead by batching the release of the jobs. Each job should release as soon as it's done (success or failure), but that release can go to a queue that is flushed every [configurable period], which defaults to, say, 1 second. For 20 very fast tasks, 1 second should catch all of them. You can see my initial thoughts on this approach in #73.

I'm also concerned about the linearity of these job executions. Say job 2 takes 5 minutes; that means jobs 3-20 don't even start until 5 minutes later. I wonder if we should just run all the tasks pulled down in parallel; user-configurable, with a default of pulling down just 1 task to maintain current behaviour. This would give a different concurrency control setting (and maybe we'd deprecate the old one). What do you think?

@@ -125,6 +125,49 @@ begin
end;
$$;
ALTER FUNCTION graphile_worker.add_job(identifier text, payload json, queue_name text, run_at timestamp with time zone, max_attempts integer, job_key text, priority integer, flags text[]) OWNER TO graphile_worker_role;
CREATE FUNCTION graphile_worker.complete_batch(worker_id text, success_ids bigint[], failures json[]) RETURNS void
@benjie (Member) commented:

I don't like the mixture of json and array here; I'd rather that the argument was just json and we used json_array_elements.

Suggested change:
- CREATE FUNCTION graphile_worker.complete_batch(worker_id text, success_ids bigint[], failures json[]) RETURNS void
+ CREATE FUNCTION graphile_worker.complete_batch(worker_id text, success_ids bigint[], failures json) RETURNS void

However, I note a more troubling issue. PostgreSQL's JSON supports arbitrary-precision numbers (I think); however, Node is limited to IEEE 754 64-bit floats, which gives ~53 bits of integer safety - not enough to cover the bigint size of Graphile Worker's PKs. We can get around this by encoding the ID as a string, but we must be very careful to always do so on the calling side. I've not finished reading the code yet so haven't checked, but if there isn't already a note on this, we should add one.
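To make those two points concrete, here is a sketch (my own, with an assumed payload shape) of how a plain json failures argument could be consumed, with IDs sent from Node as strings and cast back to bigint in SQL:

-- Assumed payload shape: [{"id": "123", "message": "something went wrong"}, ...]
-- Sending ids as strings sidesteps Node's ~53-bit integer limit; the cast restores bigint.
select (el ->> 'id')::bigint as job_id,
       el ->> 'message' as failure_message
from json_array_elements(failures) as el; -- failures json, per the suggested change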

@benjie (Member) commented:

I wonder if it would be better to use a composite type here for failures?

create type graphile_worker.failure_info as (
  job_id bigint,
  failure_message text
);

@gregplaysguitar (Contributor, Author) commented:

failures json seems a bit ambiguous to me, because we will always have an array here. I think a composite type makes sense; I'll have to double-check what my original reasoning was for not doing that, but it might not have been anything in particular.
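For comparison (a sketch assuming the failure_info type suggested above), a composite-type array needs no JSON parsing at all; unnest expands it into one row per failure with the composite's fields as ordinary columns:

-- failures graphile_worker.failure_info[]
select f.job_id, f.failure_message
from unnest(failures) as f;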

) and locked_by = worker_id;
end if;
end;
$$;
@benjie (Member) commented:

If you drop the if statements you could convert this to sql rather than plpgsql which might have an impact on performance. Maybe.

@gregplaysguitar (Contributor, Author) commented:

I think this is a good idea anyway; always better to use plain SQL if possible, imo.
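As an illustration of that plain-SQL direction (a sketch only: the complete_jobs name is borrowed from the discussion above, and job_queues bookkeeping and failure handling are deliberately omitted):

-- A success-only batch completion as a plain "language sql" function: no plpgsql, no if-blocks.
create function graphile_worker.complete_jobs(worker_id text, success_ids bigint[])
returns void
language sql
as $$
  delete from graphile_worker.jobs
  where id = any(success_ids)
    and locked_by = worker_id;
$$;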

and (forbidden_flags is null or (flags ?| forbidden_flags) is not true)
order by priority asc, run_at asc, id asc
-- TODO make this a configurable value
limit 1
@benjie (Member) commented:

Make the job_count the first argument to get_jobs.

Also, maintaining both get_job and get_jobs is likely to be a pain; let's switch worker to using get_jobs (assuming no performance cost) and replace get_job(...) with a call to get_jobs(1, ...) for legacy support.
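Something along these lines could keep get_job around as a thin legacy wrapper (the signature and parameter order here are illustrative assumptions; they would need to match whatever get_jobs ends up taking):

-- Sketch: get_job delegates to the batched function with a batch size of 1 and
-- returns the single row, or NULL if no job was available.
create function graphile_worker.get_job(
  worker_id text,
  task_identifiers text[] default null
) returns graphile_worker.jobs
language sql
as $$
  select *
  from graphile_worker.get_jobs(1, worker_id, task_identifiers)
  limit 1;
$$;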

@@ -111,7 +109,9 @@ export function makeNewWorker(

// `doNext` cannot be executed concurrently, so we know this is safe.
// eslint-disable-next-line require-atomic-updates
activeJob = jobRow && jobRow.id ? jobRow : null;
const validJobs = jobRows.filter((r) => r.id);
@benjie (Member) commented:

???

@benjie (Member) commented Sep 23, 2020

@gregplaysguitar This is exciting! I suspect you're seeing good results in throughput, but potentially an increase in latency? Let's see if we can achieve a throughput increase without trading too much latency - make sure you benchmark with some slower tasks (e.g. try throwing an if (Math.random() < 0.2) { await sleep(1000) } in there) to see how the metrics change.

@gregplaysguitar (Contributor, Author) commented:

Ok, sounds good, and good suggestions. I'll aim to chip away at this among my other stuff.

Re latency - my testing methodology is to gradually increase load (and hence throughput) while monitoring our key operational latency metric, which encompasses graphile tasks plus a lot of other stuff going on in the db. With batching enabled, the system coped with around 10-15% more load before breaching the latency SLO. I did see a little more latency early on with batching, as you suggested, but not enough to be an issue. Hopefully with some of the suggested improvements we can get that back down.

@benjie (Member) commented Oct 23, 2023

Hey @gregplaysguitar do you have any interest in picking this back up? We've moved a lot of the logic from the migrations into the JS (e.g. getJob is now in JS: https://github.com/graphile/worker/blob/main/src/sql/getJob.ts) so iterating should be significantly less effort than before.
