
provide task_attempt table #50

Merged: 11 commits into main, Nov 9, 2024
Conversation

maxcountryman (Owner):

This reworks how task executions are tracked by providing a new table, task_attempt which is a log of each dequeue of a given task from the queue. Put another way, each time a task is executed, a row is inserted into the attempts table and the result of that execution is stored on that row.

As such, the task table is now a container for the fixed metadata of a specific enqueue of a task. This means that metadata associated with an attempt now lives on the task_attempt table. For example, error messages are stored on the attempts table and likewise creation and completion timestamps also live on this table.

Importantly some fields of the task row have been removed as they are now derived from the attempts log. For instance, task_attempt provides an updated_at column which allows for e.g. determining succeeded_at or failed_at timestamps of the task enqueue. Similarly retry_count is a function of the number of attempt rows associated to a given task.

Furthermore, the retry policy is now embedded as a JSONB column on the task row.

Related discussion: #17
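Pieced together from the column fragments quoted in the review threads, the new table might look roughly like this. This is a sketch only; the primary key and exact constraints are assumptions, not the actual migration:

```sql
-- Sketch: approximate shape of the new attempts table, reconstructed from
-- the fragments quoted in the review. Constraints are assumptions.
create table underway.task_attempt (
    task_id         uuid not null,
    task_queue_name text not null,
    attempt_number  integer not null,

    -- Task state.
    state underway.task_state not null default 'pending',

    -- Error metadata.
    error_message text,

    created_at timestamp with time zone not null default now(),
    updated_at timestamp with time zone not null default now(),

    primary key (task_id, task_queue_name, attempt_number)
);
```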

state underway.task_state not null default 'pending',

-- Error metadata.
error_message text,
maxcountryman (Owner, Author):

I considered adding an error_details JSONB column, but it's not clear what would live there. For example, std::error::Error is the limit of the API surface we have; at most it seems error sources could be of interest, but it's unclear whether that's actually valuable, and it could be added later.

kirillsalykin:

In Clojure such things were simpler to handle because you can just throw an exception containing any object, which allows attaching metadata to the error itself (mostly for domain-specific errors, like "Order not in correct state").
Maybe there indeed isn't much data to save.

maxcountryman (Owner, Author):

I think e.g. anyhow, etc provide ways of enriching errors (e.g. they provide context methods) and that's potentially additional metadata that could be stored.

However, I don't like the idea of using anyhow here: we have a concrete error type (task::Error) and so we can directly augment it if we want. That said, I don't know what additional metadata we'd want. Most of it will be provided by the Display impl I think and so to_string is already giving us that.

I'm certainly open to increasing the fidelity here, so if you have ideas about more context we could embed I'll be happy to explore that.

kirillsalykin:

It feels like it would be nice to have more info about the error, but I can't figure out what to put there.


create table underway.task_attempt (
task_id uuid not null,
task_queue_name text not null,

kirillsalykin:

I thought a task always runs in the same queue, so this seems like a task-level field. Am I missing something?

maxcountryman (Owner, Author):

You're right. It's included because task has a composite key, and the queue name is therefore required to create the foreign key.

kirillsalykin:

Ah, I was sure task_id was unique...
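Concretely, the constraint being described would look something like this. A sketch only: the referenced column names on the task table are assumptions:

```sql
-- Sketch: the attempt must reference the task table's composite key,
-- which is why task_queue_name is repeated on task_attempt.
-- Referenced column names are assumptions.
alter table underway.task_attempt
    add constraint task_attempt_task_fk
    foreign key (task_id, task_queue_name)
    references underway.task (id, task_queue_name);
```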

attempt_number integer not null,

-- Task state.
state underway.task_state not null default 'pending',

kirillsalykin:

Small one: I don't think it can be pending. I would expect the attempt to be inserted once it happens, so it's either a failure or a success.

maxcountryman (Owner, Author):

I agree "pending" is probably not right. The attempt is created upon dequeue (this is when we actually start the attempt). So this should likely be "in_progress" as a reasonable default.

kirillsalykin:

In my implementation I didn't bother with in_progress; I inserted only finished attempts (failed or succeeded).

maxcountryman (Owner, Author):

I thought about that too but I think I prefer it this way for instrumentation purposes: by creating the row when we dequeue we'll be able to derive more accurate metrics about task attempts (i.e. it'll incorporate the time it took to execute the task).

It's also a little easier to reason about derived properties knowing that the attempt row will always exist once it's been dequeued.

kirillsalykin:

I'm under the impression that, because of the transactional nature, no one will ever see an attempt in the in_progress state.
Please correct me if I'm wrong.

maxcountryman (Owner, Author):

The queue itself doesn't require a transaction (e.g. dequeue takes a PgConnection). The provided worker implementation does use a transaction, so in that case it should be true.

Holding a transaction open for the duration of task execution is probably okay most of the time but I want to retain the possibility for extending the crate with workers that might not work that way.

Even if we don't end up providing that right away, creating attempt rows in tandem with a dequeue is beneficial in at least a couple of ways:

  1. Encapsulating the state transition in the queue itself is arguably a more sound design (a caller doesn't have to remember to create an attempt row later, or even know about attempt rows at all, which keeps the overall design more coherent in my view) and,
  2. Because attempt rows are created during the dequeue, their timing metrics will be aligned with the moment a task was dequeued; this could be done manually, but I'm not sure what we'd gain by doing that.

If later we're positive that we'll never support out-of-transaction workers I think we could still make a change that constrains the states and removes "in progress" so this isn't necessarily set in stone.
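One way creating the row at dequeue pays off for instrumentation: attempts that have been in progress for too long become directly queryable. A sketch, assuming the column names discussed in this thread:

```sql
-- Sketch: surface attempts that have been running (or stuck) beyond a
-- threshold. Only possible because the row is created at dequeue time.
select task_id, attempt_number, created_at
from underway.task_attempt
where state = 'in_progress'
  and created_at < now() - interval '5 minutes';
```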

-- Error metadata.
error_message text,

created_at timestamp with time zone not null default now(),

kirillsalykin:

I think having started_at/completed_at allows catching when a partial attempt happens.

maxcountryman (Owner, Author):

Correct me if I'm wrong but I think those are always derivable:

  • started_at is always created_at (since an attempt is created upon dequeue)
  • completed_at is updated_at when state is "succeeded" or "failed"

We could rename created_at if that's more clear and technically we could set a separate completed_at column to be sure updated_at isn't overloaded unnecessarily.

kirillsalykin:

you are correct, my bad

maxcountryman (Owner, Author):

Do you think it's worth having a separate completed_at column? I can't immediately think of use cases for changing updated_at after an attempt is completed, but it would ensure that the semantic isn't conflated.

kirillsalykin:

I would prefer started_at/created_at

maxcountryman (Owner, Author):

Great, I'll update that too: I think this is started_at and completed_at, correct?

maxcountryman (Owner, Author):

I've added this now.
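With started_at and completed_at in place, per-attempt execution time becomes directly derivable. A sketch, assuming the column and state names discussed above:

```sql
-- Sketch: per-attempt execution time, derivable once started_at and
-- completed_at exist on the attempt row.
select task_id, attempt_number,
       completed_at - started_at as execution_time
from underway.task_attempt
where state in ('succeeded', 'failed');
```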

drop column if exists last_failed_at;

alter table underway.task
add column if not exists retry_policy jsonb not null default '{"max_attempts": 5, "initial_interval_ms": 1000, "max_interval_ms": 60000, "backoff_coefficient": 2.0}';

kirillsalykin:

I am not a big fan of JSON; knowing the shape of the data helps to reason about things better.
I understand this is more work and maybe isn't worth it, but what do you think about having a custom type here?

maxcountryman (Owner, Author):

Can you go into more detail? What does a custom type entail? Is that a separate table or something else SQLx supports?

kirillsalykin:

I think it is a PostgreSQL-only feature, something like this:

CREATE TYPE book AS (name varchar(255), author varchar(255), rating varchar(5));

CREATE TABLE users
  (
     id    SERIAL,
     name  VARCHAR(255),
     email VARCHAR(100),
     books book
  );

maxcountryman (Owner, Author):

Aha, I didn't realize this was possible. Yes, I'll look at implementing it that way. Thank you.

kirillsalykin:

But again, I am not sure it's worth it, to be honest.
I now think it probably is not...

maxcountryman (Owner, Author):

I think it's not too difficult, something like this:

-- Define retry policies as their own type
create type underway.task_retry_policy as (
    max_attempts         int,
    initial_interval_ms  int,
    max_interval_ms      int,
    backoff_coefficient  float
);

alter table underway.task
    add column if not exists retry_policy underway.task_retry_policy not null 
    default row(5, 1000, 60000, 2.0)::underway.task_retry_policy;

And we can derive the type with SQLx:

#[derive(Debug, Clone, Copy, PartialEq, sqlx::Type)]
#[sqlx(type_name = "task_retry_policy")]
pub struct RetryPolicy {
    pub(crate) max_attempts: i32,
    pub(crate) initial_interval_ms: i32,
    pub(crate) max_interval_ms: i32,
    // SQL `float` is double precision, which SQLx maps to f64 (f32 maps to `real`).
    pub(crate) backoff_coefficient: f64,
}

maxcountryman (Owner, Author):

This is added. It was pretty straightforward to do. I'm not sure if there are downsides I might be overlooking here but it seems nice to create the custom type as you suggested.
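As a side note on semantics: the policy fields suggest a capped exponential backoff. Here is a minimal sketch of how a worker might compute the delay before a given attempt under that assumption; the delay_ms method is hypothetical, not the crate's actual API, and the real implementation may differ:

```rust
// Sketch (assumption): capped exponential backoff derived from the policy
// fields. Not the crate's actual implementation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RetryPolicy {
    pub max_attempts: i32,
    pub initial_interval_ms: i32,
    pub max_interval_ms: i32,
    pub backoff_coefficient: f64,
}

impl RetryPolicy {
    /// Delay before the given attempt (1-based), in milliseconds.
    pub fn delay_ms(&self, attempt: i32) -> i32 {
        let exponent = (attempt - 1).max(0);
        let delay = self.initial_interval_ms as f64
            * self.backoff_coefficient.powi(exponent);
        // Cap at max_interval_ms so delays don't grow without bound.
        (delay as i64).min(self.max_interval_ms as i64) as i32
    }
}

fn main() {
    // Matches the migration's default policy.
    let policy = RetryPolicy {
        max_attempts: 5,
        initial_interval_ms: 1000,
        max_interval_ms: 60_000,
        backoff_coefficient: 2.0,
    };
    assert_eq!(policy.delay_ms(1), 1000);
    assert_eq!(policy.delay_ms(2), 2000);
    // Uncapped this would be 64_000 ms; the policy caps it at 60_000.
    assert_eq!(policy.delay_ms(7), 60_000);
    println!("backoff sketch ok");
}
```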

maxcountryman (Owner, Author):

I should point out that I have not added tests here that specifically target task attempt rows. If we agree this is the right direction, I will plan to add those before we release this.

maxcountryman force-pushed the task-attempts-table branch 2 times, most recently from bdb9c73 to d367106 on November 9, 2024 04:38
Here we rename `DequeuedTask` to `InProgressTask` and rework it to fully
encapsulate state transitions for "in-progress" tasks.

This helps ensure that state transitions are both more ergonomic and
correct.
maxcountryman marked this pull request as ready for review on November 9, 2024 13:15
maxcountryman (Owner, Author):

This is ready for review.

Testing could still be improved but I've added cases for the state transitions that check attempt rows are being created as we'd expect.

kirillsalykin:

it looks good to me

kirillsalykin:

Maybe I'll have more comments once I start using it

maxcountryman (Owner, Author):

@kirillsalykin thanks for all your feedback thus far: very much appreciate it. 🙏

maxcountryman merged commit b2a1bd4 into main on Nov 9, 2024
4 checks passed
maxcountryman deleted the task-attempts-table branch on November 9, 2024 17:18