provide task_attempt table #50
Conversation
state underway.task_state not null default 'pending',

-- Error metadata.
error_message text,
I considered adding an `error_details` JSONB column, but it's not clear what would live there. For example, `std::error::Error` is the limit of the API surface we have, and at most it seems `source`s could be of interest, but it's unclear whether that's actually valuable; it could be added later.
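If an `error_details` column were ever added, one hedged possibility is serializing the `std::error::Error::source` chain. The sketch below is illustrative only; `TaskError` is a hypothetical stand-in for the crate's `task::Error`, not its actual definition:

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for a concrete task error that wraps a cause.
#[derive(Debug)]
struct TaskError {
    message: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for TaskError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl Error for TaskError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

// Walk the `source` chain, collecting each cause's Display output;
// the resulting Vec<String> is the sort of thing a JSONB column could hold.
fn error_chain(err: &dyn Error) -> Vec<String> {
    let mut chain = Vec::new();
    let mut current: Option<&dyn Error> = Some(err);
    while let Some(e) = current {
        chain.push(e.to_string());
        current = e.source();
    }
    chain
}

fn main() {
    let io_err = std::io::Error::new(std::io::ErrorKind::Other, "connection reset");
    let err = TaskError {
        message: "task failed".into(),
        source: Some(Box::new(io_err)),
    };
    assert_eq!(
        error_chain(&err),
        vec!["task failed".to_string(), "connection reset".to_string()]
    );
}
```
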
In Clojure such things were simpler to handle because one can just throw an exception that contains any object, which allows attaching metadata to the error itself (mostly for domain-specific errors, like "Order not in current state").
Maybe indeed there is not much data to save.
I think e.g. `anyhow`, etc. provide ways of enriching errors (e.g. they provide `context` methods), and that's potentially additional metadata that could be stored.
However, I don't like the idea of using `anyhow` here: we have a concrete error type (`task::Error`) and so we can directly augment it if we want. That said, I don't know what additional metadata we'd want. Most of it will be provided by the `Display` impl, I think, and so `to_string` is already giving us that.
I'm certainly open to increasing the fidelity here, so if you have ideas about more context we could embed I'll be happy to explore that.
It feels like it would be nice to have more info about the error, but I can't figure out what to put there.
create table underway.task_attempt (
    task_id uuid not null,
    task_queue_name text not null,
I thought that a task always runs in the same queue? So I think this is a `task`-related field. Am I missing something?
You're right. The reason it's included is that `task` has a composite key, and the queue name is therefore required to create the fk.
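For illustration, a sketch of what that fk looks like (the column names on `underway.task` are assumed here, not taken from the actual migration):

```sql
-- Sketch only: the fk must include the queue name because
-- underway.task's primary key is composite (column names assumed).
create table underway.task_attempt (
    task_id uuid not null,
    task_queue_name text not null,
    foreign key (task_id, task_queue_name)
        references underway.task (id, task_queue_name)
);
```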
Ah, oh. I was sure `task_id` was unique...
migrations/20241105164503_2.sql
Outdated
attempt_number integer not null,

-- Task state.
state underway.task_state not null default 'pending',
Small one: I don't think it can be `pending`. I would expect that an attempt is inserted once it happens, so it is either a failure or a success.
I agree "pending" is probably not right. The attempt is created upon dequeue (this is when we actually start the attempt). So this should likely be "in_progress" as a reasonable default.
In my implementation I didn't bother with `in_progress`; I inserted only finished attempts (failed or succeeded).
I thought about that too but I think I prefer it this way for instrumentation purposes: by creating the row when we dequeue we'll be able to derive more accurate metrics about task attempts (i.e. it'll incorporate the time it took to execute the task).
It's also a little easier to reason about derived properties knowing that the attempt row will always exist once it's been dequeued.
I am under the impression that, because of the tx nature, no one will see this attempt in the `in_progress` state.
Please correct me if I am wrong.
The queue itself doesn't require a transaction (e.g. `dequeue` takes a `PgConnection`). The provided worker implementation does use a transaction, so in that case it should be true.
Holding a transaction open for the duration of task execution is probably okay most of the time but I want to retain the possibility for extending the crate with workers that might not work that way.
Even if we don't end up providing that right away, creating attempt rows in tandem with a dequeue is beneficial in at least a couple of ways:
- Encapsulating the state transition in the queue itself is an arguably more sound design (it means a caller doesn't have to remember to e.g. create an attempt row later, or even know about attempt rows at all, and keeps the overall design more coherent in my view) and,
- Because attempt rows are created during the dequeue, their timing metrics will be better aligned with the moment a task was dequeued--this could be done manually, but I'm not really sure what we gain by doing that.
If later we're positive that we'll never support out-of-transaction workers I think we could still make a change that constrains the states and removes "in progress" so this isn't necessarily set in stone.
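A sketch of the lifecycle described above (column names assumed for illustration): the attempt row is inserted at dequeue as `in_progress`, then transitioned to a terminal state by a separate statement, which works whether or not the worker holds a transaction open for the duration:

```sql
-- On dequeue: record that an attempt has started (sketch; names assumed).
insert into underway.task_attempt (task_id, task_queue_name, attempt_number, state)
values ($1, $2, $3, 'in_progress');

-- On completion: transition the same row to a terminal state.
update underway.task_attempt
set state = 'succeeded', updated_at = now()
where task_id = $1 and task_queue_name = $2 and attempt_number = $3;
```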
migrations/20241105164503_2.sql
Outdated
-- Error metadata.
error_message text,

created_at timestamp with time zone not null default now(),
I think having `started_at`/`completed_at` allows catching when a partial attempt happens.
Correct me if I'm wrong but I think those are always derivable:

- `started_at` is always `created_at` (since an attempt is created upon dequeue)
- `completed_at` is `updated_at` when state is "succeeded" or "failed"
We could rename `created_at` if that's more clear, and technically we could set a separate `completed_at` column to be sure `updated_at` isn't overloaded unnecessarily.
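For illustration, a hedged sketch of deriving those timings from `created_at`/`updated_at`, assuming attempts are inserted at dequeue and updated once at completion:

```sql
-- Sketch: per-attempt timings derived from existing columns.
select task_id,
       attempt_number,
       created_at              as started_at,
       updated_at              as completed_at,
       updated_at - created_at as duration
from underway.task_attempt
where state in ('succeeded', 'failed');
```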
You are correct, my bad.
Do you think it's worth having a separate `completed_at` column? I can't immediately think of use cases for changing `updated_at` after an attempt is completed, but it would ensure that the semantic isn't conflated.
I would prefer `started_at`/`created_at`
Great, I'll update that too: I think this is `started_at` and `completed_at`, correct?
I've added this now.
migrations/20241105164503_2.sql
Outdated
drop column if exists last_failed_at;

alter table underway.task
    add column if not exists retry_policy jsonb not null default '{"max_attempts": 5, "initial_interval_ms": 1000, "max_interval_ms": 60000, "backoff_coefficient": 2.0}';
I am not a big fan of JSON; knowing the shape of the data seems to help reason about things better.
I do understand that this is more work and maybe isn't worth it, but wdyt about having a custom type here?
Can you go into more detail? What does a custom type entail? Is that a separate table or something else SQLx supports?
I think it is only a psql feature, something like this:

```sql
CREATE TYPE book AS (name varchar(255), author varchar(255), rating varchar(5));

CREATE TABLE users
(
    id    SERIAL,
    name  VARCHAR(255),
    email VARCHAR(100),
    books book
);
```
Aha, I didn't realize this was possible. Yes, I'll look at implementing it that way. Thank you.
But again, I am not sure it's worth it, tbh.
I now think it probably is not...
I think it's not too difficult, something like this:

```sql
-- Define retry policies as their own type
create type underway.task_retry_policy as (
    max_attempts int,
    initial_interval_ms int,
    max_interval_ms int,
    backoff_coefficient float
);

alter table underway.task
    add column if not exists retry_policy underway.task_retry_policy not null
    default row(5, 1000, 60000, 2.0)::underway.task_retry_policy;
```
And we can derive the type with SQLx:

```rust
#[derive(Debug, Clone, Copy, PartialEq, sqlx::Type)]
#[sqlx(type_name = "task_retry_policy")]
pub struct RetryPolicy {
    pub(crate) max_attempts: i32,
    pub(crate) initial_interval_ms: i32,
    pub(crate) max_interval_ms: i32,
    pub(crate) backoff_coefficient: f32,
}
```
This is added. It was pretty straightforward to do. I'm not sure if there are downsides I might be overlooking here but it seems nice to create the custom type as you suggested.
I should point out that I have not added tests here that specifically target task attempt rows. If we agree this is the right direction, I will plan to add those before we release this.
This reworks how task executions are tracked by providing a new table, `task_attempt`, which is a log of each dequeue of a given task from the queue. Put another way, each time a task is executed, a row is inserted into the attempts table and the result of that execution is stored on that row.

As such, the `task` table is now a container for the fixed metadata of a specific enqueue of a task. This means that metadata associated with an attempt now lives on the `task_attempt` table. For example, error messages are stored on the attempts table, and likewise creation and completion timestamps also live on this table.

Importantly, some fields of the `task` row have been removed, as they are now derived from the attempts log. For instance, `task_attempt` provides an `updated_at` column which allows for e.g. determining `succeeded_at` or `failed_at` timestamps of the task enqueue. Similarly, `retry_count` is a function of the number of attempt rows associated with a given task.

Furthermore, the retry policy is now embedded as a JSONB column on the task row.

Related discussion: #17
Here we rename `DequeuedTask` to `InProgressTask` and rework it to fully encapsulate state transitions for "in-progress" tasks. This helps ensure that state transitions are both more ergonomic and correct.
This is ready for review. Testing could still be improved, but I've added cases for the state transitions that check attempt rows are being created as we'd expect.
It looks good to me.
Maybe I'll have more comments once I start using it.
@kirillsalykin thanks for all your feedback thus far: very much appreciate it. 🙏