-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Housekeeping] Flyteadmin digest comparison should rely on database semantics #4780
Comments
🤖 It's great to see your thoughtful analysis of the issue and proposed solution! Your suggestion to wrap the comparison in a transaction for better reliability makes a lot of sense. If you're able to implement this improvement and create a pull request, it would be a valuable contribution to the repository. Thank you for considering this enhancement! This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot. |
#take |
Hi @katrogan , I want to clarify the issue before going forward. While creating task, flyte admin will check whether the id and taskDigest of new task is identical to existing task in db. Is this issue aiming to wrap this section of code in transaction, in order to prevent others precess possibly update the data of existing task while doing the taskDigest check? |
yes exactly, we could have two users attempt to register the same task version in the same time frame such that an existing task digest is never found but only one succeeds in writing to the db |
Hi @katrogan , please let me clarify this issue further. In this section of code, Flyte will check if identical task existed in the DB. If not, Flyte will create new task in DB, otherwise return error. |
the way we get around duplicates in the two separate transactions is because we want to guard against two tasks with the same identifier (project, domain, name, version) being written to the db with different structures (which would result in different digests) the database ensures we can never write duplicate entries with identical primary keys (in this case, TaskKey) in your example above, A and B cannot execute in parallel due to transaction semantics, so one will fail when there is a conflict with the primary key. if we move the digest comparison to within a transaction we can ensure we correctly returns already exists (with or without a different task structure) so the pseudocode could look like
the current code does return the unmarshalled closure of the existing task when we return |
Hi @katrogan , thanks for the detailed explanation. I think I understand the purpose of wrapping digest check within a transaction right now. Just want to clarify with you once again. If the transaction checks the primary key on create task first, and check digest if task of the same primary key found in DB, we can make sure that the digest will be checked even though two identical tasks registered in the same timeframe. Moreover, I am not sure about what you mentioned about |
great, that diagram looks like we're on the same page! about |
Describe the issue
In order to compare an already registered task (and workflow, and launch plan) when an identical version is registered, we get the existing task from the db and manually compare its digest. This is prone to race conditions. It would be better to wrap this in a transaction where we attempt to upsert the task only on a matching digest
What if we do not do this?
Slightly more error prone method for determining already exists task structure
Related component(s)
flyteadmin
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?
The text was updated successfully, but these errors were encountered: