Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix assets with the same backfill policy not being materialized together #21880

Merged
merged 2 commits into from
May 21, 2024

Conversation

jamiedemaria
Copy link
Contributor

@jamiedemaria jamiedemaria commented May 15, 2024

Summary & Motivation

A user reported an issue where a backfill requested a parent and child asset partitions in separate runs, causing the child asset partition to fail since the parent had not yet been materialized. After some investigation, it looks like if multiple assets have the same partitions definition, different backfill policies, and were going to be requested in one tick of the backfill daemon, we requested each asset individually instead of grouping by backfill policy. This causes parent-child asset pairs that were going to be materialized together by that tick to be requested in different runs.

This PR changes the behavior so that we group by backfill policy. I think this change should be ok because we already know the assets have the same partitions def and requested partitions, and we know that any parent/child assets must have the same backfill definition from the can_run_with_parent check earlier in the daemon loop

How I Tested These Changes

new unit test. It failed before making the code changed (there were 3 run requests instead of 2) and passes now

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @jamiedemaria and the rest of your teammates on Graphite Graphite

@jamiedemaria jamiedemaria marked this pull request as ready for review May 17, 2024 14:26
@jamiedemaria jamiedemaria requested a review from sryza May 20, 2024 16:59
@jamiedemaria
Copy link
Contributor Author

@sryza since Claire is out until tomorrow, could you take a look at this? I'd like to get the fix into this week's release if possible

asset_key
)

for backfill_policy, aks in asset_keys_by_backfill_policy.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to do all the grouping in one place? I.e. assets_to_reconcile_by_partitions_def_partition_keys_backfill_policy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good point - done!

@jamiedemaria jamiedemaria force-pushed the jamie/backfil-with-multiple-backfill-policies branch from cae3fe9 to 9db6bc7 Compare May 21, 2024 14:20
@jamiedemaria jamiedemaria requested a review from sryza May 21, 2024 14:50
Tuple[Optional[PartitionsDefinition], Optional[FrozenSet[str]]], Set[AssetKey]
assets_to_reconcile_by_partitions_def_partition_keys_backfill_policy: Mapping[
Tuple[Optional[PartitionsDefinition], Optional[FrozenSet[str]], Optional[BackfillPolicy]],
Set[AssetKey],
] = defaultdict(set)

# here we are grouping assets by their partitions def and partition keys selected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment or remove it?

@jamiedemaria jamiedemaria force-pushed the jamie/backfil-with-multiple-backfill-policies branch from 9db6bc7 to d6e471f Compare May 21, 2024 15:41
@jamiedemaria jamiedemaria merged commit 1a8f14c into master May 21, 2024
1 check passed
@jamiedemaria jamiedemaria deleted the jamie/backfil-with-multiple-backfill-policies branch May 21, 2024 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants