Hashing a Task to include its dependencies #23
Comments
Thanks for a great question.
So far, we've erred on the side of caution about trying to infer all the code reachable from a task. Currently, redun only hashes the source code of the task itself. You can also use manual versioning.
We have thought about how to let users opt other plain Python functions into the task hashing, similar to the idea you give. One design is to do something like this:

    import numpy as np
    import pandas as pd
    from redun import task

    def log_transform(s: pd.Series) -> pd.Series:
        return np.log(s + 1)

    # Proposed option: also hash the listed helper functions.
    @task(hash_includes=[log_transform])
    def transform_features(df: pd.DataFrame) -> pd.DataFrame:
        for c in df.columns:
            df[c] = log_transform(df[c])
        return df
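To make the opt-in idea concrete, here is a minimal sketch of how a task hash could fold in the source of explicitly listed helpers. It is not redun's implementation: `hash_with_includes` and `_source` are hypothetical names, and the bytecode fallback is only a rough stand-in for when source text is unavailable.

```python
import hashlib
import inspect
from typing import Callable, Iterable


def _source(func: Callable) -> str:
    """Return the function's source text, or a bytecode summary if unavailable."""
    try:
        return inspect.getsource(func)
    except (OSError, TypeError):
        # Assumption: a crude fallback for interactive sessions without source files.
        code = func.__code__
        return code.co_code.hex() + repr(code.co_consts)


def hash_with_includes(func: Callable, includes: Iterable[Callable] = ()) -> str:
    """Hash a function together with the helpers it explicitly opts in."""
    digest = hashlib.sha256()
    for f in (func, *includes):
        digest.update(f.__qualname__.encode())
        digest.update(_source(f).encode())
    return digest.hexdigest()


def log_transform(x):
    return x + 1


def log_transform_v2(x):
    return abs(x) + 1


def transform_features(values):
    return [log_transform(v) for v in values]


task_only = hash_with_includes(transform_features)
with_v1 = hash_with_includes(transform_features, [log_transform])
with_v2 = hash_with_includes(transform_features, [log_transform_v2])
```

With this scheme, editing a listed helper changes the combined hash even though the task body is untouched, which is the behavior the question asks for. Helpers that are not listed remain invisible to the hash.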
I like your idea about using static analysis to find reachable functions, but constraining it to user code via [...]
I'll update this ticket when we have [...]
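As a rough illustration of what finding reachable functions could look like, the sketch below inspects the names referenced by a function's compiled code object and follows those that resolve to plain Python functions in the same module. This is an assumption-laden approximation, not how redun works: `referenced_names` and `reachable_functions` are hypothetical helpers, and the approach misses method calls, functions passed as arguments, and anything resolved dynamically.

```python
import types
from typing import Callable, Dict, Set


def referenced_names(code: types.CodeType) -> Set[str]:
    """Collect names referenced by a code object, including nested code
    objects such as lambdas and comprehensions."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= referenced_names(const)
    return names


def reachable_functions(func: Callable) -> Dict[str, Callable]:
    """Find plain functions in func's own module that are reachable
    through global-name references, following them transitively."""
    found: Dict[str, Callable] = {}
    stack = [func]
    while stack:
        current = stack.pop()
        for name in referenced_names(current.__code__):
            target = current.__globals__.get(name)
            if (
                isinstance(target, types.FunctionType)
                and target is not func
                and target.__module__ == func.__module__
                and target.__qualname__ not in found
            ):
                found[target.__qualname__] = target
                stack.append(target)
    return found


def clip(x):
    return max(x, 0)


def log_transform(x):
    return clip(x) + 1


def transform_features(values):
    return [log_transform(v) for v in values]


reachable = sorted(reachable_functions(transform_features))
```

Here `reachable` contains `clip` and `log_transform`: the first is found only transitively, and the builtin `max` is skipped because it is not a module-level Python function.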
Are there any updates here? This would be very useful. We're running into a similar situation where we update sub-tasks and want the scheduler to descend the DAG, get to the changed nodes and pick up execution from there.
Let's say I have the following code:
Where log_transform is a function which is defined in src/utils.py
Now suppose I realize that this function is not very smart, and I want to update it to
Which potentially changes multiple tasks which depend on this code.
My question is the following: would it be a good idea to hash all the internal source code invoked by a task? Would it be possible? If it's not a good idea / it's impossible, what best practices do people follow to avoid such situations?
I can imagine it must be a common problem, unless all the code one writes is wrapped with task decorators?
Edit: of course one would probably want to check only the user-defined functions e.g.
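One way to make "user-defined" concrete is to treat a function as user code when its source file sits under a chosen project directory. The sketch below is only an assumption about how such a filter might look; `is_user_defined` is a hypothetical helper, and the standard library's `json` package stands in for a project directory purely for demonstration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Union


def is_user_defined(func: Callable, project_root: Union[str, Path]) -> bool:
    """Return True if func's source file lives under project_root."""
    code = getattr(func, "__code__", None)
    if code is None:
        # Builtins and C extensions have no Python code object.
        return False
    try:
        Path(code.co_filename).resolve().relative_to(Path(project_root).resolve())
    except ValueError:
        return False
    return True


# Pretend the json package directory is the "project" for demonstration.
json_dir = Path(json.__file__).parent
inside = is_user_defined(json.dumps, json_dir)
builtin = is_user_defined(len, json_dir)

# A freshly created directory cannot contain the json source file.
with tempfile.TemporaryDirectory() as tmp:
    outside = is_user_defined(json.dumps, tmp)
```

A filter like this could decide which reachable functions get included in a task's hash, so that third-party libraries are left out.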