Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature/flytekit] Smarter copy handling with image spec #6094

Open
2 tasks done
wild-endeavor opened this issue Dec 6, 2024 · 2 comments
Open
2 tasks done

[feature/flytekit] Smarter copy handling with image spec #6094

wild-endeavor opened this issue Dec 6, 2024 · 2 comments
Labels
bug Something isn't working flytekit FlyteKit Python related issue

Comments

@wild-endeavor
Copy link
Contributor

wild-endeavor commented Dec 6, 2024

Background

The ImageSpec class today contains a few options in its constructor - source_root, copy, and source_copy_mode. The implication of these options of course is that when flytekit builds the image spec, files are copied into the image. That in turn implies that the user probably does not want the "fast register" source code package. Even when --copy auto is specified on the command line, the user probably expects that fast-register does not happen.

This is not true today. Currently, if you register a task with an ImageSpec that has a source root, pyflyte-fast-execute is still used.

Proposal

The proposal is to change the behavior of image spec such that ImageSpec with source_root tasks are left alone. If we're already copying files into the image at build time, there shouldn't be any reason to further download a code package at run time.

UX Implications

If we do the above, the implication will be that...

  • When you have two ImageSpecs, one that has source_root (and maybe some other options) and one that does not, tasks that use the one with source_root should not be fast registered (i.e. should not have a pyflyte-fast-execute command), but tasks that use the latter should be fast registered.
  • Corollary to the above: if you run --copy all but all the tasks use ImageSpecs with source_root, then the process of fast register shouldn't actually happen (i.e. flytekit should not need to request any upload anything, should not scan any files, etc.)
  • Corollary to the above: If an image spec.yaml with a source_root is used, then uploading should not happen even with --copy all if all tasks use that image spec. But if there's a single task somewhere that uses a separately defined ImageSpec without source_root, then the main fast register process should happen.

Additional

  • If you have a raw container task, that uses an image spec, that doesn't have source_root defined, and your pyflyte register command used --copy auto or --copy all (implying that you're using the fast register construct)... should we log a warning? (reason being: you're saying you want fast register with --copy auto/all, but in the raw container case, there's no way to download files [because there might not be python]).

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@wild-endeavor wild-endeavor added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Dec 6, 2024
@thomasjpfan
Copy link
Member

thomasjpfan commented Dec 7, 2024

The proposal is to change the behavior of image spec such that ImageSpec with source_root tasks are left alone.

This works if the source_root contains the workflow code. What do we do if the source_root does not actually contain the workflow code? Concretely, lets say you have this structure:

my_wf.py
lib/utils.py

with this image spec:

# my_wf.py

image_spec = ImageSpec(source_root="lib")

And run pyflyte register --copy all my_wf.py. The container would only have utils.py and not my_wf.py.

@wild-endeavor
Copy link
Contributor Author

@thomasjpfan exactly correct yes, it would break. Doing this basically states that usage of source root moves the burden of providing the correct files in the image mostly on the user (the imagespec build process itself would still take care of building new versions when copied files/source root files change). This is why this would be a breaking change. It feels a bit cleaner, but maybe that's just me.

@eapolinario eapolinario added flytekit FlyteKit Python related issue and removed untriaged This issues has not yet been looked at by the Maintainers labels Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working flytekit FlyteKit Python related issue
Projects
Status: Backlog
Development

No branches or pull requests

3 participants