[BUG] Open FlyteFile from remote path #2991
Open
JiangJiaWei1103 wants to merge 3 commits into flyteorg:master from JiangJiaWei1103:open-ff-from-remote-path
+179 −6
@@ -300,8 +300,45 @@ def __init__(
         self._remote_source: typing.Optional[str] = None

     def __fspath__(self):
-        # This is where a delayed downloading of the file will happen
+        """
+        Define the file path protocol for opening FlyteFile with the context manager,
+        covering the following two common use cases:
+
+        1. Directly open a FlyteFile with a local path:
+
+            ff = FlyteFile(path=local_path)
+            with open(ff, "r") as f:
+                # Read your local file here
+                # ...
+
+        There's no need to handle downloading of the file because it's on the local file system.
+        In this case, a dummy download is performed.
+
+        2. Directly open a FlyteFile with a remote path:
+
+            ff = FlyteFile(path=remote_path)
+            with open(ff, "r") as f:
+                # Read your remote file here
+                # ...
+
+        We now support directly opening a FlyteFile with a file from the remote data storage.
+        In this case, the remote file is downloaded lazily, on first access.
+        For details, please refer to this issue: https://github.com/flyteorg/flyte/issues/6090.
+        """
+        ctx = FlyteContextManager.current_context()
+
+        if ctx.file_access.is_remote(self.path) and self._remote_source is None:
+            # Set up remote file source and local file destination
+            self._remote_source = self.path
+            local_path = ctx.file_access.get_random_local_path(self._remote_source)
+            self._downloader = lambda: FlyteFilePathTransformer.downloader(
+                remote_path=self._remote_source, local_path=local_path
+            )
+            self.path = local_path
+
         if not self._downloaded:
+            # Download data from remote to local, or
+            # run a dummy download for a local input path
             self._downloader()
             self._downloaded = True
         return self.path
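The lazy-download behavior described in the __fspath__ docstring above can be sketched without flytekit. This is a simplified, hypothetical stand-in, not the flytekit implementation: LazyFile, is_remote (a naive "://" check) and fake_fetch (which reads from an in-memory dict instead of object storage) are illustrative names only. It mirrors the two steps in the patch: on first use, a remote path is swapped for a local destination plus a deferred downloader, and the (possibly no-op) downloader then runs at most once.

```python
import os
import tempfile
from typing import Callable, Dict, List, Optional


def is_remote(path: str) -> bool:
    # Assumption: anything with a URI scheme (s3://, gs://, ...) counts as "remote".
    return "://" in path


class LazyFile:
    """Hypothetical, simplified stand-in for FlyteFile (not the flytekit class)."""

    def __init__(self, path: str, fetch: Callable[[str, str], None]):
        self.path = path
        self._fetch = fetch  # hypothetical: copies remote_path -> local_path
        self._remote_source: Optional[str] = None
        self._downloader: Callable[[], None] = lambda: None  # dummy download for local paths
        self._downloaded = False

    def __fspath__(self) -> str:
        # First access with a remote path: remember the source, pick a local
        # destination, and defer the actual transfer to a downloader closure.
        if is_remote(self.path) and self._remote_source is None:
            remote = self._remote_source = self.path
            local_path = os.path.join(tempfile.mkdtemp(), os.path.basename(remote))
            self._downloader = lambda: self._fetch(remote, local_path)
            self.path = local_path
        # Run the (possibly no-op) downloader at most once.
        if not self._downloaded:
            self._downloader()
            self._downloaded = True
        return self.path


# In-memory "bucket" standing in for remote object storage.
FAKE_BUCKET: Dict[str, bytes] = {"s3://bucket/hello.txt": b"hello from remote"}
calls: List[str] = []


def fake_fetch(remote_path: str, local_path: str) -> None:
    calls.append(remote_path)
    with open(local_path, "wb") as f:
        f.write(FAKE_BUCKET[remote_path])


lf = LazyFile("s3://bucket/hello.txt", fetch=fake_fetch)
with open(lf, "r") as f:  # open() calls os.fspath(lf), i.e. lf.__fspath__()
    print(f.read())  # prints: hello from remote
with open(lf, "r") as f:  # second open reuses the local copy
    f.read()
print(calls)  # prints: ['s3://bucket/hello.txt']
```

Because open() calls os.fspath() on its argument, any object whose class defines __fspath__ can be passed to open() directly; that is what lets with open(ff, "r") work in the patch.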
@@ -693,16 +730,25 @@ async def async_to_python_value(

         # For the remote case, return an FlyteFile object that can download
         local_path = ctx.file_access.get_random_local_path(uri)

-        def _downloader():
-            return ctx.file_access.get_data(uri, local_path, is_multipart=False)
-
         expected_format = FlyteFilePathTransformer.get_format(expected_python_type)
-        ff = FlyteFile.__class_getitem__(expected_format)(local_path, _downloader)
+        ff = FlyteFile.__class_getitem__(expected_format)(
+            path=local_path, downloader=lambda: self.downloader(remote_path=uri, local_path=local_path)
+        )
+        ff._remote_source = uri

         return ff

+    @staticmethod
+    def downloader(remote_path: str, local_path: str) -> None:

Review comment: Could we use the context that we pass to async_to_python_value?

+        """
+        Download data from remote_path to local_path.
+
+        We design the downloader as a static method because its behavior is logically
+        related to this class but doesn't need to interact with class or instance data.
+        """
+        ctx = FlyteContextManager.current_context()
+        ctx.file_access.get_data(remote_path, local_path, is_multipart=False)
+
     def guess_python_type(self, literal_type: LiteralType) -> typing.Type[FlyteFile[typing.Any]]:
         if (
             literal_type.blob is not None
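The review question above suggests using the context already passed to async_to_python_value instead of calling FlyteContextManager.current_context() inside the static downloader. A minimal sketch of that idea, under stated assumptions: Context, FileAccess, InMemoryFileAccess, downloader and make_deferred_download are hypothetical names, not flytekit APIs; the only thing borrowed from the diff is the shape of the get_data(remote_path, local_path, is_multipart=False) call.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class FileAccess(Protocol):
    # Assumption: same call shape as the get_data(...) call in the diff.
    def get_data(self, remote_path: str, local_path: str, is_multipart: bool = False) -> None: ...


@dataclass
class Context:
    # Hypothetical stand-in for the ctx passed to async_to_python_value.
    file_access: FileAccess


def downloader(ctx: Context, remote_path: str, local_path: str) -> None:
    # Use the caller's context rather than looking up a global "current" one.
    ctx.file_access.get_data(remote_path, local_path, is_multipart=False)


def make_deferred_download(ctx: Context, uri: str, local_path: str) -> Callable[[], None]:
    # The context is bound now; the transfer itself still happens on first use.
    return lambda: downloader(ctx, remote_path=uri, local_path=local_path)


@dataclass
class InMemoryFileAccess:
    # Test double: "remote" objects live in a dict.
    store: Dict[str, bytes]
    fetched: List[str] = field(default_factory=list)

    def get_data(self, remote_path: str, local_path: str, is_multipart: bool = False) -> None:
        self.fetched.append(remote_path)
        with open(local_path, "wb") as f:
            f.write(self.store[remote_path])


fa = InMemoryFileAccess(store={"s3://bucket/a.txt": b"abc"})
local = os.path.join(tempfile.mkdtemp(), "a.txt")
download = make_deferred_download(Context(file_access=fa), "s3://bucket/a.txt", local)
print(fa.fetched)  # prints: [] (nothing transferred yet)
download()
print(fa.fetched)  # prints: ['s3://bucket/a.txt']
```

One consequence worth weighing: the closure captures the context that was active when the value was converted, not the one active when the file is eventually opened, so the two approaches can behave differently if the download happens under a different context.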
29 changes: 29 additions & 0 deletions
tests/flytekit/integration/remote/workflows/basic/flytefile.py
@@ -0,0 +1,29 @@
+from flytekit import task, workflow
+from flytekit.types.file import FlyteFile
+
+
+@task
+def open_ff_from_remote(remote_file_path: str) -> FlyteFile:
+    """Open FlyteFile from a remote file path.
+
+    Args:
+        remote_file_path: Remote file path.
+
+    Returns:
+        ff: FlyteFile object.
+    """
+    ff = FlyteFile(path=remote_file_path)
+    with open(ff, "r") as f:
+        content = f.read()
+        print(f"FILE CONTENT | {content}")
+
+    return ff
+
+
+@workflow
+def wf(remote_file_path: str) -> None:
+    remote_ff = open_ff_from_remote(remote_file_path=remote_file_path)
+
+
+if __name__ == "__main__":
+    wf()
Review comment: Would it be better to move this code to __init__?
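On the reviewer's question about moving the setup into __init__: one possible shape, sketched under assumptions, is to classify the path at construction time and leave __fspath__ responsible only for the one-time download. EagerSetupFile, is_remote (a naive "://" check) and the fetch parameter are hypothetical and not part of flytekit.

```python
import os
import tempfile
import uuid
from typing import Callable, List, Optional


def is_remote(path: str) -> bool:
    # Assumption: anything with a URI scheme (s3://, gs://, ...) counts as "remote".
    return "://" in path


class EagerSetupFile:
    """Hypothetical variant: classify the path in __init__, keep the download lazy."""

    def __init__(self, path: str, fetch: Callable[[str, str], None]):
        self._remote_source: Optional[str] = None
        self._downloaded = False
        self._downloader: Callable[[], None] = lambda: None  # no-op for local paths
        self.path = path
        if is_remote(path):
            # Only decide *where* the file will live; no disk or network I/O yet.
            self._remote_source = path
            local_dir = os.path.join(tempfile.gettempdir(), uuid.uuid4().hex)
            local_path = os.path.join(local_dir, os.path.basename(path))

            def _download() -> None:
                os.makedirs(local_dir, exist_ok=True)
                fetch(path, local_path)

            self._downloader = _download
            self.path = local_path

    def __fspath__(self) -> str:
        # With setup moved to __init__, __fspath__ only handles the one-time download.
        if not self._downloaded:
            self._downloader()
            self._downloaded = True
        return self.path


fetched: List[str] = []


def fake_fetch(remote_path: str, local_path: str) -> None:
    # Test double standing in for a real object-store download.
    fetched.append(remote_path)
    with open(local_path, "w") as f:
        f.write(f"contents of {remote_path}")


ef = EagerSetupFile("s3://bucket/report.txt", fetch=fake_fetch)
print(fetched)  # prints: [] (source recorded, nothing fetched yet)
with open(ef) as f:
    print(f.read())  # prints: contents of s3://bucket/report.txt
```

The trade-off is that every instance performs the remote check when it is constructed, even if it is never opened, and the constructor needs a way to fetch data (here the hypothetical fetch argument); the lazy setup in the patch avoids both by deferring the work to the first open().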