-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix azure logging #6860
base: master
Are you sure you want to change the base?
Fix azure logging #6860
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You said:
However, in Azure, if a file is created by opening it in write mode, it will not permit append operations later on.
Is there a case where Azure will permit operations later on? Is there some other mode we should open it in instead?
@@ -19,7 +19,7 @@ | |||
TensorBoard. This allows running TensorBoard without depending on | |||
TensorFlow for file operations. | |||
""" | |||
|
|||
from adlfs import AzureBlobFileSystem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have adlfs libraries installed on development or ci machines so this is not going to build. You can see this in the failures for the build checks:
https://github.com/tensorflow/tensorboard/actions/runs/9352527415/job/25755552731?pr=6860
What would be a good way to check for this without importing AzureBlobFileSystem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, if a file is initially created by opening it in append mode, this file will allow append operations later on. However, as shown in the code snippet above, the file is always created by opening it in write mode:
# write the first chunk to truncate file if it already exists
self.fs.write(self.filename, file_content, self.binary_mode)
One other option, besides checking what filesystem we are dealing with, is to always open the file in append mode initially, when the fs supports that. However, this means the contents of the file, if it exists, will not be cleared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option, which I am not very fond of, is checking the fs_class's string representation for the substring "Azure". You can see this approach in the latest commit.
However, I believe the approach I suggested above would be better. I don't have all the context behind why you would to want clear the file though, and whether always creating the file in append mode would be appropriate. In my context, where I use lightning's TensorboardLogger, a unique experiment directory gets created at the beginning of the run, so the events file will always be in a unique location, unless we explicitly want to continue an experiment, in which case appending to an existing events file should not be a problem.
@@ -671,7 +671,10 @@ def __init__(self, filename, mode): | |||
) | |||
self.filename = compat.as_bytes(filename) | |||
self.fs = get_filesystem(self.filename) | |||
self.fs_supports_append = hasattr(self.fs, "append") | |||
if _get_fs_class(self.filename) == AzureBlobFileSystem: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit odd that we would call get_filesystem, which calls fsspec.get_filesystem_class transitively, and then we call fsspec.get_filesystem_class again.
Can we remove the duplication?
Perhaps get_filesystem could return a tuple, where first entry is the filesystem (like _FSSPEC_FILESYSTEM) and the second entry is the filesystem class (if any, like AzureBlobFileSystem).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid that approach because it required a bit more refactoring, but I implemented it now, please check it out.
Hi is there any update? 🙏 |
Motivation for features / changes
When using the TensorboardLogger in Pytorch lightning to log to an Azure blob storage location, the following error is thrown:
This is due to the assumption that if the filesystem supports append opperations, these will automatically be available for any file in the blob storage, see this code snippet from gfile.py:
However, in Azure, if a file is created by opening it in write mode, it will not permit append operations later on.
Technical description of changes
Screenshots of UI changes (or N/A)
Detailed steps to verify changes work correctly (as executed by you)
Alternate designs / implementations considered (or N/A)