-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix smoke tests bug #4492
fix smoke tests bug #4492
Conversation
/quicktest-core |
/quicktest-core |
1 similar comment
/quicktest-core |
/quicktest-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zpoint! Please see the comments below
/quicktest-core |
/quicktest-core |
sky/data/mounting_utils.py
Outdated
# The line below ensures the cache directory is cleared before mounting to | ||
# avoid "config error in file_cache [temp directory not empty]" error, which | ||
# can occur after stopping and starting the same cluster on Azure. | ||
# This helps ensure a clean state for blobfuse2 operations. | ||
mount_cmd = (f'rm -rf {cache_path} && mkdir -p {cache_path} && ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this cause sky launch
again on an existing UP cluster to clean up the cache? Could this cause issue if there is a running process using the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about generating a new cache_path
on every sky launch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comments offline : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now use the boot_time to generate the cache directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last login: Thu Jan 9 07:08:36 2025 from 163.125.113.120
(base) azureuser@t-azure-storage-mount-b8-09-94a6-2815-0:~$ ls ~/.sky/blobfuse2_cache/
azureopendatastorage_nyctlc_1736406391 sky635694a6141885a3_sky-test-1736406192_1736406391
(base) azureuser@t-azure-storage-mount-b8-09-94a6-2815-0:~$
sky/data/mounting_utils.py
Outdated
# The line below ensures the cache directory is new before mounting to | ||
# avoid "config error in file_cache [temp directory not empty]" error, which | ||
# can occur after stopping and starting the same cluster on Azure. | ||
# This helps ensure a clean state for blobfuse2 operations. | ||
cache_path += f'_{int(time.time())}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cause the same issue for launch
on existing UP
cluster where we create a new folder everytime?
sky/data/mounting_utils.py
Outdated
# The line below ensures the cache directory is cleared before mounting to | ||
# avoid "config error in file_cache [temp directory not empty]" error, which | ||
# can occur after stopping and starting the same cluster on Azure. | ||
# This helps ensure a clean state for blobfuse2 operations. | ||
mount_cmd = (f'rm -rf {cache_path} && mkdir -p {cache_path} && ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comments offline : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zpoint! Mostly looks good to me know.
@Michaelvll, could we get this PR reviewed and unblock the usage of the comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zpoint! LGTM.
# The line below ensures the cache directory is new before mounting to | ||
# avoid "config error in file_cache [temp directory not empty]" error, which | ||
# can occur after stopping and starting the same cluster on Azure. | ||
# This helps ensure a clean state for blobfuse2 operations. | ||
remote_boot_time_cmd = 'date +%s -d "$(uptime -s)"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we clean up the old cache folder that was created in a previous boot? We can add a TODO here first.
Some fix for #4438
Refer to release comment here and some remaining problem after resolved
Change regoin for:
And robust backward compatibility
Add a
gke
mark for pytest and modifygenerate_pipeline.py
. The basic idea is to separate tests marked withgke
to run on the GKE cluster, while other Kubernetes tests can run on the local cluster, without relying on GKE for buildkite.Remove the
me-central2
region test since its restricted and our account don't have permissionTested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh