[tests] test_docker_storage_mounts failing on master #4473

Open
romilbhardwaj opened this issue Dec 16, 2024 · 2 comments

@romilbhardwaj
Collaborator

test_docker_storage_mounts is failing on master:

pytest tests/test_smoke.py::test_docker_storage_mounts --lf --kubernetes
___________________ test_docker_storage_mounts[docker:ubuntu:18.04] ____________________
[gw2] darwin -- Python 3.9.13 /Users/romilb/tools/anaconda3/bin/python
tests/smoke_tests/test_mount_and_storage.py:373: in test_docker_storage_mounts
    smoke_tests_utils.run_one_test(test)
tests/smoke_tests/smoke_tests_utils.py:339: in run_one_test
    raise Exception(f'test failed: less {log_file.name}')
E   Exception: test failed: less /var/folders/98/hhq8wrtx6y13196q61xphjsm0000gn/T/docker_storage_mounts-2vr083c5.log
_______ test_docker_storage_mounts[docker:nvidia/cuda:11.8.0-devel-ubuntu18.04] ________
[gw0] darwin -- Python 3.9.13 /Users/romilb/tools/anaconda3/bin/python
tests/smoke_tests/test_mount_and_storage.py:373: in test_docker_storage_mounts
    smoke_tests_utils.run_one_test(test)
tests/smoke_tests/smoke_tests_utils.py:339: in run_one_test
    raise Exception(f'test failed: less {log_file.name}')
E   Exception: test failed: less /var/folders/98/hhq8wrtx6y13196q61xphjsm0000gn/T/docker_storage_mounts-1x5j47hr.log
=============================== short test summary info ================================
FAILED tests/test_smoke.py::test_docker_storage_mounts[docker:ubuntu:18.04] - Excepti...
FAILED tests/test_smoke.py::test_docker_storage_mounts[docker:nvidia/cuda:11.8.0-devel-ubuntu18.04]
2 failed, 2 passed, 3008 warnings in 199.35s (0:03:19)

From provision.log:

sky.exceptions.CommandError: Command mkdir -p ~/.sky/file_mounts/mount_private_copy && aws --version >/dev/null 2>&1 || pip3 install awscli && aws s3 sync --no-follow-symlinks s3://sky-test-1734390492769753 ~/.sky/file_mounts/mount_private_copy failed with return code 127.
Failed to run command before rsync s3://sky-test-1734390492769753 -> /mount_private_copy. Ensure that the network is stable, then retry. mkdir -p ~/.sky/file_mounts/mount_private_copy && aws --version >/dev/null 2>&1 || pip3 install awscli && aws s3 sync --no-follow-symlinks s3://sky-test-1734390492769753 ~/.sky/file_mounts/mount_private_copy See logs in ~/sky_logs/sky-2024-12-16-15-08-19-961008/file_mounts.log
D 12-16 15:11:14 skypilot_config.py:228] Using config path: /Users/romilb/.sky/config.yaml

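This is because pip is not installed on these base images: return code 127 is the shell's "command not found" status, so the pip3 install awscli fallback cannot run. This looks like a recent regression?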
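For anyone who wants to reproduce the exit status locally without a cluster, here is a minimal sketch. It assumes a POSIX `sh` is available, and the names `sky_demo_missing_aws` and `sky_demo_missing_pip3` are made-up stand-ins for `aws` and `pip3` on an image that has neither. It only mirrors the shape of the failing command chain from provision.log, not SkyPilot's actual code path.

```python
import subprocess


def run_shell(cmd: str) -> int:
    """Run cmd through `sh -c` and return its exit code, discarding output."""
    result = subprocess.run(
        ['sh', '-c', cmd],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode


# Hypothetical command names that should not exist on any machine; they
# stand in for `aws` and `pip3` on a base image that ships neither.
MISSING_AWS = 'sky_demo_missing_aws'
MISSING_PIP = 'sky_demo_missing_pip3'

# Same shape as the chain in provision.log:
#   setup && aws --version || pip3 install awscli && aws s3 sync ...
# `true` replaces the mkdir step so nothing is created on disk.
chain = (f'true && {MISSING_AWS} --version >/dev/null 2>&1 '
         f'|| {MISSING_PIP} install awscli '
         f'&& {MISSING_AWS} s3 sync')

rc_missing = run_shell(f'{MISSING_PIP} install awscli')
rc_chain = run_shell(chain)
print(f'missing command alone: {rc_missing}, full chain: {rc_chain}')
```

The version check fails first, the `||` branch then tries the missing installer, and because that also fails the final `&&` step is skipped, so the chain reports the installer's status.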

@romilbhardwaj
Collaborator Author

Note: this doesn't happen on Docker containers running in the cloud (e.g., AWS). We install pip3 at some point when provisioning containers on the cloud, but not on Kubernetes.

@Michaelvll
Collaborator

Note: Changing pip3 to {constants.SKY_UV_PIP_CMD} and replacing the awscli usage with {constants.SKY_REMOTE_PYTHON_ENV}/bin/aws under the hood in sky/cloud_stores.py seems to make it work again.
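A rough sketch of what that substitution could look like when building the sync command. This is not the actual code in sky/cloud_stores.py: the function name `make_s3_sync_command` is hypothetical, and the two constant values below are placeholders, since the real values of `SKY_UV_PIP_CMD` and `SKY_REMOTE_PYTHON_ENV` are not shown in this thread.

```python
# Placeholder values only (assumption): the real constants live in sky's
# constants module and their contents are not quoted in this issue.
SKY_UV_PIP_CMD = 'uv-pip-placeholder'
SKY_REMOTE_PYTHON_ENV = '~/remote-python-env-placeholder'


def make_s3_sync_command(bucket: str, destination: str) -> str:
    """Build the pre-rsync command with the two substitutions applied.

    Hypothetical helper: keeps the same `check || install && sync` shape as
    the failing command, but calls the aws binary inside the remote Python
    env and installs awscli with the uv-based pip command instead of pip3.
    """
    aws = f'{SKY_REMOTE_PYTHON_ENV}/bin/aws'
    return (f'mkdir -p {destination} && '
            f'{aws} --version >/dev/null 2>&1 || '
            f'{SKY_UV_PIP_CMD} install awscli && '
            f'{aws} s3 sync --no-follow-symlinks s3://{bucket} {destination}')


cmd = make_s3_sync_command('sky-test-1734390492769753',
                           '~/.sky/file_mounts/mount_private_copy')
print(cmd)
```

With this shape the command no longer depends on a system-wide pip3 or on `aws` being on PATH, which is what the Kubernetes base images were missing.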
