[core][experimental] Add multi-GPU CI tests for accelerated DAG #45259

Merged
merged 67 commits into ray-project:master from dag-nccl on May 15, 2024

stephanie-wang commented May 11, 2024

Why are these changes needed?

Add NCCL-based tests to CI.

Also adds a fix for a test in test_torch_tensor_dag, where the driver was being assigned a GPU as its default device. The driver should instead use CPU as its default device.
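The device rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `pick_default_device`, `is_driver`, and `assigned_gpu_ids` are hypothetical names, and the returned strings simply follow the usual torch device naming.

```python
from typing import Sequence


def pick_default_device(is_driver: bool, assigned_gpu_ids: Sequence[int]) -> str:
    """Hypothetical helper: choose a default device name for a process.

    The driver always gets "cpu", even when GPUs are visible to it.
    A worker gets its first assigned GPU, or "cpu" if it has none.
    """
    if is_driver or not assigned_gpu_ids:
        return "cpu"
    # Illustrative only: the real index depends on how devices are exposed
    # to the process (for example through CUDA_VISIBLE_DEVICES).
    return f"cuda:{assigned_gpu_ids[0]}"


driver_device = pick_default_device(is_driver=True, assigned_gpu_ids=[0, 1])
worker_device = pick_default_device(is_driver=False, assigned_gpu_ids=[1])
```

With this rule the driver stays on CPU regardless of which GPUs the test environment exposes, which matches the behavior the description asks for.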

stephanie-wang and others added 30 commits April 17, 2024 15:56
Signed-off-by: Stephanie Wang <[email protected]>
GPU
Signed-off-by: Your Name <[email protected]>
stephanie-wang requested a review from a team as a code owner May 13, 2024 20:52
--parallelism-per-worker 2 --gpus 2
--test-env=CUDA_VISIBLE_DEVICES=0,1
--build-name coregpubuild
--only-tags multi_gpu

Add || true; otherwise, when the test fails, the script never reaches that sleep statement.
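A minimal sketch of why the suggestion matters, assuming the CI step chains the test command and the sleep in a single shell command (the actual CI script is not shown here). `false` stands in for the failing test and `echo reached` for the sleep statement; `run_shell` is a hypothetical helper that needs a POSIX `sh`.

```python
import subprocess


def run_shell(script: str) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a script with POSIX sh and capture its output."""
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True)


# Without the guard, the failing command short-circuits the && chain,
# so the follow-up step never runs.
without_guard = run_shell("false && echo reached")

# With `|| true`, the failure is masked and the follow-up step still runs.
with_guard = run_shell("{ false || true; } && echo reached")
```

Note that `|| true` also hides the failing exit code, so in this sketch the guarded chain reports success even though the stand-in test failed.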

can-anyscale left a comment

stephanie-wang merged commit b0d1953 into ray-project:master May 15, 2024
5 of 6 checks passed
stephanie-wang deleted the dag-nccl branch May 15, 2024 15:53
5 participants