[DeviceMesh] Add support for group: Tuple[ProcessGroup, ...] in from_group() #125358
Comments
I am not clear how to recover the For example, if we have Now, if we have say Now, how does this generalize to the user passing in |
@awgu I thought a bit about this. The recovery math can be quite complicated in N-D scenarios; even for 2-D/3-D it seems to be a non-trivial amount of code. I'm wondering if we should do it the way we do for things like The mesh tensor dim values can be easily derived similar to this https://github.com/pytorch/pytorch/blob/main/torch/distributed/device_mesh.py#L289
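To make the 2-D case concrete, here is a minimal pure-Python sketch of one way the mesh could be recovered from the rank lists of the two groups the current rank belongs to. It is not the DeviceMesh implementation. The helper name recover_2d_mesh is hypothetical, and the sketch assumes a regular layout in which every mesh entry equals inter-node rank plus intra-node rank minus the current rank (true for a row-major arange-style mesh). In a real job the rank lists could presumably come from torch.distributed.get_process_group_ranks(group); here they are passed in as plain lists.

```python
from typing import List, Sequence


def recover_2d_mesh(
    rank: int, inter_ranks: Sequence[int], intra_ranks: Sequence[int]
) -> List[List[int]]:
    """Rebuild a 2-D mesh from the two rank lists that contain `rank`.

    Hypothetical helper. ASSUMPTION: the layout is regular, i.e.
    mesh[i][j] == inter_ranks[i] + intra_ranks[j] - rank.
    """
    if rank not in inter_ranks or rank not in intra_ranks:
        raise ValueError("rank must be a member of both groups")

    # Row i is the intra-node group shifted to the node of inter_ranks[i].
    mesh = [
        [outer + inner - rank for inner in intra_ranks]
        for outer in inter_ranks
    ]

    # Sanity check: a valid mesh has distinct, non-negative ranks.
    flat = [r for row in mesh for r in row]
    if len(set(flat)) != len(flat) or min(flat) < 0:
        raise ValueError("groups are inconsistent with a regular 2-D layout")
    return mesh


# 2 nodes x 4 GPUs, seen from rank 5:
# intra-node group is [4, 5, 6, 7], inter-node group is [1, 5].
mesh_from_rank_5 = recover_2d_mesh(5, [1, 5], [4, 5, 6, 7])
```

Every rank arrives at the same mesh under this assumption. Irregular layouts would need the group memberships of all ranks to be gathered, and each extra dimension adds another index to the arithmetic.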
We recently added a DeviceMesh.from_group() API to support constructing a DeviceMesh from an existing ProcessGroup to help interoperate with training code that uses ProcessGroup for some parallelisms and DeviceMesh for others.
pytorch/torch/distributed/device_mesh.py, lines 432 to 433 in 9043cca
We want to expand the API to support HSDP.
- Extend the group argument to support Union[ProcessGroup, Tuple[ProcessGroup, ...]] so that the user can pass in a tuple of the inter-node and intra-node PGs
- Add a mesh_dim_names: Optional[Tuple[str, ...]] = None kwarg so that the user can still give named mesh dims
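As a rough illustration of the argument handling this proposal implies, the sketch below normalizes a single group or a tuple of groups and checks that mesh_dim_names, when given, has one name per group. It is an assumption about how the extended from_group() might validate its inputs, not the actual PyTorch code. StubProcessGroup and normalize_from_group_args are hypothetical names, and the stub stands in for torch.distributed.ProcessGroup so the example runs without torch.

```python
from typing import List, Optional, Sequence, Tuple, Union


class StubProcessGroup:
    """Hypothetical stand-in for torch.distributed.ProcessGroup."""

    def __init__(self, ranks: Sequence[int]) -> None:
        self.ranks: List[int] = list(ranks)


def normalize_from_group_args(
    group: Union[StubProcessGroup, Tuple[StubProcessGroup, ...]],
    mesh_dim_names: Optional[Tuple[str, ...]] = None,
) -> Tuple[Tuple[StubProcessGroup, ...], Optional[Tuple[str, ...]]]:
    """Return (groups, mesh_dim_names) with one group per mesh dimension."""
    # Accept either a single group (1-D mesh) or a tuple (N-D mesh).
    groups = group if isinstance(group, tuple) else (group,)
    if len(groups) == 0:
        raise ValueError("expected at least one process group")
    # Names are optional, but if present there must be one per dimension.
    if mesh_dim_names is not None and len(mesh_dim_names) != len(groups):
        raise ValueError(
            f"got {len(mesh_dim_names)} mesh_dim_names for {len(groups)} groups"
        )
    return groups, mesh_dim_names


# HSDP-style call from rank 5 of a 2-node x 4-GPU job:
# inter-node PG first, intra-node PG second.
inter_pg = StubProcessGroup([1, 5])
intra_pg = StubProcessGroup([4, 5, 6, 7])
groups, names = normalize_from_group_args(
    (inter_pg, intra_pg), mesh_dim_names=("replicate", "shard")
)
```

The dimension names "replicate" and "shard" are illustrative only; the proposal leaves the naming to the user.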