Cannot run it on windows #14

frankl1 · 2024-02-14T10:50:06Z

Hi,

I wanted to try this implementation after reading the paper. I installed all the dependencies in a Conda env on a Windows PC. However, I get the following error when I run the experiment:

$ python experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -wd 0.0001 --print_rule -i 0
C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\distributed\distributed_c10d.py:608: UserWarning: Attempted 
to get default timeout for nccl backend, but NCCL support is not compiled
  warnings.warn("Attempted to get default timeout for nccl backend, but NCCL support is not compiled")
[W socket.cpp:697] [c10d] The client socket has failed to connect to [A2207000547.china.huawei.com]:47339 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "C:\Users\m00827298\Codes\RRL\experiment.py", line 174, in <module>
    train_main(rrl_args)
  File "C:\Users\m00827298\Codes\RRL\experiment.py", line 167, in train_main
    mp.spawn(train_model, nprocs=args.gpus, args=(args,))
  File "C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\multiprocessing\spawn.py", line 241, in spawn       
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
    while not context.join():
              ^^^^^^^^^^^^^^
  File "C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\multiprocessing\spawn.py", line 158, in join        
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\multiprocessing\spawn.py", line 68, in _wrap        
    fn(i, *args)
  File "C:\Users\m00827298\Codes\RRL\experiment.py", line 57, in train_model
    dist.init_process_group(backend='nccl', init_method='env://', world_size=args.world_size, rank=rank)
  File "C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\distributed\c10d_logger.py", line 86, in wrapper    
    func_return = func(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1177, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\distributed\rendezvous.py", line 246, in _env_rendezvous_handler
    store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout, use_libuv)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\m00827298\AppData\Local\miniconda3\envs\rrl\Lib\site-packages\torch\distributed\rendezvous.py", line 174, in _create_c10d_store
    return TCPStore(
           ^^^^^^^^^
torch.distributed.DistNetworkError: Unknown error


12wang3 · 2024-02-19T02:43:51Z

I am not very familiar with running PyTorch in a Windows environment. Based on the error message "Attempted to get default timeout for nccl backend, but NCCL support is not compiled", I suspect the reason might be that NCCL support is not compiled into your PyTorch installation.
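One way to check that suspicion is to ask the installed PyTorch build which backends it was compiled with. The snippet below is a minimal sketch, not part of the repository; the helper name `probe_torch_backends` is made up for illustration, and it only uses the availability checks that `torch` and `torch.distributed` expose.

```python
import importlib.util


def probe_torch_backends():
    """Return backend availability flags, or None if torch is not installed.

    Hypothetical helper for illustration; it is not part of the RRL code.
    """
    if importlib.util.find_spec("torch") is None:
        return None

    import torch
    import torch.distributed as dist

    # The backend checks are only meaningful when the distributed package
    # itself was built, so guard them with dist.is_available().
    has_dist = dist.is_available()
    return {
        "cuda": torch.cuda.is_available(),
        "distributed": has_dist,
        "nccl": has_dist and dist.is_nccl_available(),
        "gloo": has_dist and dist.is_gloo_available(),
    }


if __name__ == "__main__":
    print(probe_torch_backends())
```

If `nccl` comes back `False`, the `backend='nccl'` argument in the `dist.init_process_group` call from the traceback cannot work with that installation.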

frankl1 · 2024-02-21T09:03:58Z

NCCL seems to be tied to NVIDIA GPUs, and I don't have an NVIDIA GPU in my PC, so I guess that is why I get this warning. Isn't it possible to run the code using only the CPU?

12wang3 · 2024-03-17T11:18:12Z

At present, CPU is not supported. I will add a CPU version in the future. However, it is still recommended to run on a GPU, otherwise the speed may be slow.
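For readers wondering what a backend fallback could look like in general, here is a minimal sketch. It is an assumption-laden illustration, not the planned CPU version: `choose_backend` is a hypothetical helper, and swapping the backend alone would not make this repository run on CPU if the rest of the training code expects CUDA devices.

```python
def choose_backend(cuda_available: bool, nccl_available: bool) -> str:
    """Pick a torch.distributed backend name.

    Hypothetical helper: NCCL needs both a CUDA device and a PyTorch build
    with NCCL compiled in; otherwise fall back to Gloo, which runs on CPU.
    """
    if cuda_available and nccl_available:
        return "nccl"
    return "gloo"


# Hypothetical use at the call site shown in the traceback (not the
# repository's actual code):
#
#   backend = choose_backend(torch.cuda.is_available(),
#                            dist.is_nccl_available())
#   dist.init_process_group(backend=backend, init_method='env://',
#                           world_size=args.world_size, rank=rank)
```

The decision is kept as a pure function of two booleans so it can be checked without a GPU or even a PyTorch installation.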

wanmaxiaobai · 2024-04-29T04:40:53Z

I would like to ask whether your issue has been resolved.

frankl1 · 2024-04-29T06:05:34Z

Thanks for asking. I will give it another try when I get a GPU.

12wang3 added the "enhancement" (New feature or request) label on Mar 17, 2024.
