program frozen #56
Comments
After I interrupt it from the keyboard, it gives the following message: KeyboardInterrupt |
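A traceback printed on interrupt only shows where the main thread happened to be at that moment. As a rough sketch (not something from this codebase), Python's standard `faulthandler` module can dump the stacks of all threads, either on demand or on a timer, which may help narrow down where a frozen run is waiting. The helper names `enable_hang_watchdog` and `snapshot_stacks` and the 300 second interval are illustrative assumptions.

```python
import faulthandler
import sys
import tempfile


def enable_hang_watchdog(timeout_s=300, stream=sys.stderr):
    """Dump every thread's stack to `stream` every `timeout_s` seconds.

    Hypothetical helper: call it once near the top of the training script.
    `stream` must be a real file object with a file descriptor.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)


def disable_hang_watchdog():
    """Cancel the periodic dump installed by enable_hang_watchdog()."""
    faulthandler.cancel_dump_traceback_later()


def snapshot_stacks():
    """Return the current stacks of all threads as a string."""
    # faulthandler writes straight to the file descriptor, so use a real
    # temporary file in binary mode and decode what was written.
    with tempfile.TemporaryFile(mode="w+b") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(snapshot_stacks())
```

If the periodic dumps keep showing the same frame, that frame is a reasonable place to start looking; the dump itself does not explain why the call is blocked.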
Is the GPU active when you're running the training? Or is that stalling too? |
The GPU is stalling too... It's a Tesla V100, the CUDA version is 11.0 and the driver version is 450.80.02 |
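One way to check whether the GPU is doing any work while training appears stuck is to sample `nvidia-smi` a few times. The sketch below assumes `nvidia-smi` is on the PATH and uses its CSV query output; the function names are illustrative, and fields reported as `[N/A]` on some devices are not handled.

```python
import shutil
import subprocess

# Query GPU utilization (%) and used memory (MiB) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(text):
    """Parse lines like '97, 15021' into (utilization, memory_mib) tuples."""
    stats = []
    for line in text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats():
    """Return one tuple per GPU, or None if nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    print(read_gpu_stats())
```

Utilization that stays at 0 while memory is allocated would be consistent with the stall described here, though it does not by itself say whether the process is waiting on input data or on the device.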
I just wanted to follow up to see if you worked anything out. I don't know if I have any ideas for what could be causing this with our code, but maybe you found a problem? |
Hi Carlini, I'm really thankful that you followed up on the issue. I'm having real trouble figuring out what is preventing me from using the code. I first suspected an environment issue; here is the environment I'm using:
This file may be used to create an environment using:
$ conda create --name --file
platform: linux-64
_libgcc_mutex=0.1=main |
Huh. Two ideas maybe:
|
I followed your suggestion, which provided me with some useful insights. So
So I guess
As for my concern now, I would be very happy if I could get fixmatch to work on my P100 or V100. I really don't understand what could have happened that makes mixmatch able to run but fixmatch not. Do you have any thoughts? |
That's very interesting. This fixmatch codebase has an implementation of mixmatch. Does that also run properly? If that works, then maybe try running fixmatch with something like --uratio=1 and see if that helps. Maybe it's the batch size that's the problem? |
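The suggestion above amounts to re-running the command from the original report with `--uratio=1` appended. A small sketch that builds that command is below; the script name, dataset string, and other flags are copied from the report, while the helper `fixmatch_command` is purely illustrative and not part of the codebase.

```python
import os
import shlex


def fixmatch_command(uratio=1, gpu="0"):
    """Build the environment and argv for the run described in this thread."""
    # Restrict the run to one GPU, as in the original command.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    argv = [
        "python", "fixmatch.py",
        "--filters=32",
        "--dataset=cifar10.3@40-1",
        "--train_dir", "./experiments/fixmatch",
        f"--uratio={uratio}",  # the only change from the original command
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = fixmatch_command()
    # Print the equivalent shell line; subprocess.run(argv, env=env) would run it.
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"],
          " ".join(shlex.quote(a) for a in argv))
```

If the run proceeds with this setting but not with the default, that would point toward batch size or memory rather than the GPU model.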
Hi Carlini, that's very weird: I cannot run the implementation of mixmatch from the fixmatch codebase (while I am able to run the original mixmatch codebase). May I ask what GPU you run the fixmatch codebase on? Could it be that the new code is not compatible with some GPUs? (I have very limited knowledge about this, so I hope that is not a ridiculous guess...) |
That is very strange. We've never seen any issues with different GPUs in the past, and the two codebases are very similar. Maybe @david-berthelot has some insight that I'm missing. |
In your requirements.txt, there's no specific requirement for the Python version or for cudatoolkit, cuDNN, or CUDA versions. Do I need to install any specific version of cudatoolkit, cuDNN, or CUDA? |
Dear authors
Your work is very exciting and I want to try out your code. I followed the instructions in the readme and am trying to run this example:
CUDA_VISIBLE_DEVICES=0 python fixmatch.py --filters=32 --dataset=cifar10.3@40-1 --train_dir ./experiments/fixmatch
However, my program gets stuck at self.train_step forever...
I did install the required environment as you pointed out in the readme.
Do you have any idea what's going on?
I