How to continue training with a different learning rate #6494
-
I want to resume training from a checkpoint, but with a different learning rate. How can I achieve that? I don't really care about the training state and don't mind starting a fresh training run, as long as the weights are properly restored. Right now I'm using […]. I also tried removing […],
but it seems the weights are erased and the trainer starts from random weights. Any help would be much appreciated, thanks so much!
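For reference, here is a minimal sketch of the "weights only" pattern in plain PyTorch (the checkpoint layout and names below are illustrative assumptions; in Lightning, `MyModel.load_from_checkpoint(path)` plays the same role for a `LightningModule`, and the Trainer stores the weights under the `"state_dict"` key of its `.ckpt` files):

```python
import io
import torch
import torch.nn as nn

# Hypothetical checkpoint produced by an earlier run, saved to an in-memory
# buffer here so the sketch is self-contained (a real run would use a file path).
model = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save({"state_dict": model.state_dict()}, buffer)
buffer.seek(0)

# Fresh run: build a new model, restore ONLY the weights, and attach a
# brand-new optimizer with the new learning rate. No optimizer or scheduler
# state is carried over, so training effectively restarts with warm weights.
restored = nn.Linear(4, 2)
ckpt = torch.load(buffer)
restored.load_state_dict(ckpt["state_dict"])
optimizer = torch.optim.Adam(restored.parameters(), lr=3e-4)  # new lr goes here
```

Since only `state_dict` is read back, nothing from the previous optimizer (momentum, step counts) survives, which matches the "don't care about training state" case.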
Replies: 3 comments 1 reply
-
I have the same question. More generally, it would be useful to be able to change certain model settings when resuming training while keeping all the other settings the same, or at least, as you said, to restore the model weights and start a new training session with them.
-
I opened this as an issue. However (as you'll see in the discussion there), it turns out that in my case there was no problem: the `.load_from_checkpoint()` method works as expected. I probably just made a different mistake that caused my loss to blow up immediately after resuming training, which I interpreted as the weights being overwritten with a new initialization. I shouldn't have jumped to that conclusion so quickly, since I never actually verified that the weights were different; I tried it again and it works fine now.

In your case, it looks like you're using the wrong syntax, which I hadn't spotted but another user did. Please refer to the link to see how it should be used; this should solve the problem for you.
-
What if you do care about the optimizer state and want to continue training while changing the learning rate?