How to continue training with a different learning rate #6494
-
I want to resume training from a checkpoint, but with a different learning rate. How can I achieve that? I don't really care about the training state and don't mind starting a fresh training run, as long as the weights are properly restored. Right now I'm using […]. I also tried removing […],
but it seems the weights are erased and the trainer starts from random weights. Any help would be much appreciated, thanks so much!
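For reference, here is a minimal sketch of the "weights only" pattern in plain PyTorch (the checkpoint layout and names below are illustrative assumptions; in Lightning, `MyModel.load_from_checkpoint(path)` plays the same role for a `LightningModule`, and the Trainer stores the weights under the `"state_dict"` key of its `.ckpt` files):

```python
import io
import torch
import torch.nn as nn

# Hypothetical checkpoint produced by an earlier run, saved to an in-memory
# buffer here so the sketch is self-contained (a real run would use a file path).
model = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save({"state_dict": model.state_dict()}, buffer)
buffer.seek(0)

# Fresh run: build a new model, restore ONLY the weights, and attach a
# brand-new optimizer with the new learning rate. No optimizer or scheduler
# state is carried over, so training effectively restarts with warm weights.
restored = nn.Linear(4, 2)
ckpt = torch.load(buffer)
restored.load_state_dict(ckpt["state_dict"])
optimizer = torch.optim.Adam(restored.parameters(), lr=3e-4)  # new lr goes here
```

Since only `state_dict` is read back, nothing from the previous optimizer (momentum, step counts) survives, which matches the "don't care about training state" case.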
Replies: 3 comments 1 reply
-
I have the same question. More generally, it would be useful to be able to change certain model settings when resuming training while keeping all the other settings the same, or at least, as you said, to restore the model weights and start a new training session with them.
-
I opened this as an issue. However (as you'll see in the discussion there), it turns out that in my case there was no problem: the `.load_from_checkpoint()` method works as expected. I probably just made a different mistake that caused my loss to blow up immediately after resuming training, which I interpreted as the weights being overwritten with a new initialization. I shouldn't have jumped to that conclusion so quickly, since I never actually verified that the weights were different; I tried it again and it works fine now.

In your case, it looks like you're using the wrong syntax, which I hadn't spotted but another user did. Please refer to the link to see how it should be used; this should solve the problem for you.
-
What if you do care about the optimizer state and want to continue training while changing the learning rate?