Explanation for achieving better performance than original paper #2
Thanks! That's a good question, and honestly I'm not sure I completely know why! The original MobileNets paper seems to use hyperparameters from the Inception papers, whereas I tried the hyperparameters from the recent NASNet paper (a much larger learning rate of ~0.2), which seem to give much better accuracy. I got the same good results with the MobileNet v2 models (and there, the reported numbers in the paper are pretty close as well).
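To make the hyperparameter difference concrete, here is a minimal sketch of a staircase exponential learning-rate decay starting from the large initial rate (~0.2) mentioned above. The decay rate and step interval are illustrative assumptions, not the repo's actual settings.

```python
def exponential_lr(step, initial_lr=0.2, decay_rate=0.94, decay_steps=2500):
    """Staircase exponential decay: the rate drops by `decay_rate`
    every `decay_steps` steps. Defaults are assumed for illustration."""
    return initial_lr * (decay_rate ** (step // decay_steps))

# Starts at the large NASNet-style rate instead of a small one like 0.05.
print(exponential_lr(0))
print(exponential_lr(5000))
```

Starting from ~0.2 rather than ~0.05 means the schedule explores much more aggressively early in training, which may be part of the accuracy gap being discussed.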
Wow, that's a big discovery! This is strong evidence of how important hyperparameters are in DL :-D Thanks for your explanation!
Hi @balancap, I'd like to run your source code for training MobileNet v1/v2. I guess the training command should be
Hi @balancap, could you please share more details of the hyperparameters? Thanks a lot!
@balancap Do you mean that you trained MobileNet v1 with a learning rate of ~0.2 and achieved better accuracy than the original paper? My learning rate was set to ~0.05; I tried 6000 to 10000 steps and only got 68% top-1 accuracy. Could the small learning rate be the problem?
Thanks for sharing your great work!
The MobileNet v1 you trained achieves 72.9 top-1 accuracy, which surpasses the number reported in the original paper (70.6) by a large margin. Could you please explain the reasons? Thanks!