
Explanation for achieving better performance than the original paper #2

AIROBOTAI opened this issue Apr 19, 2018 · 5 comments

@AIROBOTAI

Thanks for sharing your great work!

The MobileNet-v1 you trained achieves 72.9% top-1 accuracy, which surpasses the number reported in the original paper (70.6%) by a large margin. Could you please explain the reasons? Thanks!

@balancap
Owner

Thanks!

That's a good question, and honestly I am not sure I completely know why! The original MobileNets paper seems to use hyperparameters from the Inception papers, whereas I tried the hyperparameters from the recent NASNet paper (a much larger learning rate of ~0.2), which seem to give much better accuracy. I got the same good results with the MobileNet-v2 models (and there, my numbers and the ones reported in the paper are pretty close as well).
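
For reference, here is a minimal TF 1.x sketch of that kind of schedule (a large initial learning rate with NASNet-style exponential decay). The decay factor, decay interval, and RMSProp settings below are illustrative assumptions, not the exact configuration behind these results:

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()

# Large initial learning rate (~0.2) decayed exponentially, roughly in the
# spirit of the NASNet paper. decay_steps / decay_rate are assumed values
# for illustration; tune them to your batch size and dataset.
learning_rate = tf.train.exponential_decay(
    learning_rate=0.2,
    global_step=global_step,
    decay_steps=12000,   # hypothetical: a few epochs' worth of steps
    decay_rate=0.97,
    staircase=True)

# RMSProp settings are also assumptions (common ImageNet choices).
optimizer = tf.train.RMSPropOptimizer(
    learning_rate, decay=0.9, momentum=0.9, epsilon=1.0)
```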

@AIROBOTAI
Author

Wow, that's a big discovery! This is strong evidence of how important hyperparameters are in DL :-D Thanks for your explanation!

@AIROBOTAI
Author

Hi @balancap, I'd like to run your source code to train MobileNet-v1/v2. I guess the training command should be `python tf_cnn_benchmarks.py` followed by hyperparameter settings. Could you please share the list of hyperparameters you use? Or do you just follow NASNet? Thanks!
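
In case it helps, a hypothetical invocation might look like the following (this is not a command confirmed in this thread). Flag names vary between tf_cnn_benchmarks checkouts, and `--model=mobilenet` assumes your checkout defines a MobileNet model, so verify everything against `python tf_cnn_benchmarks.py --help`:

```
python tf_cnn_benchmarks.py \
    --model=mobilenet \
    --num_gpus=1 \
    --batch_size=96 \
    --variable_update=parameter_server \
    --data_dir=/path/to/imagenet
```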

@AIROBOTAI
Author

Hi @balancap, could you please share more details about the hyperparameters? Thanks a lot!

@haoxi911

@balancap Do you mean that you trained MobileNet-v1 with a learning rate of ~0.2, and that this achieved better accuracy than the original paper?

My learning rate was set to ~0.05; I tried 6000 to 10000 steps and only got 68% top-1 accuracy. Could the small learning rate be the problem?
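
As a quick sanity check on those step counts (assuming full-ImageNet training with ~1.28M images and a batch size of 96, neither of which is stated above), 6k-10k steps is well under one epoch, whereas ImageNet models are typically trained for tens of epochs, so the step budget may matter as much as the learning rate:

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # standard ImageNet train set size
batch_size = 96                    # hypothetical; not stated in the comment

for steps in (6_000, 10_000):
    epochs = steps * batch_size / IMAGENET_TRAIN_IMAGES
    print(f"{steps} steps ~ {epochs:.2f} epochs")
# -> 6000 steps ~ 0.45 epochs; 10000 steps ~ 0.75 epochs
```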
