
MobileNet v2 CPU inferencing performance #52

Open
matt-ny opened this issue Feb 25, 2018 · 3 comments


matt-ny commented Feb 25, 2018

Comparing MobileNet v1 and v2 for CPU inference, I have observed some surprising numbers:

  1. For v1, my inference time was about 148 ms on average; for v2, the average was 185 ms (about 25% slower).

  2. The max_rss of the process increased by about 160 MB for each copy of MobileNet v1 loaded in Caffe, measured after initializing the Net and running one forward pass. For v2, the increase was about 300 MB per copy.

I am using BVLC Caffe with Intel MKL, doing the measurements on the same system (Intel Xeon CPU E5-2658 v2 @ 2.40GHz) contemporaneously, and discarding the first few timings of each run to "warm up" any caching.

From the paper I expected v2's inference time and memory usage to be lower... am I missing something?
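For reference, a minimal pycaffe sketch of this kind of measurement (model paths and iteration counts are placeholders, not the exact harness used for the numbers above):

```python
import resource
import time

import caffe

caffe.set_mode_cpu()

# Baseline before constructing the Net; ru_maxrss is reported in KB on Linux.
rss_before_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Placeholder paths -- substitute the actual deploy prototxt and weights.
net = caffe.Net('mobilenet_v2_deploy.prototxt', 'mobilenet_v2.caffemodel',
                caffe.TEST)
net.forward()  # one forward pass, as in the memory numbers above

rss_after_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print('max_rss increase: %.1f MB' % ((rss_after_kb - rss_before_kb) / 1024.0))

# Discard warm-up iterations, then average the steady-state timings.
for _ in range(5):
    net.forward()

times_ms = []
for _ in range(50):
    t0 = time.perf_counter()
    net.forward()
    times_ms.append((time.perf_counter() - t0) * 1000.0)

print('mean forward time: %.1f ms' % (sum(times_ms) / len(times_ms)))
```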

@yangluoluo

I'm seeing the same thing. I think this is not the official version.

@yangluoluo

The grouped convolution (conv with group) needs to be optimized.
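For context: MobileNet's depthwise convolutions are expressed in Caffe as grouped convolutions with group == num_output, which BVLC Caffe executes as many small per-group GEMMs rather than one large one. A minimal NetSpec sketch of such a layer (shapes and names are illustrative):

```python
import caffe
from caffe import layers as L

# A depthwise 3x3 convolution as Caffe sees it: a grouped convolution with
# group == num_output, i.e. one filter per input channel.
n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[1, 32, 112, 112]))
n.conv_dw = L.Convolution(n.data, num_output=32, kernel_size=3, pad=1,
                          stride=1, group=32, bias_term=False)
print(n.to_proto())  # emits the equivalent prototxt
```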


matt-ny commented Mar 2, 2018

> The grouped convolution (conv with group) needs to be optimized.

I am comparing v1 and v2, both from this repo, and the performance of v2 is worse on my CPU in both speed and memory usage. So the extent to which Caffe optimizes grouped convolutions is constant across the comparison. I do see total MACC counts of 573M for v1 vs. 438M for v2, so v2 is doing fewer convolution ops.

Perhaps the size of certain blobs is causing many CPU cache misses? This processor has a 25 MB cache.

https://dgschwend.github.io/netscope/#/editor reports a total activation count (in floats, not bytes) of about 35M for MobileNet v2 and 20M for v1.

For MobileNet v2 the largest single-layer activation was 3.61M floats:

| ID | layer | type | ch_in | dim_in | ch_out | dim_out | MACC | activation |
|----|-------|------|-------|--------|--------|---------|------|------------|
| 17 | conv2_2 | submodule(2) | 16 | 112x112 | 96 | 112x112 | 20.47M | 3.61M |

vs. in v1, where the largest single-layer activation was 2.41M floats.
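A rough back-of-the-envelope check of the cache-miss idea, assuming 4-byte single-precision activations (netscope reports counts in floats, not bytes):

```python
# Working-set estimate, assuming 4-byte floats per activation.
LLC_BYTES = 25 * 2**20  # 25 MB last-level cache on the E5-2658 v2

for label, floats in [('v1 total activations', 20e6),
                      ('v2 total activations', 35e6),
                      ('v2 conv2_2 alone',     3.61e6)]:
    size_bytes = floats * 4
    print('%-22s %6.1f MB  (%.2fx LLC)'
          % (label, size_bytes / 2**20, size_bytes / LLC_BYTES))
```

By that estimate the single largest v2 blob (~14 MB) already fills over half the cache, and v2 moves roughly 1.75x the activation bytes of v1 per forward pass.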

@shicai any thoughts?
