scaleFit instead of scaleFill #58

Open
pannaf opened this issue Jul 20, 2019 · 1 comment

Comments

@pannaf

pannaf commented Jul 20, 2019

Thank you for the great codebase! I was able to get a PyTorch-trained version of YOLOv3 (with spatial pyramid pooling) to run on an iPad Pro at 26 fps thanks to a lot of tips from purchasing and reading your ebook. Many, many thanks for that!

I don't have an iOS/Swift development background, and I wonder if you could point me in the right direction for modifying one part of the code. My original model was trained with scaleFit, where the long side is scaled to 416 pixels and the short side is padded. I saw in your book the recommendation to change the image scaling to match training. To that end, I found this relevant block in ViewController.swift:

// NOTE: If you choose another crop/scale option, then you must also
// change how the BoundingBox objects get scaled when they are drawn.
// Currently they assume the full input image is used.
request.imageCropAndScaleOption = .scaleFill

I want to change this to .scaleFit, but I don't understand how to make the change for the BoundingBox scaling when drawn, as suggested by the comment you included. Could you possibly point me in the right direction?

@hollance
Owner

I think Vision pads the bottom / right side of the image when using scaleFit. So you'll need to figure out by how much the image gets padded. If the input is 1280x720, for example, then after resizing it will be 416x234, with 416-234=182 pixels of padding added at the bottom to make it 416x416.

The bounding box predictions will be in that 416x416 coordinate space, and only use the 416x234 region at the top. It's probably easiest if you stretch the bounding box to fit the entire 416x416 region and then use the same code as in the example app to convert back to screen coordinates.

In other words, multiply the y-coordinate and the height of the bounding box by 416/234.
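
As a rough sketch of that stretch step, something like the following could work. It assumes the padding really is only at the bottom / right (as guessed above, not verified), that the box is a CGRect in the 416x416 model input space, and that the content size is not rounded to whole pixels. The function name `stretchScaleFitBox` and its parameters are made up for this example and are not part of the example app.

```swift
import Foundation

/// Hypothetical helper (not from the example app): stretches a box predicted in
/// the padded, square model input so that the image content fills the whole square.
/// Assumes scaleFit padded only the bottom / right side of the image.
func stretchScaleFitBox(_ box: CGRect, inputSide: CGFloat, imageSize: CGSize) -> CGRect {
    let longSide = max(imageSize.width, imageSize.height)

    // Size of the region that holds actual image content after scaleFit,
    // e.g. 416x234 for a 1280x720 input and inputSide = 416.
    let contentWidth = inputSide * imageSize.width / longSide
    let contentHeight = inputSide * imageSize.height / longSide

    // Stretch factors: 1 along the long side, e.g. 416/234 along the short side.
    let sx = inputSide / contentWidth
    let sy = inputSide / contentHeight

    return CGRect(x: box.origin.x * sx,
                  y: box.origin.y * sy,
                  width: box.width * sx,
                  height: box.height * sy)
}

// Example: a box in the top 416x234 region of a 1280x720 frame.
let predicted = CGRect(x: 100, y: 117, width: 50, height: 58.5)
let stretched = stretchScaleFitBox(predicted,
                                   inputSide: 416,
                                   imageSize: CGSize(width: 1280, height: 720))
// x and width are unchanged; y becomes roughly 208 and height roughly 104.
print(stretched)
```

The stretched box can then go through the same screen-coordinate conversion the app already uses for .scaleFill. If Vision turns out to center the image instead of padding one side, the padding offset would have to be subtracted from the origin before scaling.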
