Write image diff to disk even if test passed #234

Open
nedtwigg opened this issue Aug 21, 2020 · 8 comments

@nedtwigg
I have been forced to set my image diff threshold pretty high:

  customDiffConfig: {
    threshold: 0.3,
  },
  failureThreshold: 0.1,
  failureThresholdType: "percent",

The thing I'm snapshotting is text rendered by Puppeteer. The snapshots are created on macOS, but CI runs on Linux. Small changes in font rendering (especially font width) add up across the width of the image. I tried ssim and its various modes, but they required me to set the threshold even higher, 20-40%.

As a result of the high threshold, I'd like the option to manually audit CI builds by inspecting the image snapshots as artifacts, even when the diff fell below the 10% threshold needed to fail. Maybe an option like dumpDiffToDiskEvenOnPass=true.
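
A minimal sketch of what that might look like in the toMatchImageSnapshot options, reusing the config from above. dumpDiffToDiskEvenOnPass is the proposed name, not an existing option:

  expect(screenshot).toMatchImageSnapshot({
    customDiffConfig: {
      threshold: 0.3,
    },
    failureThreshold: 0.1,
    failureThresholdType: "percent",
    // hypothetical flag: write the diff image artifact even when the test passes
    dumpDiffToDiskEvenOnPass: true,
  });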

@omnisip
Contributor

omnisip commented Aug 21, 2020 via email

@l-abels

l-abels commented Aug 21, 2020

This describes my situation exactly, including my experience with ssim. I've found myself wanting this feature a few times. I'm pretty close to breaking down and finally throwing a docker container at the problem, though.

@nedtwigg
Author

I wonder about an image diff algorithm like this:

  • for each horizontal scanline
  • perform naive per-pixel diff
  • find and combine segments where diffs were within say 10 pixels of each other
    • so each word of text should be one contiguous line segment, from first pixel of word to last pixel of word
  • this line segment will have different lengths on the left vs the right
  • rescale both segments to length one, take their dot product, and multiply by their average length to weight into total pixel difference
    • should be close to zero for text with the same letters, and much higher for text with changed letters

Obviously that's a huge feature request, and I'll be honest that it's definitely not going to make the top of my todo list. But a screenshot comparison that is robust to minor font changes yet rejects content changes would be super useful :D. You need something that can drift horizontally across the page, which the algorithm above can do. It would fail on text reflow though :(
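
A very rough TypeScript sketch of that scanline idea, assuming grayscale intensities and glossing over how per-image word extents would be found; all function names here are made up for illustration:

  // Hypothetical sketch of the scanline diff above; untested illustration only.
  // a, b: grayscale intensities (0..1) for the same scanline in each image.

  // Find [start, end] spans of differing pixels, merging spans within maxGap.
  function diffSegments(a: number[], b: number[], eps = 0.1, maxGap = 10): Array<[number, number]> {
    const segs: Array<[number, number]> = [];
    const n = Math.min(a.length, b.length);
    for (let x = 0; x < n; x++) {
      if (Math.abs(a[x] - b[x]) <= eps) continue;
      const last = segs[segs.length - 1];
      if (last && x - last[1] <= maxGap) last[1] = x; // within ~10px: same word
      else segs.push([x, x]);
    }
    return segs;
  }

  // Resample a segment to a fixed length so segments of different widths compare.
  function resample(row: number[], seg: [number, number], n = 64): number[] {
    const [s, e] = seg;
    return Array.from({ length: n }, (_, i) => row[s + Math.round((i * (e - s)) / (n - 1))]);
  }

  // Normalized dot product of the two rescaled segments, weighted by mean length:
  // near zero for the same letters, larger when the letters changed.
  function segmentDifference(a: number[], b: number[], segA: [number, number], segB: [number, number]): number {
    const u = resample(a, segA);
    const v = resample(b, segB);
    const norm = (w: number[]) => Math.sqrt(w.reduce((s, x) => s + x * x, 0)) || 1;
    const similarity = u.reduce((s, x, i) => s + x * v[i], 0) / (norm(u) * norm(v));
    const meanLen = (segA[1] - segA[0] + segB[1] - segB[0]) / 2;
    return (1 - similarity) * meanLen;
  }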

Container is definitely easier if you don't mind complexity in the dev workflow.

@omnisip
Contributor

omnisip commented Aug 21, 2020

> This describes my situation exactly, including my experience with ssim. I've found myself wanting this feature a few times. I'm pretty close to breaking down and finally throwing a docker container at the problem, though.

I think there's some general confusion about what pixelmatch and SSIM do, and why you're not achieving the desired results. Both metrics are designed to tell how different one image is from a reference image. They fundamentally treat the reference image as a pure signal (think signal-to-noise ratio) and calculate degradation.

This degradation can happen on what seem like identical platforms (Linux Chrome vs. Linux Chrome). For instance, say one machine has a brand-new AMD Ryzen and the other an old Intel Xeon. Because the vectorized instructions selected at runtime differ between the CPUs, they produce imperceptibly different output, yet the generated pixel values differ significantly because of how they were calculated. Another case, still on the same operating system, is when Chrome offloads rendering to the GPU. In these situations SSIM is a far superior metric to pixelmatch: the raw pixels are no longer apples to apples, because different filtering and transformations produced them. SSIM achieves excellent results here because it is a metric derived from the mean, variance, and covariance over pixel windows (say, 11x11 squares) centered on each pixel. As a result, SSIM can compare two perceptually identical images produced through different transformations in terms of the pixels' relationships to each other -- restoring the apples-to-apples comparison it should be.
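
For reference, the standard per-window SSIM formula (Wang et al., 2004) is built from exactly those statistics: \mu_x, \mu_y are the window means, \sigma_x^2, \sigma_y^2 the variances, \sigma_{xy} the covariance, and C_1, C_2 small stabilizing constants:

  \mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}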

Now let's compare this to the case you're describing. You're trying to determine not whether the two images match, but whether the output is acceptable to the user. This sits somewhere between functional equivalence and a computer vision problem. In an ideal world, you'd use something like a Bayesian classifier (think spam analysis) to do a fuzzy-match analysis. But how do you do that at scale? It requires extensive training for the algorithm to know whether all of the information is communicated equivalently. For this particular case, you might benefit from an OCR-derived comparison to ensure that all of the characters are extracted and that the extractions are equal -- but that's outside the scope of a pure image comparison function.
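
A minimal sketch of that OCR-derived check, using tesseract.js as one possible engine; the library choice and helper name are illustrative, not something this project provides:

  import Tesseract from "tesseract.js";

  // Compare the text recognized in two screenshots instead of their pixels.
  async function sameRecognizedText(expectedPng: string, actualPng: string): Promise<boolean> {
    const ocr = async (path: string) =>
      (await Tesseract.recognize(path, "eng")).data.text.replace(/\s+/g, " ").trim();
    // Whitespace is normalized so pure font-spacing changes don't register.
    return (await ocr(expectedPng)) === (await ocr(actualPng));
  }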

@omnisip
Contributor

omnisip commented Aug 21, 2020

> I wonder about an image diff algorithm like this:
>
>   • for each horizontal scanline
>   • perform naive per-pixel diff
>   • find and combine segments where diffs were within say 10 pixels of each other
>     • so each word of text should be one contiguous line segment, from first pixel of word to last pixel of word
>   • this line segment will have different lengths on the left vs the right
>   • rescale both segments to length one, take their dot product, and multiply by their average length to weight into total pixel difference
>     • should be close to zero for text with the same letters, and much higher for text with changed letters

If I understand correctly, you're looking to edge-detect subimages inside one image and then compare them against what you expect to be the corresponding subimages inside another image, is that right?

@nedtwigg
Author

Exactly - to take the very hard problem of OCR and the semantic meaning of the screenshot, and turn it into an image processing problem. SSIM doesn't need neural-net object detection to identify and ignore compression artifacts, and I don't think OCR is required to ignore changes in font spacing.

The reason SSIM works badly on minor font-spacing changes is that it assumes there's no drift, only local artifacts. It works quite well for the first few tiles of text, but past ~100px the minor change in font spacing has caused the two images to be completely uncorrelated.

If you draw a horizontal scanline, find the median RGB and define it as zero, and then count how many times the scanline crosses that zero, that count alone will be a very good signature for the content of the text. It would be hard to add, remove, or change a letter without changing that metric, but changing the spacing or weight of the font would not affect it at all. The problem with the "zero-crossing" approach is that it's hard to reconcile back into "% of pixels different", which is why the weighted-dot-product approach is probably a more natural fit.
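
A sketch of that zero-crossing count for a single scanline, assuming grayscale intensities stand in for the median RGB (illustrative, untested):

  // Count how many times a scanline crosses its own median intensity.
  function zeroCrossings(row: number[]): number {
    const sorted = [...row].sort((a, b) => a - b);
    const median = sorted[Math.floor(sorted.length / 2)];
    let crossings = 0;
    for (let x = 1; x < row.length; x++) {
      // A sign change of (value - median) is one crossing.
      if ((row[x - 1] - median) * (row[x] - median) < 0) crossings++;
    }
    return crossings;
  }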

@omnisip
Contributor

omnisip commented Aug 21, 2020

I'm open to suggestions, and I don't particularly care about the percent or pixel threshold. If it needs to be adjusted or changed for circumstances, it's not a big deal.

If you want to make a specific suggestion for how to implement this, please check out weberSsim.ts in ssim.js 3.2. It has my new implementation, which can calculate any individual variance, covariance, or mean in constant time, over any size square window of pixels.
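
For intuition, constant-time window means are typically achieved with a summed-area table (variances and covariances follow by also tabulating x·x, y·y, and x·y). A generic sketch of that technique, not the actual weberSsim.ts code:

  // Build a summed-area table: sat[y][x] = sum of img over [0, y) x [0, x).
  function summedAreaTable(img: number[][]): number[][] {
    const h = img.length, w = img[0].length;
    const sat = Array.from({ length: h + 1 }, () => new Array<number>(w + 1).fill(0));
    for (let y = 1; y <= h; y++)
      for (let x = 1; x <= w; x++)
        sat[y][x] = img[y - 1][x - 1] + sat[y - 1][x] + sat[y][x - 1] - sat[y - 1][x - 1];
    return sat;
  }

  // Mean over the rectangle [y0, y1) x [x0, x1) in O(1) after the O(h*w) build.
  function windowMean(sat: number[][], y0: number, x0: number, y1: number, x1: number): number {
    const sum = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0];
    return sum / ((y1 - y0) * (x1 - x0));
  }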

If it's doable and it works, I'll implement it and post it here.

@github-actions

This issue is stale because it has been open 30 days with no activity.
