Apply run length encoding while saving masks. #291

Open
jacksonjacobs1 opened this issue Apr 29, 2024 · 6 comments · May be fixed by #294
@jacksonjacobs1
Collaborator

# With mask a uint8 array with value in {0,255}.
Image.fromarray(mask).convert("1").save(fpath,bits=1,optimize=True)
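
For context, a 1-bit PNG is a grayscale PNG whose IHDR declares bit depth 1, with pixels packed eight per byte before DEFLATE. Below is a minimal standard-library sketch of that layout. It is not a description of what Pillow does internally, and the names `encode_1bit_png` and `_chunk` are made up for illustration:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_1bit_png(rows, compress_level=9) -> bytes:
    """Encode rows of 0/nonzero pixel values as a 1-bit grayscale PNG."""
    height = len(rows)
    width = len(rows[0])
    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (None) for this scanline
        for x in range(0, width, 8):
            byte = 0
            # Pack up to 8 pixels, most significant bit first.
            for i, value in enumerate(row[x:x + 8]):
                if value:
                    byte |= 0x80 >> i
            raw.append(byte)
    # IHDR: width, height, bit depth 1, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    idat = zlib.compress(bytes(raw), compress_level)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )


# Tiny 9x2 mask with values in {0, 255}.
png = encode_1bit_png([[0, 255, 255, 0, 0, 0, 0, 0, 255], [255] * 9])
print(len(png), "bytes")
```

Writing `png` to a file with `open(path, "wb")` should give something any PNG viewer can open.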
@CielAl
Contributor

CielAl commented May 2, 2024

That's actually redundant (changing to Pillow) IMO.
skimage.io essentially uses imageio unless you specify another plugin (e.g., matplotlib).
Now assume you have a ubyte binary image:
skimage.io.imsave(fname, mask_ubyte, bits=1) will by itself deliver the RLE-compressed PNG, meaning you only need to specify the bits for the PNG.

Also, a uniformly random binary PNG mask (2048x2048) saved with the default compress level (9) but without RLE is roughly 800 KB, versus 500 KB with RLE. For a non-random binary mask with large consecutive white regions, the size difference will be much smaller.
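
A rough way to sanity-check those sizes without any imaging library is to compress the same random mask once as 8-bit 0/255 bytes and once packed to 1 bit per pixel. This is only a sketch using zlib directly: it ignores PNG chunk overhead and row filters, uses 512x512 instead of 2048x2048 to keep it fast, and the helper names are made up:

```python
import random
import zlib


def random_mask_bits(n_pixels: int, seed: int = 0) -> bytes:
    """Random binary mask packed 8 pixels per byte (n_pixels % 8 == 0)."""
    n_bytes = n_pixels // 8
    value = random.Random(seed).getrandbits(n_bytes * 8)
    return value.to_bytes(n_bytes, "big")


# Lookup table: one packed byte -> eight pixel bytes with values 0 or 255.
_EXPAND = [
    bytes(255 if (b >> (7 - i)) & 1 else 0 for i in range(8))
    for b in range(256)
]


def unpack_to_uint8(packed: bytes) -> bytes:
    """Expand a packed 1-bit mask into one 0/255 byte per pixel."""
    return b"".join(_EXPAND[b] for b in packed)


def compressed_sizes(packed: bytes, level: int = 9):
    """Return (size of zlib(8-bit pixels), size of zlib(1-bit packed))."""
    as_uint8 = unpack_to_uint8(packed)
    return len(zlib.compress(as_uint8, level)), len(zlib.compress(packed, level))


side = 512
random_packed = random_mask_bits(side * side)
eight_bit, one_bit = compressed_sizes(random_packed)
print(f"random mask:     8-bit {eight_bit} B, 1-bit {one_bit} B")

# Structured mask: top half black, bottom half white.
half = (side * side // 8) // 2
structured_packed = b"\x00" * half + b"\xff" * half
s_eight_bit, s_one_bit = compressed_sizes(structured_packed)
print(f"structured mask: 8-bit {s_eight_bit} B, 1-bit {s_one_bit} B")
```

For the random mask the packed form cannot shrink much below one bit per pixel, so most of the saving comes from the bit depth. For the structured mask both variants compress to a tiny fraction of the raw size.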

So my two cents: if you want to, you can introduce RLE by adding arguments to the existing skimage.io/imageio calls rather than switching to Pillow.

@nanli-emory
Collaborator

Hi @CielAl and @jacksonjacobs1. I tested both methods. Image.fromarray(mask).convert("1").save(fpath,bits=1,optimize=True) produces the smaller PNG. The files suffixed _rle use @jacksonjacobs1's method, and those suffixed _sk_rle use @CielAl's method.

Image

@CielAl
Contributor

CielAl commented May 3, 2024

Hi @nanli-emory and @jacksonjacobs1

That's probably due to the difference in arguments: both calls should specify "optimize". Note that skimage.io by default calls imageio's methods, and imageio by default calls Pillow's writer itself (see imageio.core.imopen and imageio.core.v3_plugin).

Hence, since we already use the skimage.io interface in all modules, which in fact uses Pillow's writer, it doesn't make sense to change the interface when skimage.io can achieve the same thing.

Here are my test cases; the difference is negligible.
I have two sets of examples below; both specify compress_level=9 for the PIL and imageio/skimage methods.
One is a random binary mask (uniform) and the other is an intermediate result from HistoQC. The results are quite close (in the actual tissue mask example the Pillow method is slightly smaller, and in the random case the skimage/imageio method is slightly smaller).

image
image

@jacksonjacobs1
Collaborator Author

Agree with @CielAl that we should keep the skimage interface.

If adding RLE helps and is not significantly more CPU intensive, I think it's an easy win?

@CielAl
Contributor

CielAl commented May 6, 2024

Indeed an easy win, especially if you intend to make HistoQC a feasible option for 10k slides, since RLE significantly reduces the storage overhead for binary masks (perhaps 30%+).

We can simply update the current `io.imsave(png_fname, img_as_ubyte(some_binary_mask))` lines to `io.imsave(png_fname, img_as_ubyte(some_binary_mask), bits=1, optimize=True, compress_level=[intended compression level])` (default compress_level for PNG is 9 and it should be sufficient).
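
One way that change could be centralized is sketched below, so the keyword arguments live in one place. `binary_mask_save_kwargs` and `save_binary_mask` are hypothetical names, not existing HistoQC functions, and whether `bits` and `optimize` are honored depends on the installed skimage/imageio/Pillow versions, so treat this as untested against any particular setup:

```python
def binary_mask_save_kwargs(compress_level: int = 9) -> dict:
    """Keyword arguments proposed in this thread for saving binary masks."""
    if not 0 <= compress_level <= 9:
        raise ValueError("compress_level must be between 0 and 9")
    return {"bits": 1, "optimize": True, "compress_level": compress_level}


def save_binary_mask(png_fname, some_binary_mask, compress_level: int = 9) -> None:
    """Hypothetical wrapper around the existing io.imsave call."""
    # Imported lazily so the helper above can be used without skimage installed.
    from skimage import io, img_as_ubyte

    io.imsave(
        png_fname,
        img_as_ubyte(some_binary_mask),
        **binary_mask_save_kwargs(compress_level),
    )


print(binary_mask_save_kwargs())
```

Each existing `io.imsave(png_fname, img_as_ubyte(some_binary_mask))` line would then become `save_binary_mask(png_fname, some_binary_mask)`.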

@nanli-emory
Collaborator

Hi @jacksonjacobs1 and @CielAl, please review the PR. Thanks.
