
Match against all 8 PNG file signature bytes #190

Open: wants to merge 2 commits into base: master


sudhir-b commented Sep 24, 2024

Update the PNG matcher to match all 8 bytes of the PNG signature as in:
https://www.w3.org/TR/png/#5PNG-file-signature

I'm not sure this makes any functional difference, but I spotted a discrepancy and thought I'd open this PR for completeness.

As an aside, it looks like the APNG matcher already matches against an identical bytearray. Is there a reason for this difference between the two matchers?
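For reference, a minimal sketch of a check against all 8 signature bytes from the spec linked above. The function name `is_png` and the constant `PNG_SIGNATURE` are hypothetical and are not the library's matcher API; the length guard mirrors the `len(buf) > 8` check used by the matcher in the diff.

```python
# The 8-byte PNG file signature from the W3C PNG specification:
# 0x89, "PNG", CR LF, 0x1A, LF.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def is_png(buf) -> bool:
    """Hypothetical standalone matcher: True if buf (bytes or bytearray)
    starts with all 8 PNG signature bytes."""
    # Same guard as the matcher in the diff: require more than 8 bytes.
    return len(buf) > 8 and buf[:8] == PNG_SIGNATURE


print(is_png(b'\x89PNG\r\n\x1a\n' + b'\x00' * 4))  # True
print(is_png(b'\x89PNGFOOBARFOOBAR'))              # False: only the first 4 bytes match
```

Because of the `> 8` guard, a buffer of exactly 8 bytes is rejected; a real PNG is always longer, since the IHDR chunk must follow the signature.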

CatKasha (Contributor) commented Oct 2, 2024

I wrote the APNG matcher and, I think, forgot to update the PNG matcher.
Feel free to edit the APNG matcher to use per-byte checks instead of the bytearray.

sudhir-b (Author) commented Oct 3, 2024

Done, thanks for looking. Is there anything else I need to do?

CatKasha (Contributor) commented Oct 4, 2024

Looks good to me

@@ -95,8 +95,14 @@ def __init__(self):
 
     def match(self, buf):
         if (len(buf) > 8 and
-                buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,
-                                      0x0d, 0x0a, 0x1a, 0x0a])):
+                buf[0] == 0x89 and

Contributor:
startswith() is the quickest ;)
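A minimal sketch of the `startswith()` variant, assuming the signature is held in a bytes constant (the names `PNG_SIGNATURE` and `is_png_startswith` are hypothetical). `bytes.startswith()` and `bytearray.startswith()` return False when the buffer is shorter than the prefix, so this version needs no separate length check; unlike the `len(buf) > 8` guard in the diff, it does accept a buffer of exactly 8 bytes.

```python
# Hypothetical constant: the PNG signature as a bytes literal.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def is_png_startswith(buf) -> bool:
    # startswith() returns False when buf is shorter than the prefix,
    # so no explicit length check is required for correctness.
    return buf.startswith(PNG_SIGNATURE)


for sample in (b'\x89PNG\r\n\x1a\n\x00', b'\x89PNG', b'FOOBARFOOBAR'):
    print(sample, is_png_startswith(sample))
```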

Contributor:
Hmm, indeed (Python 3.12.5 on Windows 10; I don't know how these tests would perform on older versions of Python):

python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 232 nsec per loop

python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
1000000 loops, best of 5: 208 nsec per loop

python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(b'\x89PNG\r\n\x1a\n')"
5000000 loops, best of 5: 93.2 nsec per loop

And some "what if" tests:

python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))"
1000000 loops, best of 5: 251 nsec per loop

python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[:8] == b'\x89PNG\r\n\x1a\n'"
5000000 loops, best of 5: 44.5 nsec per loop

python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[0:1] == b'\x89' and buf[1:2] == b'P' and buf[2:3] == b'N' and buf[3:4] == b'G' and buf[4:5] == b'\r' and buf[5:6] == b'\n' and buf[6:7] == b'\x1a' and buf[7:8] == b'\n'"
1000000 loops, best of 5: 398 nsec per loop
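The command-line runs in this thread can also be reproduced from one script with the standard-library `timeit` module. This is a sketch: the buffers and expressions are copied from the comments in this thread, the dictionary and function names are hypothetical, and timings depend on the machine and Python version, so no numbers are shown.

```python
import timeit

# Buffers used in the thread: full match, first half matches, nothing matches.
BUFFERS = {
    "full match": b'\x89PNG\r\n\x1a\n',
    "half match": b'\x89PNGFOOBARFOOBAR',
    "no match": b'FOOBARFOOBAR',
}

# Expressions from the thread, as raw strings so the byte escapes are
# compiled by timeit rather than by this script.
CANDIDATES = {
    "slice == bytearray(...)": r"buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])",
    "per-byte int compare": r"buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A",
    "startswith(bytes literal)": r"buf.startswith(b'\x89PNG\r\n\x1a\n')",
    "slice == bytes literal": r"buf[:8] == b'\x89PNG\r\n\x1a\n'",
}


def best_ns_per_loop(stmt: str, buf: bytes, number: int = 100_000) -> float:
    # Like `python -m timeit -s "buf = ..."`: buf is a local set up once,
    # and the best of 5 repeats is reported per loop, in nanoseconds.
    runs = timeit.repeat(stmt, setup="buf = BUF", repeat=5, number=number,
                         globals={"BUF": buf})
    return min(runs) / number * 1e9


if __name__ == "__main__":
    for buf_name, buf in BUFFERS.items():
        for name, stmt in CANDIDATES.items():
            print(f"{buf_name:>10} | {name:<26} | {best_ns_per_loop(stmt, buf):7.1f} ns")
```

All four expressions agree on the result for each buffer, so the comparison is purely about speed.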

Contributor:
Try b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a' instead of bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])

Contributor:
With what?

Contributor:
$ python3 -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))"
2000000 loops, best of 5: 173 nsec per loop
$ python3 -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 60 nsec per loop
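A plausible explanation for the gap between these two runs (an interpretation, not something measured in the thread): `bytearray([...])` builds a list and then a new bytearray every time the expression runs, whereas a `b'...'` literal is compiled once into the code object's constants. The sketch below checks that both spellings hold the same 8 bytes and inspects `co_consts` of each compiled expression; the variable names are illustrative.

```python
# Both spellings from the thread describe the same 8 signature bytes.
from_list = bytearray([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])
literal = b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'
assert from_list == literal == b'\x89PNG\r\n\x1a\n'

# Compile the two benchmarked expressions without running them
# (raw strings keep the byte escapes for the compiler).
literal_code = compile(
    r"buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')", "<stmt>", "eval")
bytearray_code = compile(
    r"buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]))",
    "<stmt>", "eval")

# The literal is stored ready-made among the code object's constants...
print(literal in literal_code.co_consts)    # True
# ...while the bytearray version only stores the integers; the list and the
# bytearray are rebuilt each time the expression is evaluated.
print(literal in bytearray_code.co_consts)  # False
```

If the bytearray spelling is preferred for readability, building the signature once at module level (for example `PNG_SIGNATURE = bytes([...])`) should avoid the per-call construction as well.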

Contributor:
Perhaps the speedup comes not from the comparison function but from the data conversion:

$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0] == b'\x89' and buf[1] == b'P' and buf[2] == b'N' and buf[3] == b'G' and buf[4] == b'\r' and buf[5] == b'\n' and buf[6] == b'\x1a' and buf[7] == b'\n' else 0"
100 loops, best of 5: 3.65 msec per loop

$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0:1] == b'\x89' and buf[1:2] == b'P' and buf[2:3] == b'N' and buf[3:4] == b'G' and buf[4:5] == b'\r' and buf[5:6] == b'\n' and buf[6:7] == b'\x1a' and buf[7:8] == b'\n' else 0"
10 loops, best of 5: 28.5 msec per loop

$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4e and buf[3] == 0x47 and buf[4] == 0x0d and buf[5] == 0x0a and buf[6] == 0x1a and buf[7] == 0x0a else 0"
20 loops, best of 5: 13.7 msec per loop

b'\x89' is compared faster than 0x89 (an integer). I can't say how older versions of Python behave.

Contributor:
I was curious how "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))" would perform; that's why I tested it.

Here are some more tests. First, only the first half of the signature bytes match:

python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 242 nsec per loop

python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
2000000 loops, best of 5: 138 nsec per loop

python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 94.8 nsec per loop

python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'"
5000000 loops, best of 5: 73.9 nsec per loop

Next, zero bytes match:

python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 244 nsec per loop

python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
10000000 loops, best of 5: 33.5 nsec per loop

python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 95.3 nsec per loop

python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'"
5000000 loops, best of 5: 66.5 nsec per loop

buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a' is faster than buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a').
The per-byte check (buf[0] == 0x89 and ...) is fastest only when the first byte does not match; otherwise it is slower than these two matchers (at least on my PC).

Contributor:
Oops, I didn't see your message when posting this one.
buf[0] == b'\x89' and buf[1] == b'P' and ... will not work because the types differ: indexing bytes returns an int, while slicing returns bytes.

>>> buf = b'\x89PNG\r\n\x1a\n'
>>> buf[0] == b'\x89'
False
>>> type(buf[0])
<class 'int'>
>>> type(buf[0:1]) 
<class 'bytes'>
>>> type(0x89)
<class 'int'>
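It follows that `buf[0] == b'\x89'` is always False, so in the `b'\x89'` benchmark earlier in the thread the `and` chain most likely stopped after its first comparison, which would account for it looking fastest. A sketch that counts how many comparisons each style of chain actually evaluates, using a hypothetical helper `count_evaluated`:

```python
buf = b'\x89PNG\r\n\x1a\n'
expected = b'\x89PNG\r\n\x1a\n'


def count_evaluated(checks) -> int:
    """Hypothetical helper: run the checks in order the way an `and` chain
    does, stop at the first False, and return how many were evaluated."""
    evaluated = 0
    for check in checks:
        evaluated += 1
        if not check():
            break
    return evaluated


# int (from indexing) vs 1-byte bytes: never equal, so the chain stops at once.
int_vs_bytes = [lambda i=i: buf[i] == expected[i:i + 1] for i in range(8)]
# int vs int (indexing both sides): all 8 comparisons run and succeed.
int_vs_int = [lambda i=i: buf[i] == expected[i] for i in range(8)]
# bytes vs bytes (slicing both sides): all 8 comparisons run and succeed.
bytes_vs_bytes = [lambda i=i: buf[i:i + 1] == expected[i:i + 1] for i in range(8)]

print(count_evaluated(int_vs_bytes))    # 1
print(count_evaluated(int_vs_int))      # 8
print(count_evaluated(bytes_vs_bytes))  # 8
```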

3 participants