Match against all 8 PNG file signature bytes #190
base: master
I wrote the APNG matcher and forgot(?) to update the PNG matcher.
Done, thanks for looking. Is there anything else I need to do?
Looks good to me.
@@ -95,8 +95,14 @@ def __init__(self):

    def match(self, buf):
        if (len(buf) > 8 and
                buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,
                                      0x0d, 0x0a, 0x1a, 0x0a])):
            buf[0] == 0x89 and
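The new check in the diff compares the signature one byte at a time. Below is a minimal standalone sketch of that style; the function name `match_png_bytewise` is hypothetical, and it uses `len(buf) >= 8` rather than the `len(buf) > 8` seen in the hunk (the two only differ for a buffer holding exactly the 8-byte signature). The per-byte comments paraphrase the PNG specification's rationale for each signature byte.

```python
def match_png_bytewise(buf):
    """Return True if buf starts with the full 8-byte PNG signature.

    Indexing bytes/bytearray yields ints, so each byte is compared
    against an integer literal. The `and` chain stops at the first
    mismatch.
    """
    return (len(buf) >= 8 and
            buf[0] == 0x89 and  # non-ASCII byte: catches transfers that clear bit 7
            buf[1] == 0x50 and  # 'P'
            buf[2] == 0x4E and  # 'N'
            buf[3] == 0x47 and  # 'G'
            buf[4] == 0x0D and  # CR, together with the next LF,
            buf[5] == 0x0A and  # detects altered newline sequences
            buf[6] == 0x1A and  # Ctrl-Z: stops file display under MS-DOS
            buf[7] == 0x0A)     # LF: detects the inverse newline translation


print(match_png_bytewise(b'\x89PNG\r\n\x1a\n' + b'\x00' * 8))
print(match_png_bytewise(b'\x89PNG'))  # too short to hold the signature
```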
startswith() works quickest ;)
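A minimal sketch of the two prefix checks discussed in this thread, built around one precomputed bytes constant; the names `PNG_SIGNATURE`, `match_png_startswith`, and `match_png_slice` are hypothetical. Neither needs an explicit length check: `startswith()` returns False when the buffer is shorter than the prefix, and a short slice simply compares unequal.

```python
# Hypothetical module-level constant, so the literal is not rebuilt per call.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def match_png_startswith(buf):
    """Prefix check via startswith(); works for bytes and bytearray."""
    return buf.startswith(PNG_SIGNATURE)


def match_png_slice(buf):
    """Equivalent slice comparison; bytes and bytearray compare by value."""
    return buf[:8] == PNG_SIGNATURE


header = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR'
print(match_png_startswith(header), match_png_slice(header))
```

Which of the two is faster depends on the interpreter and the input, as the timings below show, so the choice here is mostly about readability.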
Hmm, indeed (Python 3.12.5, Windows 10; I don't know how these tests would perform on older versions of Python):
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 232 nsec per loop
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
1000000 loops, best of 5: 208 nsec per loop
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(b'\x89PNG\r\n\x1a\n')"
5000000 loops, best of 5: 93.2 nsec per loop
And some "what if" testing:
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))"
1000000 loops, best of 5: 251 nsec per loop
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[:8] == b'\x89PNG\r\n\x1a\n'"
5000000 loops, best of 5: 44.5 nsec per loop
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[0:1] == b'\x89' and buf[1:2] == b'P' and buf[2:3] == b'N' and buf[3:4] == b'G' and buf[4:5] == b'\r' and buf[5:6] == b'\n' and buf[6:7] == b'\x1a' and buf[7:8] == b'\n'"
1000000 loops, best of 5: 398 nsec per loop
Try b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'
instead of bytearray([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])
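Both spellings describe the same eight bytes; what differs is how the value is produced. A short check using only built-ins (the variable names are illustrative):

```python
readable = b'\x89PNG\r\n\x1a\n'                      # escapes plus ASCII letters
hex_escaped = b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'   # every byte written as \xNN
as_bytearray = bytearray([0x89, 0x50, 0x4e, 0x47,
                          0x0d, 0x0a, 0x1a, 0x0a])  # built from a list at run time

# All three compare equal by value, even across bytes and bytearray.
assert readable == hex_escaped == as_bytearray

# The bytes literals are compile-time constants; the bytearray expression
# builds a list and then a new mutable object every time it is evaluated,
# which is likely most of the extra cost inside a timeit statement.
print(type(hex_escaped).__name__, type(as_bytearray).__name__)
```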
with what?
$ python3 -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))"
2000000 loops, best of 5: 173 nsec per loop
$ python3 -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 60 nsec per loop
Perhaps the speedup comes not from the comparison function but from the data conversion:
$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0] == b'\x89' and buf[1] == b'P' and buf[2] == b'N' and buf[3] == b'G' and buf[4] == b'\r' and buf[5] == b'\n' and buf[6] == b'\x1a' and buf[7] == b'\n' else 0"
100 loops, best of 5: 3.65 msec per loop
$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0:1] == b'\x89' and buf[1:2] == b'P' and buf[2:3] == b'N' and buf[3:4] == b'G' and buf[4:5] == b'\r' and buf[5:6] == b'\n' and buf[6:7] == b'\x1a' and buf[7:8] == b'\n' else 0"
10 loops, best of 5: 28.5 msec per loop
$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4e and buf[3] == 0x47 and buf[4] == 0x0d and buf[5] == 0x0a and buf[6] == 0x1a and buf[7] == 0x0a else 0"
20 loops, best of 5: 13.7 msec per loop
b'\x89' is compared faster than 0x89 (an integer). I can't speak for older versions of Python.
I was curious how "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))"
would perform; that's why I tested it.
Here are some more tests. First half of the bytes match:
python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 242 nsec per loop
python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
2000000 loops, best of 5: 138 nsec per loop
python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 94.8 nsec per loop
python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'"
5000000 loops, best of 5: 73.9 nsec per loop
Zero bytes match:
python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 244 nsec per loop
python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
10000000 loops, best of 5: 33.5 nsec per loop
python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 95.3 nsec per loop
python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'"
5000000 loops, best of 5: 66.5 nsec per loop
buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'
is faster than buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'),
and the chained per-byte comparison (buf[0] == 0x89 and buf[1] == 0x50 and ...) is fastest only when the first byte does not match; otherwise it is slower than these two matchers (at least on my PC).
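To re-run all of these comparisons on another machine or Python version in one go, here is a small harness built on the standard library's `timeit.repeat`. The buffers and expressions are copied from the commands in this thread; the names `BUFFERS`, `STATEMENTS`, and `bench` are hypothetical. Absolute numbers will vary by machine and interpreter, so only the ordering within each case is worth comparing.

```python
import timeit

# Buffers copied from the commands above: full match, first four bytes
# match, and no bytes match.
BUFFERS = {
    'full match': b'\x89PNG\r\n\x1a\n',
    'first half matches': b'\x89PNGFOOBARFOOBAR',
    'no match': b'FOOBARFOOBAR',
}

# Expressions as typed on the command line; raw strings keep the \x escapes
# so they are compiled as bytes literals inside timeit.
STATEMENTS = {
    'bytearray slice': 'buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])',
    'per-byte ints': ('buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and '
                      'buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and '
                      'buf[6] == 0x1A and buf[7] == 0x0A'),
    'startswith': r"buf.startswith(b'\x89PNG\r\n\x1a\n')",
    'bytes slice': r"buf[:8] == b'\x89PNG\r\n\x1a\n'",
}


def bench(number=100_000, repeat=5):
    """Return {(case, name): best time per loop in nanoseconds}."""
    results = {}
    for case, buf in BUFFERS.items():
        # Mirror `python -m timeit -s "buf = ..."`: buf is set up as a local.
        setup = f'buf = {buf!r}'
        for name, stmt in STATEMENTS.items():
            times = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
            results[(case, name)] = min(times) / number * 1e9
    return results


if __name__ == '__main__':
    for (case, name), ns in bench().items():
        print(f'{case:<20} {name:<16} {ns:8.1f} ns')
```

Like the `python -m timeit` CLI, this reports the best of several repeats, which reduces noise from other processes.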
Oops, I didn't see your message when posting this one.
buf[0] == b'\x89' and buf[1] == b'P' and ...
will not work because the types differ:
>>> buf = b'\x89PNG\r\n\x1a\n'
>>> buf[0] == b'\x89'
False
>>> type(buf[0])
<class 'int'>
>>> type(buf[0:1])
<class 'bytes'>
>>> type(0x89)
<class 'int'>
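One consequence of this type mismatch is worth spelling out: in the chain `buf[0] == b'\x89' and ...`, the very first comparison (an int against a bytes object) is always False, so `and` short-circuits and the remaining seven comparisons never run. That, rather than bytes comparing faster than ints, likely explains the 3.65 msec result in the loop benchmark earlier in the thread. A quick check using only built-ins:

```python
buf = b'\x89PNG\r\n\x1a\n'

# Indexing a bytes object yields an int; slicing yields a bytes object.
assert buf[0] == 0x89
assert buf[0:1] == b'\x89'

# An int never compares equal to a bytes object, whatever the values.
assert (buf[0] == b'\x89') is False

# So that chain is False even for a real PNG header:
# `and` returns the first falsy operand and skips the rest.
chained = (buf[0] == b'\x89' and buf[1] == b'P' and buf[2] == b'N' and
           buf[3] == b'G' and buf[4] == b'\r' and buf[5] == b'\n' and
           buf[6] == b'\x1a' and buf[7] == b'\n')
assert chained is False

# Counting comparisons shows that only the first one is ever evaluated.
expected = [b'\x89', b'P', b'N', b'G', b'\r', b'\n', b'\x1a', b'\n']
evaluated = 0
for i, want in enumerate(expected):
    evaluated += 1
    if buf[i] != want:  # int vs bytes: always unequal
        break
print(evaluated)  # 1
```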
Update the PNG matcher to match all 8 bytes of the PNG signature as in:
https://www.w3.org/TR/png/#5PNG-file-signature
I'm not sure that this makes any functional difference but spotted a discrepancy and thought I'd open this PR for completeness.
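On whether matching all eight bytes makes a functional difference: it only matters for input that starts with `\x89PNG` but diverges afterwards, for example a PNG whose newline bytes were rewritten by a text-mode transfer, which is exactly the corruption the last four signature bytes are meant to expose. The sketch below assumes, hypothetically, that the earlier matcher checked only the first four bytes (that code is not shown in this thread); the names `match_prefix4` and `match_full` are illustrative.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def match_prefix4(buf):
    # Hypothetical shorter check: only "\x89PNG".
    return buf[:4] == PNG_SIGNATURE[:4]


def match_full(buf):
    # Full 8-byte signature check, as proposed in this PR.
    return buf[:8] == PNG_SIGNATURE


good = PNG_SIGNATURE + b'\x00\x00\x00\rIHDR'
# Simulate a text-mode transfer that rewrote the signature's CRLF as LF.
mangled = good.replace(b'\r\n', b'\n', 1)

for name, data in (('intact', good), ('CRLF -> LF', mangled)):
    print(name, '4-byte:', match_prefix4(data), '8-byte:', match_full(data))
```

The four-byte check accepts the corrupted file, while the full check rejects it; for well-formed files both agree.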
As an aside, it looks like the APNG matcher matches against an identical bytearray to begin with; is there a reason for this difference between the two matchers?