Alternative bytecode_pattern approach #56

Open
A60AB5450353F40E opened this issue Mar 10, 2023 · 3 comments

Comments

@A60AB5450353F40E

So I was thinking we could change how the redeem script is stored. How about splitting it into 3 fields:

  • redeem script pattern: computed by replacing each sequence of consecutive pushes with the number of pushes, encoded as a script number push
  • sequence of push sizes: just the sizes of the pushes, each encoded as a script number push
  • sequence of pushes: just the pushes themselves
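A minimal Python sketch of the proposed three-field split, for illustration only. It assumes that OP_0, the direct and OP_PUSHDATA1/2/4 pushes, OP_1NEGATE and OP_1 to OP_16 count as pushes (OP_RESERVED does not), that counts and sizes are minimally encoded script numbers, and that a single-opcode push such as OP_0 or OP_5 gets size 1, which is what the STRIP_PUSH_DATA output later in this thread appears to do. All function names are hypothetical.

```python
"""Sketch: split a redeem script into (pattern, push sizes, pushes)."""

PUSHDATA_HEADER = {0x4C: 2, 0x4D: 3, 0x4E: 5}  # OP_PUSHDATA1/2/4: opcode + length bytes


def split_ops(code: bytes) -> list[bytes]:
    """Split bytecode into operations, each including any push payload."""
    ops, i = [], 0
    while i < len(code):
        op = code[i]
        if 0x01 <= op <= 0x4B:
            header, size = 1, op
        elif op in PUSHDATA_HEADER:
            header = PUSHDATA_HEADER[op]
            if i + header > len(code):
                raise ValueError("truncated push length")
            size = int.from_bytes(code[i + 1:i + header], "little")
        else:
            header, size = 1, 0
        end = i + header + size
        if end > len(code):
            raise ValueError("truncated push data")
        ops.append(code[i:end])
        i = end
    return ops


def is_push(opcode: int) -> bool:
    """OP_0, data pushes, OP_1NEGATE and OP_1..OP_16 (OP_RESERVED excluded)."""
    return opcode <= 0x60 and opcode != 0x50


def push_size(op: bytes) -> int:
    """Payload length; single-opcode pushes (OP_0, OP_N) count as size 1."""
    if 0x01 <= op[0] <= 0x4B:
        return len(op) - 1
    if op[0] in PUSHDATA_HEADER:
        return len(op) - PUSHDATA_HEADER[op[0]]
    return 1


def push_number(n: int) -> bytes:
    """Minimally encode a non-negative integer as a script number push."""
    if n == 0:
        return b"\x00"  # OP_0
    if n <= 16:
        return bytes([0x50 + n])  # OP_1 .. OP_16
    data = n.to_bytes((n.bit_length() + 7) // 8, "little")
    if data[-1] & 0x80:
        data += b"\x00"  # keep the sign bit clear
    return bytes([len(data)]) + data


def split_script(code: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (pattern, push sizes, pushes) for the given bytecode."""
    pattern, sizes, pushes = bytearray(), bytearray(), bytearray()
    run = 0  # length of the current run of consecutive pushes
    for op in split_ops(code):
        if is_push(op[0]):
            run += 1
            sizes += push_number(push_size(op))
            pushes += op
            continue
        if run:
            pattern += push_number(run)
            run = 0
        pattern += op
    if run:
        pattern += push_number(run)
    return bytes(pattern), bytes(sizes), bytes(pushes)


p2pkh = bytes.fromhex("76a914" + "11" * 20 + "88ac")  # P2PKH-style script, dummy hash
pattern, sizes, pushes = split_script(p2pkh)
print(pattern.hex())  # 76a95188ac
print(sizes.hex())  # 0114
```

Keeping full push operations (opcode, length and payload) in the pushes field, rather than bare payloads, is what makes exact reconstruction possible even for non-minimal pushes.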

One could then use the _eq operator on the most general pattern, which should perform better than a regex. Results could then be narrowed down further with a regex on the push sizes or pushes, and those regexes would only run on rows that already matched the general template.
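A rough sketch of the two-stage filter described here, using in-memory rows. The row keys `pattern` and `push_sizes` and the function name are hypothetical; in a database, stage 1 would normally be an indexed equality comparison rather than a Python loop. The first sample row uses the AnyHedge input values shown later in this thread; the other rows are made up.

```python
import re
from typing import Iterable, Iterator, Optional


def find_matches(
    rows: Iterable[dict],
    pattern_hex: str,
    sizes_regex: Optional[str] = None,
) -> Iterator[dict]:
    """Exact pattern match first, then an optional regex on push sizes."""
    compiled = re.compile(sizes_regex) if sizes_regex is not None else None
    for row in rows:
        # Stage 1: cheap equality check on the general pattern.
        if row["pattern"] != pattern_hex:
            continue
        # Stage 2: the regex only runs on rows that passed stage 1.
        if compiled is not None and compiled.fullmatch(row["push_sizes"]) is None:
            continue
        yield row


rows = [
    {"id": 1, "pattern": "56", "push_sizes": "01406001406051025401"},
    {"id": 2, "pattern": "56", "push_sizes": "01406001406051025501"},
    {"id": 3, "pattern": "76a95188ac", "push_sizes": "0114"},
]
print([r["id"] for r in find_matches(rows, "56")])  # [1, 2]
print([r["id"] for r in find_matches(rows, "56", r"(?:0140|60)+51025401")])  # [1]
```

Because the regex runs over hex text, it should be written in whole bytes (pairs of hex digits); otherwise it can match across nibble boundaries.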
Also, the original redeem script can be reconstructed exactly from the pattern and pushes fields.
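A sketch of the reconstruction, assuming the pattern field encodes each run length as a minimally encoded script-number push and the pushes field stores full push operations. Every push left in the pattern is then a run length, so the decoder copies non-push opcodes verbatim and replaces each count N with the next N stored pushes; the push-size field is not needed for this. Function names are hypothetical.

```python
"""Sketch: rebuild the original bytecode from the pattern and pushes fields."""

PUSHDATA_HEADER = {0x4C: 2, 0x4D: 3, 0x4E: 5}  # OP_PUSHDATA1/2/4 header lengths


def split_ops(code: bytes) -> list[bytes]:
    """Split bytecode into operations, each including any push payload."""
    ops, i = [], 0
    while i < len(code):
        op = code[i]
        if 0x01 <= op <= 0x4B:
            header, size = 1, op
        elif op in PUSHDATA_HEADER:
            header = PUSHDATA_HEADER[op]
            if i + header > len(code):
                raise ValueError("truncated push length")
            size = int.from_bytes(code[i + 1:i + header], "little")
        else:
            header, size = 1, 0
        end = i + header + size
        if end > len(code):
            raise ValueError("truncated push data")
        ops.append(code[i:end])
        i = end
    return ops


def is_push(opcode: int) -> bool:
    return opcode <= 0x60 and opcode != 0x50


def read_count(op: bytes) -> int:
    """Decode a run length (a positive script number) from the pattern."""
    if 0x51 <= op[0] <= 0x60:
        return op[0] - 0x50  # OP_1 .. OP_16
    if 0x01 <= op[0] <= 0x4B and not op[-1] & 0x80:
        return int.from_bytes(op[1:], "little")
    raise ValueError(f"unexpected count encoding: {op.hex()}")


def reconstruct(pattern: bytes, pushes: bytes) -> bytes:
    push_ops = split_ops(pushes)
    out, used = bytearray(), 0
    for op in split_ops(pattern):
        if not is_push(op[0]):
            out += op  # executable opcodes are copied verbatim
            continue
        count = read_count(op)  # every push in the pattern is a run length
        run = push_ops[used:used + count]
        if len(run) != count:
            raise ValueError("pattern expects more pushes than were stored")
        out += b"".join(run)
        used += count
    if used != len(push_ops):
        raise ValueError("some stored pushes were not consumed")
    return bytes(out)


# OP_1 OP_2 OP_ADD OP_3 OP_EQUAL -> pattern "52935187", pushes "515253"
print(reconstruct(bytes.fromhex("52935187"), bytes.fromhex("515253")).hex())  # 5152935387
```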

We could even do some more parsing and add a function that filters on the exact value of the Nth push.
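One possible shape for such a filter, sketched in Python under the assumption that the pushes field stores full push operations as hex. The row key `pushes`, the 0-based index and all function names are hypothetical; the value compared is the stack item the push produces, so OP_N pushes decode to their one-byte number.

```python
"""Sketch: filter rows on the exact value of the Nth push."""
from typing import Iterable, Iterator, Optional

PUSHDATA_HEADER = {0x4C: 2, 0x4D: 3, 0x4E: 5}  # OP_PUSHDATA1/2/4 header lengths


def split_ops(code: bytes) -> list[bytes]:
    """Split bytecode into operations, each including any push payload."""
    ops, i = [], 0
    while i < len(code):
        op = code[i]
        if 0x01 <= op <= 0x4B:
            header, size = 1, op
        elif op in PUSHDATA_HEADER:
            header = PUSHDATA_HEADER[op]
            if i + header > len(code):
                raise ValueError("truncated push length")
            size = int.from_bytes(code[i + 1:i + header], "little")
        else:
            header, size = 1, 0
        end = i + header + size
        if end > len(code):
            raise ValueError("truncated push data")
        ops.append(code[i:end])
        i = end
    return ops


def push_value(op: bytes) -> bytes:
    """Return the stack item that a push operation produces."""
    first = op[0]
    if first == 0x00:
        return b""  # OP_0 pushes an empty item
    if first == 0x4F:
        return b"\x81"  # OP_1NEGATE pushes -1
    if 0x51 <= first <= 0x60:
        return bytes([first - 0x50])  # OP_1 .. OP_16
    if 0x01 <= first <= 0x4B:
        return op[1:]
    if first in PUSHDATA_HEADER:
        return op[PUSHDATA_HEADER[first]:]
    raise ValueError(f"not a push operation: {op.hex()}")


def nth_push(pushes: bytes, n: int) -> Optional[bytes]:
    """Stack item of the n-th push (0-based), or None if there are fewer pushes."""
    ops = split_ops(pushes)
    return push_value(ops[n]) if 0 <= n < len(ops) else None


def filter_by_nth_push(rows: Iterable[dict], n: int, value: bytes) -> Iterator[dict]:
    for row in rows:
        if nth_push(bytes.fromhex(row["pushes"]), n) == value:
            yield row


rows = [
    {"id": 1, "pushes": "14" + "11" * 20 + "51"},
    {"id": 2, "pushes": "14" + "22" * 20 + "51"},
]
print([r["id"] for r in filter_by_nth_push(rows, 0, bytes.fromhex("22" * 20))])  # [2]
```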

@A60AB5450353F40E
Author

A60AB5450353F40E commented Mar 14, 2023

I made a small tool to experiment with this, with the following modes:

  • STRIP_PUSHES - replace each successive sequence of pushes with the number of pushes, encoded as a script number
  • STRIP_PUSH_DATA - replace each push with its payload size, encoded as a script number
  • EXTRACT_PUSHES - ignore all executable (non-push) opcodes and extract the full pushes
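A Python sketch of the three modes, written to reproduce the lengths reported for the redeem script in this comment (340 bytes in, 156 / 173 / 231 bytes out). It is not the tool's actual source. Assumptions: OP_RESERVED is treated as a non-push, counts and sizes use minimal script-number encoding, and single-opcode pushes (OP_0, OP_1 to OP_16) get size 1 in STRIP_PUSH_DATA, which matches the output shown. Names are hypothetical.

```python
"""Sketch of the three patternizing modes."""
from enum import Enum

PUSHDATA_HEADER = {0x4C: 2, 0x4D: 3, 0x4E: 5}  # OP_PUSHDATA1/2/4 header lengths


class Mode(Enum):
    STRIP_PUSHES = "strip_pushes"
    STRIP_PUSH_DATA = "strip_push_data"
    EXTRACT_PUSHES = "extract_pushes"


def split_ops(code: bytes) -> list[bytes]:
    """Split bytecode into operations, each including any push payload."""
    ops, i = [], 0
    while i < len(code):
        op = code[i]
        if 0x01 <= op <= 0x4B:
            header, size = 1, op
        elif op in PUSHDATA_HEADER:
            header = PUSHDATA_HEADER[op]
            if i + header > len(code):
                raise ValueError("truncated push length")
            size = int.from_bytes(code[i + 1:i + header], "little")
        else:
            header, size = 1, 0
        end = i + header + size
        if end > len(code):
            raise ValueError("truncated push data")
        ops.append(code[i:end])
        i = end
    return ops


def is_push(opcode: int) -> bool:
    return opcode <= 0x60 and opcode != 0x50


def push_size(op: bytes) -> int:
    if 0x01 <= op[0] <= 0x4B:
        return len(op) - 1
    if op[0] in PUSHDATA_HEADER:
        return len(op) - PUSHDATA_HEADER[op[0]]
    return 1  # OP_0, OP_1NEGATE, OP_1 .. OP_16


def push_number(n: int) -> bytes:
    """Minimally encode a non-negative integer as a script number push."""
    if n == 0:
        return b"\x00"
    if n <= 16:
        return bytes([0x50 + n])
    data = n.to_bytes((n.bit_length() + 7) // 8, "little")
    if data[-1] & 0x80:
        data += b"\x00"  # keep the sign bit clear
    return bytes([len(data)]) + data


def patternize(code: bytes, mode: Mode) -> bytes:
    out, run = bytearray(), 0
    for op in split_ops(code):
        push = is_push(op[0])
        if mode is Mode.EXTRACT_PUSHES:
            if push:
                out += op  # keep the full push (opcode, length, payload)
        elif mode is Mode.STRIP_PUSH_DATA:
            out += push_number(push_size(op)) if push else op
        elif push:  # STRIP_PUSHES: extend the current run of pushes
            run += 1
        else:
            if run:
                out += push_number(run)
                run = 0
            out += op
    if run:  # only ever non-zero in STRIP_PUSHES mode
        out += push_number(run)
    return bytes(out)


p2pkh = bytes.fromhex("76a914" + "11" * 20 + "88ac")
for mode in Mode:
    print(mode.name, patternize(p2pkh, mode).hex())
```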

Example of patternizing an AnyHedge input script:

STRIP_PUSHES:    56; len=1
STRIP_PUSH_DATA: 01406001406051025401; len=10
EXTRACT_PUSHES:  40da2963cc172e7dccf9570ebd272c496d9df459f1b4d07f1961515642054db764f25c4aab947a4dbcf7793ca25bcc5a46faa83d93e7aeaa2e23a95376e386902d10020aa262ab330100863301002b45000040c0df593545220c40b8676d56388b58715b27cc0fc5de3640dfd49aa05d00e3efc5597b9d8dd783f055b0579630157599290d547e9e3c160d2170d2c300177a72103e0aa262ac3301008733010026450000514d5401043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=508
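Read as script numbers, STRIP_PUSHES "56" is OP_6 (six consecutive pushes), and the STRIP_PUSH_DATA sequence is 64, 16, 64, 16, 1 and 340 bytes, the last push being the 340-byte redeem script. A small decoder sketch for such sequences, assuming each item is a minimally encoded number push (the function name is hypothetical):

```python
def decode_script_numbers(hex_str: str) -> list[int]:
    """Decode a sequence of script-number pushes into Python ints."""
    code = bytes.fromhex(hex_str)
    values, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0x00:  # OP_0
            values.append(0)
            i += 1
        elif op == 0x4F:  # OP_1NEGATE
            values.append(-1)
            i += 1
        elif 0x51 <= op <= 0x60:  # OP_1 .. OP_16
            values.append(op - 0x50)
            i += 1
        elif 0x01 <= op <= 0x4B:  # direct push of `op` little-endian bytes
            data = code[i + 1:i + 1 + op]
            if len(data) != op:
                raise ValueError("truncated push")
            value = int.from_bytes(data, "little")
            if data[-1] & 0x80:  # highest bit of the last byte is the sign
                value = -(value & ~(0x80 << 8 * (len(data) - 1)))
            values.append(value)
            i += 1 + op
        else:
            raise ValueError(f"opcode {op:#04x} is not a number push")
    return values


print(decode_script_numbers("56"))  # [6]
print(decode_script_numbers("01406001406051025401"))  # [64, 16, 64, 16, 1, 340]
```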

Running the same modes on the redeem script (the last push of the input script), we get:

BYTECODE:        043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=340
STRIP_PUSHES:    5d79519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695176517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=156
STRIP_PUSH_DATA: 54545352535501210119011951012101215179519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695276517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=173
EXTRACT_PUSHES:  043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c005c5b515c51515f5f575d5d575c5854005c58545c545b5b5c0059585c545c5b5a5b0222025853575252000055515154; len=231

@A60AB5450353F40E
Author

To better illustrate, here's an index of contract fingerprints (STRIP_PUSHES mode) as (pattern, input_count) pairs, covering blocks 0 to 780,000:

https://gist.github.com/A60AB5450353F40E/6b3e525d6e1220328217b9568968d6fc
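Building an index of this shape is a simple aggregation once each input's STRIP_PUSHES pattern has been computed. A sketch, assuming input_count means the number of inputs whose redeem script reduces to that pattern and that patterns are hex strings; the function name and sample patterns are made up, and the gist's exact format is not reproduced here.

```python
from collections import Counter
from typing import Iterable


def build_index(patterns: Iterable[str]) -> list[tuple[str, int]]:
    """Count inputs per fingerprint; most common first, ties broken by pattern."""
    counts = Counter(patterns)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


index = build_index(["56", "5d79", "56", "56", "5d79", "52ad"])
print(index)  # [('56', 3), ('5d79', 2), ('52ad', 1)]
```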

@bitjson
Member

bitjson commented Nov 21, 2023

Thanks for looking into this @A60AB5450353F40E!

This would be a great improvement for scanning contract patterns. I'd love to take a PR introducing this feature! I won't have bandwidth to work on this myself until I make some progress on #29. (Otherwise, I'll try to implement the bytecode_pattern stuff this way when I'm working on the ClickHouse migration.)
