
RSK stuck in handling debug_traceTransaction RPC calls #1692

Closed
gofman8 opened this issue Jan 13, 2022 · 10 comments

@gofman8

gofman8 commented Jan 13, 2022

The RSK node gets stuck while processing debug_traceTransaction RPC calls. Example request:

curl -s -X POST -H "Content-Type: application/json" -d '{ "jsonrpc": "2.0", "id": 1772093265, "method": "debug_traceTransaction", "params": [ "0xdb76b62bae3896c47fba80037f0434f3cd456218b54fc963f85e15ca0c111def", { "disableStorage": true, "disableMemory": false, "disableStack": false, "fullStorage": false } ], "origin": "http://remix.ethereum.org", "tabId": 130 }' https://mainnetdebug.sovryn.app >> /dev/null

This request works fine for some time (from a few hours to a few days), but eventually requests get stuck waiting for a response from the RSK node on port 4444 (RPC). Each stuck request leaves a non-closed TCP session on the 4444 socket.
Output of netstat -anp | grep :4444:
[screenshot: netstat output showing connections on port 4444 in CLOSE_WAIT state]

During this time all other requests work fine (eth_blockNumber, web3_clientVersion, etc.). But when the number of open connections in the CLOSE_WAIT state reaches 28, RSK stops processing any RPC requests:
[screenshot: RPC requests failing once 28 connections are in CLOSE_WAIT]
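
For anyone else debugging this, here is a minimal monitoring sketch (not part of the original report) that logs the CLOSE_WAIT count on the RPC port over time; port 4444 and the 28-connection ceiling come from the observations above, the interval is an arbitrary choice:

# Hedged sketch: log the number of CLOSE_WAIT sessions on the RPC port every
# 30 seconds, assuming the node listens on 4444 as in this report.
while sleep 30; do
  count=$(netstat -an | grep ':4444' | grep -c CLOSE_WAIT)
  echo "$(date -Is) CLOSE_WAIT on :4444 = $count"
done

Watching this counter climb toward 28 should give a timestamp for when the RPC stops responding.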

So it looks like the RSK node stops closing the socket properly while processing the debug_traceTransaction RPC call, which at some point results in a fully non-functional RPC interface.

After a restart, the RSK node is able to process RPC calls again.

With TRACE logging enabled for the "execute,blockexecutor,web3,jsonrpc" loggers, we collected logs for a whole day and found that sometimes, while processing debug_traceTransaction, the node gets stuck between the trace messages "Go transaction 390" and "Finalize transaction 390".

Log file for 2022-01-11 attached:
rsklog.tar.gz

cat rsk.log.txt | grep "Go transaction 390"  | wc -l 
833
cat rsk.log.txt | grep "Finalize transaction 390"  | wc -l 
737
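
To narrow down which executions never complete, here is a rough shell sketch (ours, not from the original logs) that counts, per transaction hash, how many "Go transaction" traces appear versus how many "Finalize transaction" traces; it assumes the hash is the last field of each trace line, based on the logger.trace call quoted below.

# Hypothetical sketch, assuming the transaction hash is the last whitespace-
# separated field of the "Go transaction ..." and "Finalize transaction ..."
# trace lines: count starts and finalizations per hash.
grep "Go transaction" rsk.log.txt | awk '{print $NF}' | sort | uniq -c | sort -rn | head
grep "Finalize transaction" rsk.log.txt | awk '{print $NF}' | sort | uniq -c | sort -rn | head

A hash whose "Go transaction" count is higher than its "Finalize transaction" count is one whose execution started but never finished.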

So the last trace is here:

logger.trace("Go transaction {} {}", toBI(tx.getNonce()), tx.getHash());

Then the RPC call is closed by a timeout on the client side, and the socket on port 4444 (RPC) keeps the connection in the CLOSE_WAIT TCP state.

Reproducing this issue is quite easy but takes some time: repeatedly execute the debug_traceTransaction RPC call or use the Debugger in Remix.

Platform: AWS EC2
OS: Ubuntu 18.04, kernel 5.4.0-1059-aws
Instance type: m5.xlarge
RSK Network: Mainnet
RSK node configs:
logback.xml.txt
node.conf.txt

Thanks in advance and lemme know if I can assist somehow.

@shlumper

It would be nice to know which Java classes we should enable debug-level logging for, to get more information in the logs about this.

@nagarev
Contributor

nagarev commented Jan 24, 2022

Hi guys! I'm seeing that you're setting some tracing options on your request:

{ "disableStorage": true, "disableMemory": false, "disableStack": false, "fullStorage": false }

For now, these options are being ignored, but we have a PR that will be merged in the near future that adds support for the following tracing options: disableStorage, disableMemory and disableStack. I think this could be a solution: given that you're disabling storage, a smaller response might help avoid these hangups.

Besides all of that, I will do a couple of tests on my side, and will keep you posted on my findings.

Thanks for reporting this!

@nagarev
Contributor

nagarev commented Jan 27, 2022

Hi guys! I have a couple of questions about the kind of processing you do, to help me reproduce the problem:

  • I think that you're performing the request for different transactions... does this process finish at some point, or is the idea for it to continue indefinitely?
  • Are you doing any processing on each response? If so, do you wait for that processing to finish before taking another transaction and making a new request? How long does that processing take (on average)? If there's no processing, do you have an estimate of how much time passes between two requests?
  • Regarding "Reproducing this issue is quite easy but takes some time: repeatedly execute the debug_traceTransaction RPC call or use the Debugger in Remix.", can you explain a little how this test is performed? Do you use the same transaction for each request and repeat? If not, does the processing always stop on the same transaction?

Thanks a lot in advance!

@gofman8
Author

gofman8 commented Jan 28, 2022

Hi! Many thanks for your reply!

  • Regarding "Reproducing this issue is quite easy but takes some time: repeatedly execute the debug_traceTransaction RPC call or use the Debugger in Remix.", can you explain a little how this test is performed? Do you use the same transaction for each request and repeat? If not, does the processing always stop on the same transaction?

To reproduce this issue it's enough to repeat the same request. It gets stuck within a few hours or days; I didn't find any pattern here.

  • I think that you're performing the request for different transactions... does this process finish at some point, or is the idea for it to continue indefinitely?

No, the same transaction. It just continues indefinitely (HTTP timeout error).

  • Are you doing any processing on each response? If so, do you wait for that processing to finish before taking another transaction and making a new request? How long does that processing take (on average)? If there's no processing, do you have an estimate of how much time passes between two requests?

We have tried repeating the same request every 30 seconds and also once per second; in either case the RSK node got stuck after some time.
You can use a simple script to reproduce this issue:
while sleep 1; do curl -s -X POST -H "Content-Type: application/json" -d '{ "jsonrpc": "2.0", "id": 1772093265, "method": "debug_traceTransaction", "params": [ "0xdb76b62bae3896c47fba80037f0434f3cd456218b54fc963f85e15ca0c111def", { "disableStorage": true, "disableMemory": false, "disableStack": false, "fullStorage": false } ], "origin": "http://remix.ethereum.org", "tabId": 130 }' https://mainnetdebug.sovryn.app >> /dev/null; done
and just leave the RSK node running for one day; I'm sure that will be enough time to catch this behaviour.
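
For what it's worth, here is a variant of that loop (our own sketch, same endpoint and payload as above) that aborts each call after 60 seconds with curl's --max-time and logs failures, so the first hang gets a timestamp:

# Same request and payload as the loop above, but each call is aborted after
# 60 seconds and failures are logged with a timestamp, so the moment the node
# starts hanging is recorded in repro.log.
while sleep 1; do
  curl -s --max-time 60 -X POST -H "Content-Type: application/json" -d '{ "jsonrpc": "2.0", "id": 1772093265, "method": "debug_traceTransaction", "params": [ "0xdb76b62bae3896c47fba80037f0434f3cd456218b54fc963f85e15ca0c111def", { "disableStorage": true, "disableMemory": false, "disableStack": false, "fullStorage": false } ], "origin": "http://remix.ethereum.org", "tabId": 130 }' https://mainnetdebug.sovryn.app >> /dev/null \
    || echo "$(date -Is) debug_traceTransaction timed out or failed (curl exit $?)" >> repro.log
done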

RSK version v3.1.0

@shlumper

Hi nagarev,
any update on this?
thanks

@shlumper

maybe today :) ?

@nagarev
Contributor

nagarev commented Mar 3, 2022

Hi guys! Sorry for the delay! Yes, I've been running different tests and found a couple of insights, but I'm trying to figure it out completely before drawing any conclusions. In the meantime, have you performed any other tests on your side?

@shlumper

No, just the one described above. Thanks

@gofman8
Author

gofman8 commented Mar 25, 2022

Hi @nagarev !
We discovered that the same issue occurs even after restarting the RSK process...
web3.eth.getBlockNumber calls start working again, but debug trace calls do not; they are stuck with exactly the same behavior...

@nagarev
Contributor

nagarev commented May 12, 2022

Hi guys! sorry for the long delay!

I couldn't reproduce the issue, but I've done some research; let me show you my results and conclusions:

First of all, I checked the response size to see how it changes depending on which filters are set (yes, we now support some filters for the debug_traceTransaction and debug_traceBlock methods).

These are the results:

Request Type    Size (MB)
Full Response   315.5
Only Memory     3.2
Only Stack      5.0
Only Storage    308.0

So you can see that storage is the main contributor to the response size (308.0 / 315.5 ≈ 97.6% of it).

Knowing this, I performed some stress tests on the node (basically, I ran this request alongside other really heavy ones over and over, and let it run for a couple of hours) but couldn't observe any hangups (this was without using the filters).

Maybe you can try to reproduce it again using the new version (that will use the filters you're setting) and see what happens.
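
As a quick sanity check against a node running the new version, one could compare the downloaded response size with and without the storage filter using curl's %{size_download} write-out. This is a rough sketch, not from the thread; it reuses the endpoint and transaction hash from the report and assumes the node accepts a partial or empty options object:

# Hypothetical check: print only the downloaded response size in bytes, first
# with storage disabled and then with an empty options object, to confirm the
# new filter support actually shrinks the trace.
for opts in '{ "disableStorage": true }' '{ }'; do
  echo "options: $opts"
  curl -s -o /dev/null -w "size_download: %{size_download} bytes\n" \
    -X POST -H "Content-Type: application/json" \
    -d '{ "jsonrpc": "2.0", "id": 1, "method": "debug_traceTransaction", "params": [ "0xdb76b62bae3896c47fba80037f0434f3cd456218b54fc963f85e15ca0c111def", '"$opts"' ] }' \
    https://mainnetdebug.sovryn.app
done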

I don't think I've hit that last problem you reported regarding getBlockNumber. Does that method start to fail when the other one is stuck?

@gofman8 gofman8 closed this as completed Jan 19, 2024