Block production stops after arbitrary idle period #891

knikos · 2024-05-08T07:34:23Z

Hello community, while working with the example VMs I have noticed that block production stops after an arbitrary amount of time when the chain is idle. With no txs being issued, and while watching the chain with the CLI tools, I see that it stops producing empty blocks. If I then issue a tx (a simple transfer, for example), block production resumes.

Is this behavior intended to save resources? If so, is it documented somewhere, or could you point me to the code responsible for it?

patrick-ogrady · 2024-05-23T16:09:48Z

This sounds like a bug to me. As part of the Vryx work, we are making a number of changes that will replace many of the existing mechanisms. I would say this will be left strategically unfixed, with the expectation that the Vryx work will be much better tested and more stable.

kpachhai · 2024-05-28T17:17:50Z

We're having a similar issue. This is what we did.

We deployed our subnet to a devnet on AWS using Avalanche-CLI, and at first it works great. However, after it has been running for a few days, all the nodes stop working. In other words, the EC2 instances are still running, but the HyperVMs on these nodes stop producing new blocks. Our VM is based on the v0.0.16 stable release of the HyperSDK.

On our Grafana dashboards, the logs from these validator nodes stop showing up entirely. The machines are still up, though, since I can SSH into them. The API server is also running and answers queries, but the reported height stays the same because no new blocks are being produced.

In our case, however, block production never resumes, even if we issue a simple tx.

kpachhai · 2024-06-04T18:47:31Z

I noticed something peculiar while running our own devnet. As @knikos mentioned, if a node doesn't execute a transaction for a while, it seems to stall. This is true, but there may be more to the issue.

So I ran our devnet with 5 nodes. I kept sending a transaction roughly every minute, and for a long time the chain didn't stall and the height kept growing, but eventually it did stall. When it stalled, I checked the current height by querying the RPC endpoint of each of the 5 nodes. About 3 nodes were far behind: node1 and node2 returned a height of about 12000, while node3, node4, and node5 returned heights of about 9000 to 10000. These are rough estimates, not exact numbers.
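To spot lagging nodes like this, one option is to poll every node's height and compare each against the highest one reported. A minimal Python sketch, assuming the heights have already been collected from each node's RPC (the fetch itself is not shown, and `lagging_nodes` and `max_gap` are hypothetical names, not part of the HyperSDK). The snapshot values are illustrative only, loosely echoing the rough figures above:

```python
from typing import Dict, List


def lagging_nodes(heights: Dict[str, int], max_gap: int) -> List[str]:
    """Return the nodes whose height trails the highest reported height
    by more than `max_gap` blocks, sorted by name."""
    if not heights:
        return []
    tip = max(heights.values())  # highest height any node reports
    return sorted(node for node, h in heights.items() if tip - h > max_gap)


# Illustrative snapshot (not real measurements).
snapshot = {"node1": 12000, "node2": 12000, "node3": 9000, "node4": 10000, "node5": 10000}
print(lagging_nodes(snapshot, max_gap=100))  # ['node3', 'node4', 'node5']
```

Running a check like this on a schedule would flag a drifting node before the whole chain halts.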

Then I redeployed our devnet, again with 5 nodes. This time, instead of sending transactions through a single node, I sent 5 transactions, one through each node's RPC. Since then, the chain has not stalled, all the nodes are at the same height, and everything is working normally. So, as a workaround, my cronjob now sends 5 different transactions through the 5 different nodes' RPCs. So far it's running smoothly, and we're already at height 40,000.
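The workaround above amounts to a round-robin heartbeat: each cron run submits one transaction through every node's RPC instead of through a single node. A minimal Python sketch, where `heartbeat_round` is a hypothetical helper and `send_tx` stands in for whatever actually submits a transfer (a CLI invocation or an RPC client); nothing here is a HyperSDK API:

```python
from typing import Callable, Dict, List


def heartbeat_round(endpoints: List[str], send_tx: Callable[[str], None]) -> Dict[str, str]:
    """Submit one heartbeat transaction through every endpoint.

    Returns a mapping of endpoint -> error message for the sends that failed.
    """
    failures: Dict[str, str] = {}
    for endpoint in endpoints:
        try:
            send_tx(endpoint)  # hypothetical: e.g. a simple transfer via this node's RPC
        except Exception as exc:  # keep going so one unreachable node doesn't skip the rest
            failures[endpoint] = str(exc)
    return failures


# Usage with a stand-in sender that records calls and fails for one node.
sent: List[str] = []


def fake_send(endpoint: str) -> None:
    if endpoint == "node4":
        raise ConnectionError("node4 unreachable")
    sent.append(endpoint)


nodes = ["node1", "node2", "node3", "node4", "node5"]
print(heartbeat_round(nodes, fake_send))  # {'node4': 'node4 unreachable'}
```

Collecting failures instead of stopping at the first one means a single down node shows up in the cron output without starving the other nodes of their heartbeat.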

I'm not sure whether the chain was stalling because each node needs to stay active at all times (which sending transactions through every node's RPC would ensure). But it's definitely worth looking into, because otherwise the nodes will eventually fall out of sync and the chain will stall.

Anyways, thought I would add my own findings here.

patrick-ogrady Jun 5, 2024

Thanks for taking the time to write up your findings.

We'll make sure to test the HyperSDK for a prolonged period of time with no activity before we communicate that it is "Devnet Ready" (planned for later this year).

kpachhai · 2024-06-21T11:38:35Z

So I have been tweaking different settings in the configuration files to see if that would have an effect.

I tried a lot of things, but recently I changed the value of 'proposerMinBlockDelay' from 0 to 250, and everything has been running great with no issues. Best of all, I no longer need to keep sending transactions to any node to keep the nodes alive. I will continue to monitor this, but I think it has solved the problem of nodes that stop producing blocks or can't keep up with producing new blocks so quickly, so the chain no longer stalls and keeps running smoothly.

This makes sense if the issue is related to how much time a node has to produce new blocks: for whatever reason, some nodes may sometimes be unable to keep up with block production. And since I was only running 5 nodes, even a single node falling behind could stall the entire chain, because each node holds 20% of the weight on the network.

This would also explain why, when I ran 10 nodes on my devnet, it always took longer for the chain to halt: at least 2 nodes had to fall behind before the chain stalled.
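The stake arithmetic behind this reasoning can be made explicit. A small Python sketch, assuming equal-weight validators and a hypothetical stall threshold (`STALL_THRESHOLD`), chosen only because it reproduces the 5-node and 10-node observations above; it is not taken from Avalanche's actual consensus parameters:

```python
import math
from fractions import Fraction

# Hypothetical: the chain is assumed to stall once the fraction of stake that
# is still keeping up is at or below this value. NOT an Avalanche parameter.
STALL_THRESHOLD = Fraction(4, 5)


def min_lagging_to_stall(n: int, threshold: Fraction = STALL_THRESHOLD) -> int:
    """Smallest number k of equal-weight validators that must fall behind so
    that the remaining fraction (n - k) / n is at or below `threshold`."""
    # (n - k) / n <= threshold  <=>  k >= n * (1 - threshold)
    return math.ceil(n * (1 - threshold))


for n in (5, 10, 20):
    weight_pct = Fraction(100, n)  # each node's share of total weight, in percent
    print(n, float(weight_pct), min_lagging_to_stall(n))
# 5 20.0 1
# 10 10.0 2
# 20 5.0 4
```

Using `Fraction` instead of floats keeps the boundary case exact (for example, 4 of 5 nodes is exactly 4/5), so the comparison never flips because of rounding error.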

Anyways, hope my findings help you guys further!
