
OOM Crashes on Juno Pod After Restart During Heavy Load #1832

Open
wojciechos opened this issue Apr 19, 2024 · 0 comments

wojciechos commented Apr 19, 2024

A surge of traffic to the starknet_call method on our k8s pod pushed CPU usage to 100%, causing request failures and block sync problems. After that, every restart of the pod hit an OOM error immediately at startup. Once the pod was given a fresh database, however, it synced normally with no OOM errors, which suggests the database may have been corrupted (not yet confirmed).

k8s Logs:

terminated
Reason: OOMKilled - exit code: 137
Started at: 2024-04-19T15:14:04+05:30
Finished at: 2024-04-19T15:14:51+05:30
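Exit code 137 follows the usual convention that a code above 128 means the process was terminated by signal (code minus 128). Here 137 - 128 = 9, which is SIGKILL on Linux, the signal the kernel OOM killer sends, so it matches the OOMKilled reason. A minimal Go sketch of that decoding; the decodeExitCode helper and its small signal table are hypothetical and not part of Juno:

```go
package main

import "fmt"

// signalNames is a hypothetical lookup of a few Linux signal numbers
// commonly seen in container terminations.
var signalNames = map[int]string{
	2:  "SIGINT",
	9:  "SIGKILL",
	15: "SIGTERM",
}

// decodeExitCode applies the shell convention that exit codes above 128
// mean the process was terminated by signal number (code - 128).
// It returns the signal number, its name (or "unknown"), and whether
// the code indicates a signal at all.
func decodeExitCode(code int) (int, string, bool) {
	if code <= 128 {
		return 0, "", false
	}
	sig := code - 128
	name, ok := signalNames[sig]
	if !ok {
		name = "unknown"
	}
	return sig, name, true
}

func main() {
	sig, name, bySignal := decodeExitCode(137)
	fmt.Println(sig, name, bySignal) // prints "9 SIGKILL true"
}
```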

Possible Causes:

  • Possible database corruption caused by restarts under high CPU load.
  • Recent updates to Pebble, the storage engine Juno uses.

UPDATE (06.05.2024):
The pod was unable to keep up with syncing, and requests failed once it reached its CPU limit.
Actions taken: added more pods and restarted the pod; neither helped.
Resolution: removing the DB and replacing it with a fresh one resolved the issue.
Next steps: prioritize investigating and fixing the underlying cause.

06-05-2024-incident.pdf
