Training seems to crash occasionally #3
Comments
I've added a temporary fix for this: it catches the crash when it happens and restarts training from the previous state, keeping all model history. A proper fix is still needed in sapai/sapai-gym.
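For readers who want to see the idea, here is a minimal sketch of a catch-and-restart loop. It is not the code from this repository. `train_with_restarts`, `ToyTrainer`, and the three callbacks are hypothetical names, and the toy trainer only simulates a crash. With Stable Baselines3, the callbacks could plausibly map to `model.learn(..., reset_num_timesteps=False)`, `model.save(...)`, and the algorithm's `load(...)`, but that mapping is an assumption.

```python
def train_with_restarts(learn_chunk, save_state, load_state,
                        total_steps, chunk_steps, max_retries=5):
    """Train in chunks; after a crash, reload the last saved state and retry."""
    done_steps = 0
    retries = 0
    save_state()  # initial checkpoint so the first chunk can be rolled back
    while done_steps < total_steps:
        n = min(chunk_steps, total_steps - done_steps)
        try:
            learn_chunk(n)
        except Exception:
            retries += 1
            if retries > max_retries:
                raise  # give up instead of looping forever
            load_state()  # roll back to the last good checkpoint
            continue
        save_state()
        done_steps += n
    return done_steps, retries


class ToyTrainer:
    """Hypothetical stand-in for a real model; crashes once at step 30."""

    def __init__(self):
        self.steps = 0
        self._checkpoint = 0
        self._crashed = False

    def learn_chunk(self, n):
        self.steps += n
        if self.steps >= 30 and not self._crashed:
            self._crashed = True
            raise RuntimeError("simulated environment crash")

    def save_state(self):
        self._checkpoint = self.steps

    def load_state(self):
        self.steps = self._checkpoint


trainer = ToyTrainer()
done, retries = train_with_restarts(
    trainer.learn_chunk, trainer.save_state, trainer.load_state,
    total_steps=50, chunk_steps=10,
)
```

Smaller chunks lose less progress per crash but checkpoint more often, so the chunk size is a trade-off to tune.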
As I assumed all errors were coming from sapai-gym, I added a fix to catch all errors happening there: However, to my surprise, when running a regular training (now without the try/except loop in the main training script
Random exceptions seem to happen after thousands of training steps:
What is causing this?
When training RL models using sapai-gym, different errors tend to occur.
I have tried to use try/except blocks, but the problem is that when an error occurs, training with Stable Baselines3 crashes and we have to start all over again.
We should therefore either: 1) fix what is bugged in sapai/sapai-gym or 2) add a wrapper function that catches when this fails, and tries to generate a new one (if possible).
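Option 2 could look roughly like the sketch below. It reads "a new one" as a fresh environment instance, which is an assumption. `SafeEnvWrapper` and `FlakyEnv` are hypothetical names, not part of sapai-gym. The wrapper also assumes the classic Gym interface, where `reset()` returns an observation and `step(action)` returns `(obs, reward, done, info)`.

```python
class SafeEnvWrapper:
    """Wraps an env exposing reset() and step(action); rebuilds it on errors."""

    def __init__(self, env_factory, max_rebuilds=3):
        self._env_factory = env_factory  # callable that builds a fresh env
        self._max_rebuilds = max_rebuilds
        self.env = env_factory()
        self.failures = 0

    def reset(self):
        for _ in range(self._max_rebuilds + 1):
            try:
                return self.env.reset()
            except Exception:
                self.failures += 1
                self.env = self._env_factory()
        raise RuntimeError("environment failed to reset after rebuilds")

    def step(self, action):
        try:
            return self.env.step(action)
        except Exception:
            # Discard the broken env and end the episode with zero reward,
            # so the training loop continues instead of crashing.
            self.failures += 1
            self.env = self._env_factory()
            obs = self.reset()
            return obs, 0.0, True, {"env_error": True}


class FlakyEnv:
    """Hypothetical stand-in env that raises on its third step."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        if self.t == 3:
            raise ValueError("simulated game-engine bug")
        return self.t, 1.0, False, {}


env = SafeEnvWrapper(FlakyEnv)
first_obs = env.reset()
results = [env.step(0) for _ in range(3)]
```

Ending the episode with zero reward hides the bug from the learner but still biases the data slightly, so counting `failures` helps judge whether option 1 is still needed.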