Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behaviour after continuing training after crash #24

Open
andreped opened this issue Aug 21, 2022 · 2 comments
Open

Strange behaviour after continuing training after crash #24

andreped opened this issue Aug 21, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@andreped
Copy link
Owner

andreped commented Aug 21, 2022

After Exception happens, for whatever reason, the ep_raw_mean and ep_len_mean are much higher than usual. Are we properly reseting the environment before restarting the training? Or is there a more important issue? Perhaps the opponents are reset to their poorest state, meaning that after restarting we are playing against easier opponents?

Note that after a while it goes down to a similar level that it was before the crash.

This is the prompt that i got around the Exception:

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 72.6       |
|    ep_rew_mean          | 13.8       |
| time/                   |            |
|    fps                  | 105        |
|    iterations           | 300        |
|    time_elapsed         | 5833       |
|    total_timesteps      | 614400     |
| train/                  |            |
|    approx_kl            | 0.06014028 |
|    clip_fraction        | 0.185      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.407     |
|    explained_variance   | 0.6        |
|    learning_rate        | 0.0003     |
|    loss                 | 1.53       |
|    n_updates            | 48400      |
|    policy_gradient_loss | -0.0404    |
|    value_loss           | 9.12       |
----------------------------------------
Exception: get_idx < pet-hedgehog 10-1 status-honey-bee 2-1 > not found
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 92.7        |
|    ep_rew_mean          | 33.1        |
| time/                   |             |
|    fps                  | 123         |
|    iterations           | 1           |
|    time_elapsed         | 16          |
|    total_timesteps      | 2048        |
| train/                  |             |
|    approx_kl            | 0.043567862 |
|    clip_fraction        | 0.208       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.48       |
|    explained_variance   | 0.68        |
|    learning_rate        | 0.0003      |
|    loss                 | 5.09        |
|    n_updates            | 48410       |
|    policy_gradient_loss | -0.0425     |
|    value_loss           | 9.02        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 83.9        |
|    ep_rew_mean          | 24.1        |
| time/                   |             |
|    fps                  | 116         |
|    iterations           | 2           |
|    time_elapsed         | 35          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.041116748 |
|    clip_fraction        | 0.179       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.366      |
|    explained_variance   | 0.631       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.58        |
|    n_updates            | 48420       |
|    policy_gradient_loss | -0.0446     |
|    value_loss           | 16.3        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 83          |
|    ep_rew_mean          | 21.3        |
| time/                   |             |
|    fps                  | 114         |
|    iterations           | 3           |
|    time_elapsed         | 53          |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.056923926 |
|    clip_fraction        | 0.187       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.404      |
|    explained_variance   | 0.585       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.17        |
|    n_updates            | 48430       |
|    policy_gradient_loss | -0.043      |
|    value_loss           | 12.3        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 83.6       |
|    ep_rew_mean          | 21.4       |
| time/                   |            |
|    fps                  | 112        |
|    iterations           | 4          |
|    time_elapsed         | 72         |
|    total_timesteps      | 8192       |
| train/                  |            |
|    approx_kl            | 0.05702912 |
|    clip_fraction        | 0.185      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.42      |
|    explained_variance   | 0.476      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.65       |
|    n_updates            | 48440      |
|    policy_gradient_loss | -0.0412    |
|    value_loss           | 14.4       |
----------------------------------------
---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 77.3      |
|    ep_rew_mean          | 16.2      |
| time/                   |           |
|    fps                  | 112       |
|    iterations           | 5         |
|    time_elapsed         | 91        |
|    total_timesteps      | 10240     |
| train/                  |           |
|    approx_kl            | 0.0575137 |
|    clip_fraction        | 0.185     |
|    clip_range           | 0.2       |
|    entropy_loss         | -0.396    |
|    explained_variance   | 0.469     |
|    learning_rate        | 0.0003    |
|    loss                 | 4.42      |
|    n_updates            | 48450     |
|    policy_gradient_loss | -0.0437   |
|    value_loss           | 13.3      |
---------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 78.6        |
|    ep_rew_mean          | 17.3        |
| time/                   |             |
|    fps                  | 112         |
|    iterations           | 6           |
|    time_elapsed         | 109         |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.077249065 |
|    clip_fraction        | 0.219       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.47       |
|    explained_variance   | 0.525       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.23        |
|    n_updates            | 48460       |
|    policy_gradient_loss | -0.0489     |
|    value_loss           | 9.37        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 78.3        |
|    ep_rew_mean          | 17.8        |
| time/                   |             |
|    fps                  | 112         |
|    iterations           | 7           |
|    time_elapsed         | 127         |
|    total_timesteps      | 14336       |
| train/                  |             |
|    approx_kl            | 0.048610996 |
|    clip_fraction        | 0.182       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.408      |
|    explained_variance   | 0.645       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.42        |
|    n_updates            | 48470       |
|    policy_gradient_loss | -0.041      |
|    value_loss           | 11.6        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 77.4       |
|    ep_rew_mean          | 17.6       |
| time/                   |            |
|    fps                  | 111        |
|    iterations           | 8          |
|    time_elapsed         | 147        |
|    total_timesteps      | 16384      |
| train/                  |            |
|    approx_kl            | 0.04007852 |
|    clip_fraction        | 0.202      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.448     |
|    explained_variance   | 0.687      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.07       |
|    n_updates            | 48480      |
|    policy_gradient_loss | -0.0456    |
|    value_loss           | 12         |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 76.6        |
|    ep_rew_mean          | 17.3        |
| time/                   |             |
|    fps                  | 111         |
|    iterations           | 9           |
|    time_elapsed         | 165         |
|    total_timesteps      | 18432       |
| train/                  |             |
|    approx_kl            | 0.056452066 |
|    clip_fraction        | 0.186       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.419      |
|    explained_variance   | 0.539       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.39        |
|    n_updates            | 48490       |
|    policy_gradient_loss | -0.043      |
|    value_loss           | 17.7        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 75.7        |
|    ep_rew_mean          | 17.1        |
| time/                   |             |
|    fps                  | 110         |
|    iterations           | 10          |
|    time_elapsed         | 184         |
|    total_timesteps      | 20480       |
| train/                  |             |
|    approx_kl            | 0.043880884 |
|    clip_fraction        | 0.188       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.453      |
|    explained_variance   | 0.543       |
|    learning_rate        | 0.0003      |
|    loss                 | 6.02        |
|    n_updates            | 48500       |
|    policy_gradient_loss | -0.0444     |
|    value_loss           | 14.9        |
-----------------------------------------

@andreped andreped added the bug Something isn't working label Aug 21, 2022
@GoldExplosion
Copy link
Contributor

ep_rew_mean and ep_len_mean are calculated by averaging corresponding values over the last 100 episodes. These information is stored in a buffer (ep_info_buffer). When you restart the training it is possible that the buffer is emptied and thus they are not the true mean over 100 episodes.

@andreped
Copy link
Owner Author

andreped commented Aug 21, 2022

Emptying the buffer seems like the correct thing to do, as we start a completely new training and have no real control of the interaction between these, unless we handle this ourselves.

A much better solution would be if the training just worked - that we could train forever without any errors.

There seem to be issues at multiple levels, and quite challenging to debug what is causing all these, but I would presume there is something wrong in sapai-gym. However, I added a method to catch if any errors happened inside the step method in sapai-gym: andreped/sapai-gym@7443f36

To my surprise I still got some errors, so I believe there might be something wrong in the sapai engine. Not sure really.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: No status
Development

No branches or pull requests

2 participants