Strange behaviour after continuing training after crash #24

andreped · 2022-08-21T10:49:16Z

After Exception happens, for whatever reason, the ep_raw_mean and ep_len_mean are much higher than usual. Are we properly reseting the environment before restarting the training? Or is there a more important issue? Perhaps the opponents are reset to their poorest state, meaning that after restarting we are playing against easier opponents?

Note that after a while it goes down to a similar level that it was before the crash.

This is the prompt that i got around the Exception:

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 72.6       |
|    ep_rew_mean          | 13.8       |
| time/                   |            |
|    fps                  | 105        |
|    iterations           | 300        |
|    time_elapsed         | 5833       |
|    total_timesteps      | 614400     |
| train/                  |            |
|    approx_kl            | 0.06014028 |
|    clip_fraction        | 0.185      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.407     |
|    explained_variance   | 0.6        |
|    learning_rate        | 0.0003     |
|    loss                 | 1.53       |
|    n_updates            | 48400      |
|    policy_gradient_loss | -0.0404    |
|    value_loss           | 9.12       |
----------------------------------------
Exception: get_idx < pet-hedgehog 10-1 status-honey-bee 2-1 > not found
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 92.7        |
|    ep_rew_mean          | 33.1        |
| time/                   |             |
|    fps                  | 123         |
|    iterations           | 1           |
|    time_elapsed         | 16          |
|    total_timesteps      | 2048        |
| train/                  |             |
|    approx_kl            | 0.043567862 |
|    clip_fraction        | 0.208       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.48       |
|    explained_variance   | 0.68        |
|    learning_rate        | 0.0003      |
|    loss                 | 5.09        |
|    n_updates            | 48410       |
|    policy_gradient_loss | -0.0425     |
|    value_loss           | 9.02        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 83.9        |
|    ep_rew_mean          | 24.1        |
| time/                   |             |
|    fps                  | 116         |
|    iterations           | 2           |
|    time_elapsed         | 35          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.041116748 |
|    clip_fraction        | 0.179       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.366      |
|    explained_variance   | 0.631       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.58        |
|    n_updates            | 48420       |
|    policy_gradient_loss | -0.0446     |
|    value_loss           | 16.3        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 83          |
|    ep_rew_mean          | 21.3        |
| time/                   |             |
|    fps                  | 114         |
|    iterations           | 3           |
|    time_elapsed         | 53          |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.056923926 |
|    clip_fraction        | 0.187       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.404      |
|    explained_variance   | 0.585       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.17        |
|    n_updates            | 48430       |
|    policy_gradient_loss | -0.043      |
|    value_loss           | 12.3        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 83.6       |
|    ep_rew_mean          | 21.4       |
| time/                   |            |
|    fps                  | 112        |
|    iterations           | 4          |
|    time_elapsed         | 72         |
|    total_timesteps      | 8192       |
| train/                  |            |
|    approx_kl            | 0.05702912 |
|    clip_fraction        | 0.185      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.42      |
|    explained_variance   | 0.476      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.65       |
|    n_updates            | 48440      |
|    policy_gradient_loss | -0.0412    |
|    value_loss           | 14.4       |
----------------------------------------
---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 77.3      |
|    ep_rew_mean          | 16.2      |
| time/                   |           |
|    fps                  | 112       |
|    iterations           | 5         |
|    time_elapsed         | 91        |
|    total_timesteps      | 10240     |
| train/                  |           |
|    approx_kl            | 0.0575137 |
|    clip_fraction        | 0.185     |
|    clip_range           | 0.2       |
|    entropy_loss         | -0.396    |
|    explained_variance   | 0.469     |
|    learning_rate        | 0.0003    |
|    loss                 | 4.42      |
|    n_updates            | 48450     |
|    policy_gradient_loss | -0.0437   |
|    value_loss           | 13.3      |
---------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 78.6        |
|    ep_rew_mean          | 17.3        |
| time/                   |             |
|    fps                  | 112         |
|    iterations           | 6           |
|    time_elapsed         | 109         |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.077249065 |
|    clip_fraction        | 0.219       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.47       |
|    explained_variance   | 0.525       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.23        |
|    n_updates            | 48460       |
|    policy_gradient_loss | -0.0489     |
|    value_loss           | 9.37        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 78.3        |
|    ep_rew_mean          | 17.8        |
| time/                   |             |
|    fps                  | 112         |
|    iterations           | 7           |
|    time_elapsed         | 127         |
|    total_timesteps      | 14336       |
| train/                  |             |
|    approx_kl            | 0.048610996 |
|    clip_fraction        | 0.182       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.408      |
|    explained_variance   | 0.645       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.42        |
|    n_updates            | 48470       |
|    policy_gradient_loss | -0.041      |
|    value_loss           | 11.6        |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 77.4       |
|    ep_rew_mean          | 17.6       |
| time/                   |            |
|    fps                  | 111        |
|    iterations           | 8          |
|    time_elapsed         | 147        |
|    total_timesteps      | 16384      |
| train/                  |            |
|    approx_kl            | 0.04007852 |
|    clip_fraction        | 0.202      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.448     |
|    explained_variance   | 0.687      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.07       |
|    n_updates            | 48480      |
|    policy_gradient_loss | -0.0456    |
|    value_loss           | 12         |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 76.6        |
|    ep_rew_mean          | 17.3        |
| time/                   |             |
|    fps                  | 111         |
|    iterations           | 9           |
|    time_elapsed         | 165         |
|    total_timesteps      | 18432       |
| train/                  |             |
|    approx_kl            | 0.056452066 |
|    clip_fraction        | 0.186       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.419      |
|    explained_variance   | 0.539       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.39        |
|    n_updates            | 48490       |
|    policy_gradient_loss | -0.043      |
|    value_loss           | 17.7        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 75.7        |
|    ep_rew_mean          | 17.1        |
| time/                   |             |
|    fps                  | 110         |
|    iterations           | 10          |
|    time_elapsed         | 184         |
|    total_timesteps      | 20480       |
| train/                  |             |
|    approx_kl            | 0.043880884 |
|    clip_fraction        | 0.188       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.453      |
|    explained_variance   | 0.543       |
|    learning_rate        | 0.0003      |
|    loss                 | 6.02        |
|    n_updates            | 48500       |
|    policy_gradient_loss | -0.0444     |
|    value_loss           | 14.9        |
-----------------------------------------

The text was updated successfully, but these errors were encountered:

GoldExplosion · 2022-08-21T13:57:19Z

ep_rew_mean and ep_len_mean are calculated by averaging corresponding values over the last 100 episodes. These information is stored in a buffer (ep_info_buffer). When you restart the training it is possible that the buffer is emptied and thus they are not the true mean over 100 episodes.

andreped · 2022-08-21T15:15:00Z

Emptying the buffer seems like the correct thing to do, as we start a completely new training and have no real control of the interaction between these, unless we handle this ourselves.

A much better solution would be if the training just worked - that we could train forever without any errors.

There seem to be issues at multiple levels, and quite challenging to debug what is causing all these, but I would presume there is something wrong in sapai-gym. However, I added a method to catch if any errors happened inside the step method in sapai-gym: andreped/sapai-gym@7443f36

To my surprise I still got some errors, so I believe there might be something wrong in the sapai engine. Not sure really.

andreped added the bug Something isn't working label Aug 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strange behaviour after continuing training after crash #24

Strange behaviour after continuing training after crash #24

andreped commented Aug 21, 2022 •

edited

Loading

GoldExplosion commented Aug 21, 2022

andreped commented Aug 21, 2022 •

edited

Loading

Strange behaviour after continuing training after crash #24

Strange behaviour after continuing training after crash #24

Comments

andreped commented Aug 21, 2022 • edited Loading

GoldExplosion commented Aug 21, 2022

andreped commented Aug 21, 2022 • edited Loading

andreped commented Aug 21, 2022 •

edited

Loading

andreped commented Aug 21, 2022 •

edited

Loading