
Wrong result. #2

Open
zengqg opened this issue Nov 2, 2018 · 2 comments
zengqg commented Nov 2, 2018

When I run
python main.py --train_pg

the reward and Avg.reward are negative. What's wrong with it?

@huangjicun
Collaborator

The reward is calculated as the score of the training model minus the score of the built-in AI.
The model is weak at the beginning of training, so its score is lower than the AI's score and the reward is negative.
As the model becomes stronger through training, reward and Avg.reward will turn positive.

@JasonYao81000
Owner

In Pong, an episode ends when one of the players reaches 21 points.
Our reward is defined as:
Reward = Nwin - Nlose
where Nwin is the number of winning rounds in one episode and Nlose is the number of losing rounds in one episode.

When you run python main.py --train_pg, you start training a policy gradient model with "noob" weights.
So you will get Nwin = 0 and Nlose = 21 at the beginning.
Thus, the reward is -21, which is negative.
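
A minimal sketch of how this per-episode reward comes about, assuming the standard Gym Atari environment "Pong-v0" (not necessarily this repo's exact training loop): each point the agent scores gives a per-step reward of +1 and each point it loses gives -1, so summing the per-step rewards over an episode is exactly Nwin - Nlose.

```python
import gym

# Assumption: classic Gym API and "Pong-v0" (requires gym[atari]); the repo's
# own environment wrapper and policy may differ.
env = gym.make("Pong-v0")

obs = env.reset()
n_win, n_lose, done = 0, 0, False
while not done:
    action = env.action_space.sample()  # stand-in for the policy's chosen action
    obs, reward, done, info = env.step(action)
    if reward > 0:       # agent scored a point
        n_win += 1
    elif reward < 0:     # opponent scored a point
        n_lose += 1

episode_reward = n_win - n_lose
# With random/untrained play this is typically -21 (Nwin = 0, Nlose = 21).
print(episode_reward)
```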
