
Wrong result. #2

Open
zengqg opened this issue Nov 2, 2018 · 2 comments
zengqg commented Nov 2, 2018

When I run
python main.py --train_pg

the reward and Avg.reward are negative. What's wrong with it?

@huangjicun
Collaborator

The reward is calculated as the score of the training model minus the score of the built-in AI.
The model is weak at the beginning of training, so its score is lower than the AI's score and the reward is negative.
As the model becomes stronger through training, reward and Avg.reward will turn positive.

@JasonYao81000
Owner

In Pong, an episode ends when one of the players reaches 21 points.
Our reward is defined as:
Reward = Nwin - Nlose
where Nwin is the number of winning rounds in one episode and Nlose is the number of losing rounds in one episode.

When you run python main.py --train_pg, you start training a policy gradient model with "noob" weights.
So you will get Nwin = 0 and Nlose = 21 at the beginning.
Thus, the reward is -21, which is negative.
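
A minimal sketch of how this per-episode reward comes about, assuming the standard Gym Atari environment "Pong-v0" (not necessarily this repo's exact training loop): each point the agent scores gives a per-step reward of +1 and each point it loses gives -1, so summing the per-step rewards over an episode is exactly Nwin - Nlose.

```python
import gym

# Assumption: classic Gym API and "Pong-v0" (requires gym[atari]); the repo's
# own environment wrapper and policy may differ.
env = gym.make("Pong-v0")

obs = env.reset()
n_win, n_lose, done = 0, 0, False
while not done:
    action = env.action_space.sample()  # stand-in for the policy's chosen action
    obs, reward, done, info = env.step(action)
    if reward > 0:       # agent scored a point
        n_win += 1
    elif reward < 0:     # opponent scored a point
        n_lose += 1

episode_reward = n_win - n_lose
# With random/untrained play this is typically -21 (Nwin = 0, Nlose = 21).
print(episode_reward)
```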
