Handle examples originated from the same board state #115
Replies: 3 comments
-
I believe that having different labels for the same state simply averages the output. For example, if one state has labels 1 and -1, the prediction will be pushed towards 0 instead. I don't think the current implementation suffers from any issue as such, but do let me know if you notice any improvements from preprocessing.
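A minimal numerical illustration of the point above (not code from the repo): under a squared-error value loss, the best single prediction for a state that appears with labels 1 and -1 is their mean, 0.

```python
# Two self-play outcomes recorded for the same board state.
labels = [1.0, -1.0]

def loss(v):
    """Squared-error loss of predicting v for every duplicate example."""
    return sum((v - y) ** 2 for y in labels)

# Scan candidate predictions in [-1, 1]; the minimum sits at the mean.
best_v = min((v / 100 for v in range(-100, 101)), key=loss)
print(best_v)  # 0.0
```

So training on the raw duplicates and averaging them beforehand converge to the same target; the gradient just pulls the prediction toward the mean label.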
-
Shouldn't the reward values (v) predicted by the NNet be integers from the fixed set of possible game results? Or can they be arbitrary averaged float values between -1 and 1?
-
The NN predicts a winning probability in [0, 1], or the expected reward in [-1, 1].
-
During the self-play phase we usually collect different examples for the same board state. Should we preprocess such examples before optimizing the NNet? In the current implementation we don't preprocess them, so we train the NNet while expecting different outputs for the same input values. I think this may be wrong.
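For reference, the kind of preprocessing I have in mind could look like this hypothetical sketch (the `(board_key, pi, v)` example shape and the function name are my own, not the repo's): duplicate board states are collapsed by averaging their policy targets and outcomes before training.

```python
from collections import defaultdict

def average_duplicates(examples):
    """Collapse training examples that share a board state.

    examples: list of (board_key, pi, v), where board_key is hashable,
    pi is a list of move probabilities, and v is a game outcome in [-1, 1].
    Returns one merged example per board_key with pi and v averaged.
    """
    grouped = defaultdict(list)
    for board_key, pi, v in examples:
        grouped[board_key].append((pi, v))

    merged = []
    for board_key, group in grouped.items():
        n = len(group)
        avg_pi = [sum(p[i] for p, _ in group) / n
                  for i in range(len(group[0][0]))]
        avg_v = sum(v for _, v in group) / n
        merged.append((board_key, avg_pi, avg_v))
    return merged

# Usage: two conflicting outcomes for "s1" merge into a single averaged target.
examples = [("s1", [0.7, 0.3], 1.0),
            ("s1", [0.5, 0.5], -1.0),
            ("s2", [1.0, 0.0], 1.0)]
print(average_duplicates(examples))
```

This would make each input map to a single target, instead of letting SGD implicitly average conflicting labels during training.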