Handle examples originated from the same board state #115
Replies: 3 comments
-
I believe that having different labels for the same state simply averages the output. For example, if one state has labels 1 and -1, the prediction will be pushed towards 0 instead. I don't think the current implementation suffers from any issue as such, but do let me know if you notice any improvements from preprocessing.
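A minimal numerical illustration of the point above (not code from the repo): under a squared-error value loss, the best single prediction for a state that appears with labels 1 and -1 is their mean, 0.

```python
# Two self-play outcomes recorded for the same board state.
labels = [1.0, -1.0]

def loss(v):
    """Squared-error loss of predicting v for every duplicate example."""
    return sum((v - y) ** 2 for y in labels)

# Scan candidate predictions in [-1, 1]; the minimum sits at the mean.
best_v = min((v / 100 for v in range(-100, 101)), key=loss)
print(best_v)  # 0.0
```

So training on the raw duplicates and averaging them beforehand converge to the same target; the gradient just pulls the prediction toward the mean label.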
-
Shouldn't the reward values (v) predicted by the NNet be integers from the fixed set of possible game results? Or can they be arbitrary averaged float values between -1 and 1?
-
The NN predicts a winning probability in [0, 1], or the expected reward in [-1, 1].
-
During the self-play phase we usually collect different examples for the same board state. Should we preprocess such examples before optimizing the NNet? In the current implementation we don't preprocess them, so we train the NNet while expecting different outputs for the same input values. I think this may be wrong.
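For reference, the kind of preprocessing I have in mind could look like this hypothetical sketch (the `(board_key, pi, v)` example shape and the function name are my own, not the repo's): duplicate board states are collapsed by averaging their policy targets and outcomes before training.

```python
from collections import defaultdict

def average_duplicates(examples):
    """Collapse training examples that share a board state.

    examples: list of (board_key, pi, v), where board_key is hashable,
    pi is a list of move probabilities, and v is a game outcome in [-1, 1].
    Returns one merged example per board_key with pi and v averaged.
    """
    grouped = defaultdict(list)
    for board_key, pi, v in examples:
        grouped[board_key].append((pi, v))

    merged = []
    for board_key, group in grouped.items():
        n = len(group)
        avg_pi = [sum(p[i] for p, _ in group) / n
                  for i in range(len(group[0][0]))]
        avg_v = sum(v for _, v in group) / n
        merged.append((board_key, avg_pi, avg_v))
    return merged

# Usage: two conflicting outcomes for "s1" merge into a single averaged target.
examples = [("s1", [0.7, 0.3], 1.0),
            ("s1", [0.5, 0.5], -1.0),
            ("s2", [1.0, 0.0], 1.0)]
print(average_duplicates(examples))
```

This would make each input map to a single target, instead of letting SGD implicitly average conflicting labels during training.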