WIP: feature(pu): add Go env and AlphaZero league training #55

Closed
wants to merge 34 commits into from

puyuan1996
Collaborator

  • add go_env, related unittest
  • add go mcts bot and alphazero/muzero config
  • add league version of alphazero
  • add ctree version of alphazero

…option, fix norm_type in az prediction net, fix temperature in az_league
@puyuan1996 added the enhancement and environment labels on Jul 21, 2023
if (action == -1) {
    break;
}
simulate_env.attr("step")(action);
Collaborator

After simulate_env executes step, print the resulting board to check that it is correct.


while (!node->is_leaf()) {
    int action;
    std::tie(action, node) = _select_child(node, simulate_env);
Collaborator

Print the legal actions to check that the output is correct after each select_child call.

std::vector<std::pair<int, int>> action_visits;
// std::cout << "position11 " << std::endl;
for (int action = 0; action < simulate_env.attr("action_space").attr("n").cast<int>(); ++action) {
if (root->children.count(action)) {
Collaborator

@jayyoung0802 commented on Jul 26, 2023

Why use count here rather than std::find?
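
For context on the question above: assuming `children` is a `std::map<int, Node*>` (its declaration is not shown in this excerpt), `count` is a common way to test whether a key is present, while the member function `find` is useful when the mapped value is also needed. The free function `std::find` would instead scan the key/value pairs linearly. A minimal sketch with a hypothetical `Node` type and hypothetical helper names:

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the PR's Node; only the field needed here.
struct Node {
    int visit_count = 0;
};

// Membership test only: for std::map, count() returns 0 or 1.
bool has_child(const std::map<int, Node*>& children, int action) {
    return children.count(action) > 0;
}

// Lookup plus access: find() avoids a second search when the value is needed.
int child_visits(const std::map<int, Node*>& children, int action) {
    auto it = children.find(action);
    return it == children.end() ? 0 : it->second->visit_count;
}
```

Either form works for the loop under review; `find` only pays off if the loop body goes on to read the child it just tested for.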

Node* parent;
float prior_p;
int visit_count;
float value_sum;
Collaborator

Should these consistently use double?
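
One way to act on this suggestion, sketched under the assumption that scores derived from these fields are already held in `double` (as `best_score` is elsewhere in this PR): store the statistics as `double` too, so that no implicit `float`-to-`double` conversions are mixed into the arithmetic. The `value()` helper is hypothetical and not taken from the PR.

```cpp
#include <cassert>

// Sketch of the node statistics using a single floating-point type.
struct Node {
    Node* parent = nullptr;
    double prior_p = 0.0;
    int visit_count = 0;
    double value_sum = 0.0;

    // Hypothetical helper: mean value, defined as 0 for an unvisited node.
    double value() const {
        return visit_count == 0 ? 0.0 : value_sum / visit_count;
    }
};
```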


int action = -1;
Node* child = nullptr;
double best_score = -9999999;
Collaborator

@jayyoung0802 commented on Jul 26, 2023

double is a floating-point type, so initialize it with a floating-point value rather than a negative integer.
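
A possible alternative to the magic number, shown as a sketch rather than the PR's actual fix: start from negative infinity via `std::numeric_limits`, which is a floating-point value and compares below every finite score. The `argmax` function is a hypothetical stand-in for the selection loop.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the index of the largest score, or -1 if scores is empty.
int argmax(const std::vector<double>& scores) {
    int best_index = -1;
    // Floating-point sentinel: every finite score compares greater.
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] > best_score) {
            best_score = scores[i];
            best_index = static_cast<int>(i);
        }
    }
    return best_index;
}
```

With a sentinel of `-9999999`, a set of scores that all fall below that value would leave no action selected. The infinity sentinel avoids that edge case.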

// std::cout << "position8 " << std::endl;
_simulate(root, simulate_env_copy, policy_forward_fn);
// std::cout << "position9 " << std::endl;
simulate_env_copy = py::none();
Collaborator

This statement seems redundant.


while (!node->is_leaf()) {
    int action;
    std::tie(action, node) = _select_child(node, simulate_env);
Collaborator

_select_child returns a pointer, so why is tie still needed here?
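
The call site suggests that `_select_child` hands back both the chosen action and the child node rather than a bare pointer. Assuming a return type of `std::pair<int, Node*>` (the declaration is not shown in this excerpt), `std::tie` assigns both parts to existing variables in one statement. A minimal sketch with hypothetical `Node`, `select_child`, and `last_action_on_path` definitions:

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical stand-in for the PR's Node.
struct Node {
    std::map<int, Node*> children;
    int visit_count = 0;
    bool is_leaf() const { return children.empty(); }
};

// Hypothetical selection rule: pick the most-visited child.
std::pair<int, Node*> select_child(Node* node) {
    int action = -1;
    Node* child = nullptr;
    for (const auto& kv : node->children) {
        if (child == nullptr || kv.second->visit_count > child->visit_count) {
            action = kv.first;
            child = kv.second;
        }
    }
    return {action, child};
}

// Descend to a leaf, overwriting action and node at each level via std::tie.
int last_action_on_path(Node* node) {
    int action = -1;
    while (!node->is_leaf()) {
        std::tie(action, node) = select_child(node);
    }
    return action;
}
```

If only the pointer were returned, a plain assignment `node = select_child(node);` would be enough and `std::tie` could be dropped.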

@PaParaZz1 changed the title from "WIP: feature(pu): add Go env, mcts bot and related unittest, add league version and ctree version of alphazero" to "WIP: feature(pu): add Go env and AlphaZero league training" on Aug 8, 2023
@puyuan1996
Collaborator Author

We have a new polished PR.

@puyuan1996 closed this on Aug 8, 2023
@PaParaZz1 deleted the dev-go-league branch on October 17, 2023 05:24
Labels: algorithm, enhancement, environment