
feature(yzj): add multi-agent and structured observation env (GoBigger) #39

Open
wants to merge 59 commits into
base: main
from

jayyoung0802
Collaborator

No description provided.

@puyuan1996 puyuan1996 self-assigned this Jun 1, 2023
@puyuan1996 puyuan1996 added the enhancement New feature or request label Jun 1, 2023
lzero/mcts/tree_search/mcts_ptree_sampled.py (outdated, resolved)
lzero/model/gobigger/network/activation.py (outdated, resolved)
lzero/model/gobigger/network/res_block.py (outdated, resolved)
lzero/policy/gobigger_muzero.py (outdated, resolved)
zoo/gobigger/config/gobigger_muzero_config.py (outdated, resolved)
lzero/worker/gobigger_muzero_collector.py (outdated, resolved)
lzero/worker/gobigger_muzero_collector.py (outdated, resolved)
lzero/model/gobigger/gobigger_muzero_model.py (outdated, resolved)
lzero/mcts/buffer/gobigger_game_buffer_muzero.py (outdated, resolved)
lzero/entry/train_muzero_gobigger.py (outdated, resolved)
lzero/entry/eval_muzero_gobigger.py (outdated, resolved)
lzero/entry/utils.py (outdated, resolved)
lzero/mcts/buffer/gobigger_game_buffer_efficientzero.py (outdated, resolved)
lzero/mcts/buffer/gobigger_game_buffer_efficientzero.py (outdated, resolved)
lzero/mcts/buffer/gobigger_game_buffer_muzero.py (outdated, resolved)
lzero/model/gobigger/network/gobigger_encoder.py (outdated, resolved)
lzero/policy/gobigger_random_policy.py (outdated, resolved)
lzero/policy/gobigger_random_policy.py (outdated, resolved)
lzero/worker/gobigger_muzero_collector.py (outdated, resolved)
zoo/gobigger/config/gobigger_eval_config.py (outdated, resolved)
lzero/entry/__init__.py (outdated, resolved)
@@ -34,6 +36,7 @@ def __init__(
discrete_action_encoding_type: str = 'one_hot',
norm_type: Optional[str] = 'BN',
res_connection_in_dynamics: bool = False,
state_encoder=None,
Collaborator

Please add a type hint for state_encoder and a matching entry in the Arguments section of the docstring.

Collaborator

You can refer to the prompts here to improve the comments: https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR

Collaborator Author

done
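For the type-hint request above, a minimal sketch of what the annotated signature and Arguments entry might look like. ExampleMuZeroModel is a hypothetical stand-in for the real model class; the string annotation 'nn.Module' (imported only under TYPE_CHECKING) keeps the sketch importable without torch, and the stated behaviour when state_encoder is None is an assumption, not something the PR confirms.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:  # imported only for static type checkers, never at runtime
    import torch.nn as nn


class ExampleMuZeroModel:
    """Hypothetical stand-in for the model whose __init__ gained ``state_encoder``."""

    def __init__(self, state_encoder: Optional['nn.Module'] = None) -> None:
        """
        Overview:
            Build the model, optionally with a custom state encoder.
        Arguments:
            - state_encoder (:obj:`Optional[nn.Module]`): Encoder that maps a (possibly
              structured) observation to the latent state. ``None`` means the model
              builds its own default representation network (assumed behaviour).
        """
        self.state_encoder = state_encoder
```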

beg_index = observation_shape * step_i
end_index = observation_shape * (step_i + self._cfg.model.frame_stack_num)
obs_target_batch_new[k] = v[:, beg_index:end_index]
network_output = self._learn_model.initial_inference(obs_target_batch_new)
Collaborator

The handling of the structured observation above could perhaps be factored out into a function.
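One way the slicing above could be factored out, as a sketch: slice_structured_obs is a hypothetical helper name, and it assumes every value in the structured batch is an array of shape (batch, num_steps * observation_shape) with the per-step observations concatenated along axis 1, as the reviewed `v[:, beg_index:end_index]` suggests.

```python
from typing import Dict

import numpy as np


def slice_structured_obs(
    obs_batch: Dict[str, np.ndarray],
    step_i: int,
    observation_shape: int,
    frame_stack_num: int,
) -> Dict[str, np.ndarray]:
    """Return, for every key, the stacked frames used as input at unroll step ``step_i``."""
    beg_index = observation_shape * step_i
    end_index = observation_shape * (step_i + frame_stack_num)
    # Same column window for every key; values must share the per-step width.
    return {k: v[:, beg_index:end_index] for k, v in obs_batch.items()}
```

In the policy, the loop body would then reduce to a single call such as `slice_structured_obs(obs_target_batch, step_i, observation_shape, self._cfg.model.frame_stack_num)`, where obs_target_batch is an assumed name for the dict being iterated.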

self.encoder = FCEncoder(obs_shape=18, hidden_size_list=[256, 256], activation=nn.ReLU(), norm_type=None)

def forward(self, x):
x = x['agent_state']
Collaborator

Please add comments: why agent_state? Which keys does x contain, and what does each one mean?
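Besides a comment, the input contract of this forward can be made explicit in code. A sketch under assumptions: extract_agent_state is a hypothetical helper, and the only facts it relies on are the 'agent_state' key and the 18-dimensional input given to FCEncoder in the reviewed code; it does not document what the other keys of x mean, which still needs the author's comment.

```python
from typing import Any, Mapping

import numpy as np

AGENT_STATE_DIM = 18  # matches obs_shape=18 passed to FCEncoder in the reviewed code


def extract_agent_state(x: Mapping[str, Any], expected_dim: int = AGENT_STATE_DIM) -> np.ndarray:
    """Pick the flat per-agent feature vector out of a structured observation.

    Only ``'agent_state'`` is consumed by the FC encoder; other keys are ignored.
    Descriptive errors make an env/encoder mismatch easy to spot.
    """
    if 'agent_state' not in x:
        raise KeyError(f"expected key 'agent_state', got keys {sorted(x)}")
    state = np.asarray(x['agent_state'])
    if state.shape[-1] != expected_dim:
        raise ValueError(f"agent_state last dim is {state.shape[-1]}, expected {expected_dim}")
    return state
```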

from pettingzoo.mpe._mpe_utils.simple_env import SimpleEnv, make_env
from pettingzoo.mpe.simple_spread.simple_spread import Scenario
from PIL import Image
import pygame
Collaborator

Please optimize the imports.

tmp[k] = v[i]
tmp['action_mask'] = [1 for _ in range(*self._action_dim)]
ret_transform.append(tmp)
return {'observation': ret_transform, 'action_mask': action_mask, 'to_play': to_play}
Collaborator

Put the detailed description of 'observation' in the Overview section of the _process_obs() method.
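A sketch of how the Overview of _process_obs() could describe 'observation', written as a standalone function so it runs on its own. Assumptions: raw_obs maps each feature name to a sequence indexed by agent id (as `tmp[k] = v[i]` suggests), action_dim is a plain int (the reviewed code unpacks `*self._action_dim`), and the top-level 'action_mask' is built as all ones because its real construction is not shown in the hunk.

```python
from typing import Any, Dict, List


def process_obs(raw_obs: Dict[str, List[Any]], agent_num: int, action_dim: int, to_play: int) -> Dict[str, Any]:
    """
    Overview:
        Split a team-level structured observation into one dict per agent.
    Arguments:
        - raw_obs: mapping from feature name to a sequence indexed by agent id.
        - agent_num: number of agents in the team.
        - action_dim: size of one agent's discrete action space.
        - to_play: passed through unchanged.
    Returns:
        - 'observation': list of length agent_num; entry i holds every feature of
          agent i plus an all-ones 'action_mask' (every action is legal).
        - 'action_mask': one all-ones mask per agent (assumed here).
        - 'to_play': the input value.
    """
    ret_transform = []
    for i in range(agent_num):
        tmp = {k: v[i] for k, v in raw_obs.items()}
        tmp['action_mask'] = [1] * action_dim
        ret_transform.append(tmp)
    action_mask = [[1] * action_dim for _ in range(agent_num)]
    return {'observation': ret_transform, 'action_mask': action_mask, 'to_play': to_play}
```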

last_game_priorities = [[None for _ in range(agent_num)] for _ in range(env_nums)]
# for priorities in self-play
search_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
pred_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
Collaborator

A snippet like this, which appears several times, could perhaps be abstracted into a utility method of the class.
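The repeated env-by-agent list construction could be shared through one helper, sketched below as a module-level function (it could equally be a @staticmethod on the collector). per_env_per_agent is a hypothetical name. Taking a factory rather than a value keeps each cell independent; the shortcut `[[[]] * agent_num] * env_nums` would make all cells alias the same list.

```python
from typing import Callable, List, TypeVar

T = TypeVar('T')


def per_env_per_agent(env_nums: int, agent_num: int, factory: Callable[[], T]) -> List[List[T]]:
    """Build an ``env_nums x agent_num`` grid where every cell gets a fresh ``factory()`` value."""
    return [[factory() for _ in range(agent_num)] for _ in range(env_nums)]
```

Usage in the collector would look like `last_game_priorities = per_env_per_agent(env_nums, agent_num, lambda: None)` and `search_values_lst = per_env_per_agent(env_nums, agent_num, list)`.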

@@ -0,0 +1 @@
from .ptz_simple_spread_ez_config import main_config, create_config
Collaborator

Replacing every petting_zoo in lz with pettingzoo might be cleaner.

@@ -44,6 +46,8 @@ def __init__(self, cfg: dict):
self.base_idx = 0
self.clear_time = 0

self.tmp_obs = None # for value obs list [46 + 4(td_step)] not < 50(game_segment)
Collaborator

Please improve this comment and make it as complete and clear as possible.

Collaborator Author

done

m_obs = value_obs_list[beg_index:end_index]
m_obs = sum(m_obs, [])
m_obs = default_collate(m_obs)
m_obs = to_device(m_obs, self._cfg.device)
Collaborator

Could this be abstracted into a data-processing function placed in utils?
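A possible shape for such a utils function, as a sketch: prepare_value_obs is a hypothetical name, and the collate and device steps are injected as callables so the helper has no torch dependency. In the policy they would presumably be default_collate and a wrapper around to_device(..., self._cfg.device), as in the reviewed lines. The sketch also swaps `sum(m_obs, [])` for itertools.chain.from_iterable, which flattens in linear time instead of re-copying the accumulated list on every addition.

```python
from itertools import chain
from typing import Any, Callable, List, Optional


def prepare_value_obs(
    value_obs_list: List[List[Any]],
    beg_index: int,
    end_index: int,
    collate_fn: Callable[[List[Any]], Any],
    to_device_fn: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Slice a window of value observations, flatten it, collate it and optionally move it."""
    window = value_obs_list[beg_index:end_index]
    flat = list(chain.from_iterable(window))  # linear-time equivalent of sum(window, [])
    batch = collate_fn(flat)
    return to_device_fn(batch) if to_device_fn is not None else batch
```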

@@ -34,6 +36,7 @@ def __init__(
discrete_action_encoding_type: str = 'one_hot',
norm_type: Optional[str] = 'BN',
res_connection_in_dynamics: bool = False,
state_encoder=None,
Collaborator

You can refer to the prompts here to improve the comments: https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR

"""
Overview:
The policy class for Multi Agent EfficientZero.
"""
Collaborator

Please explain how the current multi-agent algorithm differs from the single-agent one, and give a brief overview of how independent learning is currently implemented.

Collaborator Author

done
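Judging from the `output[i // agent_num]` indexing in the policy code, independent learning here appears to treat each (env, agent) pair as a separate sample in one flat batch, with batch index i mapping to env i // agent_num and agent i % agent_num. The sketch below illustrates only that layout; the helper names are hypothetical, and it says nothing about whether agents share network parameters.

```python
from typing import Any, Dict, List


def flatten_env_agent(obs_per_env: List[List[Any]]) -> List[Any]:
    """Flatten ``[env][agent]`` observations into one env-major batch."""
    return [obs for env_obs in obs_per_env for obs in env_obs]


def regroup_actions(flat_actions: List[int], agent_num: int) -> List[Dict[str, List[int]]]:
    """Collect per-sample actions back into one ``{'action': [...]}`` dict per env."""
    if len(flat_actions) % agent_num != 0:
        raise ValueError('batch size must be a multiple of agent_num')
    output = [{'action': []} for _ in range(len(flat_actions) // agent_num)]
    for i, action in enumerate(flat_actions):
        # Sample i belongs to env i // agent_num; agents keep their order within the env.
        output[i // agent_num]['action'].append(action)
    return output
```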

)
# NOTE: Convert the ``action_index_in_legal_action_set`` to the corresponding ``action`` in the entire action set.
action = np.where(action_mask[i] == 1.0)[0][action_index_in_legal_action_set]
output[i // agent_num]['action'].append(action)
Collaborator

Please add comments.

Collaborator Author

done
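The conversion noted in the policy code can be isolated and documented on its own. As the variable name action_index_in_legal_action_set indicates, the search result is a position inside the list of legal actions, while the env expects an id from the full action set. legal_index_to_action is a hypothetical pure-Python equivalent of `np.where(action_mask[i] == 1.0)[0][action_index_in_legal_action_set]`.

```python
from typing import Sequence


def legal_index_to_action(action_mask: Sequence[float], action_index_in_legal_action_set: int) -> int:
    """Map an index into the legal-action subset back to an id in the full action set."""
    # Positions of all legal actions, in increasing order (same as np.where(mask == 1.0)[0]).
    legal_actions = [a for a, m in enumerate(action_mask) if m == 1.0]
    return legal_actions[action_index_in_legal_action_set]
```

For example, with mask [0, 1, 0, 1, 1] the legal actions are [1, 3, 4], so legal index 2 maps to action 4.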

"""
Overview:
The policy class for Multi Agent MuZero.
"""
Collaborator

Same as above.

Collaborator Author

done

zoo/gobigger/config/gobigger_eval_config.py (resolved)
from ding.utils import ENV_REGISTRY, deep_merge_dicts
import math
from easydict import EasyDict
try:
Collaborator

Could you add a link to the original GoBigger repository and explain how this version differs from it?

Collaborator Author

jayyoung0802, Dec 7, 2023

Added the link in the try/except block.
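A sketch of the try/except-with-link pattern described in the reply. import_or_explain is a hypothetical helper, and using 'gobigger' as the module name is an assumption about the package's import name; the URL points at the upstream GoBigger repository. ModuleNotFoundError is a subclass of ImportError, so both are caught.

```python
import importlib
from types import ModuleType

GOBIGGER_URL = 'https://github.com/opendilab/GoBigger'


def import_or_explain(module_name: str, url: str = GOBIGGER_URL) -> ModuleType:
    """Import ``module_name``; on failure, re-raise with a pointer to where to get it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"Could not import '{module_name}'. Please install GoBigger first, see {url}"
        ) from e
```

The env module would then call something like `gobigger = import_or_explain('gobigger')` at import time.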


main_config = dict(
exp_name=
f'data_mz_ctree/{env_name}_muzero_ns{num_simulations}_upc{update_per_collect}_rr{reanalyze_ratio}_seed{seed}',
Collaborator

How does ptz_simple_spread_mz perform at the moment? If the results are not good, let's remove the ptz-related parts for now.

Collaborator Author

ok

max_env_step: Optional[int] = int(1e10),
) -> 'Policy': # noqa
"""
Overview:
Collaborator

Why did ptz previously need its own entry?

Collaborator Author

Because the encoder needs to be passed in separately.
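If the shared entry accepts a pre-built model (worth checking against the train_muzero signature), the encoder can be injected through the model instead of through a dedicated entry. A sketch of that dependency-injection pattern; TinyModel and train_entry are hypothetical stand-ins, not LightZero classes.

```python
from typing import Any, Callable, Dict, Optional


class TinyModel:
    """Stand-in for a MuZero-style model that accepts an optional state encoder."""

    def __init__(self, state_encoder: Optional[Callable[[Any], Any]] = None) -> None:
        # Fall back to an identity encoder when none is supplied.
        self.state_encoder = state_encoder if state_encoder is not None else (lambda x: x)

    def initial_inference(self, obs: Any) -> Any:
        return self.state_encoder(obs)


def train_entry(cfg: Dict[str, Any], model: Optional[TinyModel] = None) -> TinyModel:
    """Generic entry: use a caller-built model if given, otherwise build a default one."""
    if model is None:
        model = TinyModel()
    # ... the shared collect / train / evaluate loop would go here ...
    return model
```

An env-specific launcher would build `TinyModel(state_encoder=my_encoder)` and pass it to the one shared entry, so only the model construction differs per env.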

@@ -47,12 +47,12 @@ def train_muzero(
"""
Collaborator

Please merge the main branch and add the relevant MuZero (mz) and EfficientZero (ez) baseline results to the PR description. Then, once the code is polished, create a new branch named multi-agent, push it to opendilab/lightzero, and add a note at the end of this PR saying that the latest stable code lives on the multi-agent branch.

Labels
enhancement New feature or request environment New or improved environment
Projects
None yet

3 participants