feature(yzj): add multi-agent and structured observation env (GoBigger) #39
base: main
Conversation
@@ -34,6 +36,7 @@ def __init__(
    discrete_action_encoding_type: str = 'one_hot',
    norm_type: Optional[str] = 'BN',
    res_connection_in_dynamics: bool = False,
    state_encoder=None,
Add a type hint for state_encoder and document it in the Arguments section of the docstring.
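A minimal sketch of what the requested hint and docstring entry could look like (the class name and wording are illustrative, not the merged code):

from typing import Optional
import torch.nn as nn

class EfficientZeroModelMLP(nn.Module):  # class name is an assumption for illustration
    def __init__(
        self,
        res_connection_in_dynamics: bool = False,
        state_encoder: Optional[nn.Module] = None,
    ) -> None:
        """
        Arguments:
            - state_encoder (:obj:`Optional[nn.Module]`): Custom encoder that maps the raw
                (possibly structured) observation to the latent state. If None, the default
                representation network is used.
        """
        super().__init__()
        self.res_connection_in_dynamics = res_connection_in_dynamics
        self.state_encoder = state_encoder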
You can refer to the prompts here to polish the comments: https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR
done
beg_index = observation_shape * step_i
end_index = observation_shape * (step_i + self._cfg.model.frame_stack_num)
obs_target_batch_new[k] = v[:, beg_index:end_index]
network_output = self._learn_model.initial_inference(obs_target_batch_new)
The handling of structured observations above could perhaps be factored out into a helper function; see the sketch below.
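For instance, the per-step slicing of stacked frames could live in a small helper along these lines (function name and signature are illustrative):

from typing import Dict
import torch

def slice_stacked_obs(
    obs_batch: Dict[str, torch.Tensor],
    observation_shape: int,
    step_i: int,
    frame_stack_num: int,
) -> Dict[str, torch.Tensor]:
    # Take the frame-stacked window for step_i from every key of the structured obs.
    beg_index = observation_shape * step_i
    end_index = observation_shape * (step_i + frame_stack_num)
    return {k: v[:, beg_index:end_index] for k, v in obs_batch.items()}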
zoo/petting_zoo/model/model.py
Outdated
self.encoder = FCEncoder(obs_shape=18, hidden_size_list=[256, 256], activation=nn.ReLU(), norm_type=None)

def forward(self, x):
    x = x['agent_state']
Add a comment here: why agent_state is used, which keys x contains, and what each key means.
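A sketch of the kind of comment being asked for (the key names other than agent_state are assumptions based on DI-engine's PettingZoo observation dict, to be confirmed against this PR):

import torch
import torch.nn as nn
from ding.model import FCEncoder

class PettingZooEncoder(nn.Module):  # hypothetical wrapper, mirrors the snippet above
    def __init__(self):
        super().__init__()
        self.encoder = FCEncoder(obs_shape=18, hidden_size_list=[256, 256], activation=nn.ReLU(), norm_type=None)

    def forward(self, x: dict) -> torch.Tensor:
        # x is the structured observation dict; in DI-engine's simple_spread wrapper it
        # holds keys such as 'agent_state' (per-agent local features, shape (B, 18)),
        # 'global_state' (features shared by all agents), and 'agent_alone_state'
        # (the agent's own features without neighbor info). Only 'agent_state' is
        # consumed here, since each agent is encoded from its local view under
        # independent learning.
        x = x['agent_state']
        return self.encoder(x)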
from pettingzoo.mpe._mpe_utils.simple_env import SimpleEnv, make_env
from pettingzoo.mpe.simple_spread.simple_spread import Scenario
from PIL import Image
import pygame
Optimize the imports here (e.g., drop any that are unused).
tmp[k] = v[i]
tmp['action_mask'] = [1 for _ in range(*self._action_dim)]
ret_transform.append(tmp)
return {'observation': ret_transform, 'action_mask': action_mask, 'to_play': to_play}
Add detailed documentation of 'observation' to the Overview of the _process_obs() method.
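The requested Overview note might read something like this (wording illustrative, signature abbreviated):

def _process_obs(self, obs):
    """
    Overview:
        Transform the raw multi-agent observation into the form consumed by the policy.
        The returned 'observation' is a list of length agent_num: element i holds agent
        i's slice of every key in the raw structured obs, plus an all-ones 'action_mask',
        since every discrete action is always legal in this env.
    """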
last_game_priorities = [[None for _ in range(agent_num)] for _ in range(env_nums)]
# for priorities in self-play
search_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
pred_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
Code like this, which appears several times, could perhaps be abstracted into a utility method of the class; see the sketch below.
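One possible shape for such a utility (names are illustrative):

from typing import Any, Callable, List

def _make_env_agent_lists(env_nums: int, agent_num: int, factory: Callable[[], Any]) -> List[List[Any]]:
    # Build an env_nums x agent_num nested list, one fresh cell per (env, agent) pair.
    return [[factory() for _ in range(agent_num)] for _ in range(env_nums)]

# e.g.
# last_game_priorities = _make_env_agent_lists(env_nums, agent_num, lambda: None)
# search_values_lst = _make_env_agent_lists(env_nums, agent_num, list)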
zoo/petting_zoo/config/__init__.py
Outdated
@@ -0,0 +1 @@
from .ptz_simple_spread_ez_config import main_config, create_config
Replacing petting_zoo with pettingzoo everywhere in lz might be more concise.
@@ -44,6 +46,8 @@ def __init__(self, cfg: dict):
    self.base_idx = 0
    self.clear_time = 0

    self.tmp_obs = None  # for value obs list [46 + 4(td_step)] not < 50(game_segment)
Polish this comment; comments should be as complete and clear as possible.
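An expanded version of the note might read as follows (an interpretation of the original shorthand, to be confirmed by the author):

# Cache the most recent observation so value-target computation can index past the
# end of a game segment: with game_segment_length=50 and td_steps=4, a step such as
# 46 still needs the observation at 46 + 4, which falls outside the stored segment.
self.tmp_obs = None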
done
m_obs = value_obs_list[beg_index:end_index]
m_obs = sum(m_obs, [])
m_obs = default_collate(m_obs)
m_obs = to_device(m_obs, self._cfg.device)
Could this be abstracted into a data-processing function and placed in utils? See the sketch below.
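For example, the four steps could collapse into one helper in utils (name and location are assumptions):

from ding.torch_utils import to_device
from ding.utils.data import default_collate

def prepare_obs_batch(value_obs_list, beg_index, end_index, device):
    m_obs = value_obs_list[beg_index:end_index]
    m_obs = sum(m_obs, [])          # flatten the per-env lists of per-agent obs
    m_obs = default_collate(m_obs)  # stack dicts/arrays into batched tensors
    return to_device(m_obs, device)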
@@ -34,6 +36,7 @@ def __init__(
    discrete_action_encoding_type: str = 'one_hot',
    norm_type: Optional[str] = 'BN',
    res_connection_in_dynamics: bool = False,
    state_encoder=None,
You can refer to the prompts here to polish the comments: https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR
""" | ||
Overview: | ||
The policy class for Multi Agent EfficientZero. | ||
""" |
Describe how the current multi-agent algorithm differs from the single-agent one, and outline how independent learning is implemented here.
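The added explanation might look roughly like this (a plausible phrasing based on the code in this PR, not the merged docstring):

"""
Overview:
    The policy class for Multi Agent EfficientZero. Compared with the single-agent
    policy, it treats each agent as an independent EfficientZero learner
    (independent learning): per-agent observations are flattened into the batch
    dimension, and MCTS is run separately on every agent's local observation.
"""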
done
)
# NOTE: Convert the ``action_index_in_legal_action_set`` to the corresponding ``action`` in the entire action set.
action = np.where(action_mask[i] == 1.0)[0][action_index_in_legal_action_set]
output[i // agent_num]['action'].append(action)
Add a comment here.
done
""" | ||
Overview: | ||
The policy class for Multi Agent MuZero. | ||
""" |
Same as above.
done
from ding.utils import ENV_REGISTRY, deep_merge_dicts
import math
from easydict import EasyDict
try:
Could you add a link to the original GoBigger repository, along with a note on how this version differs from it?
Added the link in the try/except block.
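A minimal sketch of such an import guard (the exact wording and error handling are illustrative):

try:
    import gobigger  # original repository: https://github.com/opendilab/GoBigger
except ImportError:
    raise ImportError(
        "GoBigger is not installed. See https://github.com/opendilab/GoBigger "
        "for installation instructions."
    )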
main_config = dict(
    exp_name=f'data_mz_ctree/{env_name}_muzero_ns{num_simulations}_upc{update_per_collect}_rr{reanalyze_ratio}_seed{seed}',
How does ptz_simple_spread_mz perform at the moment? If the results aren't good, let's drop the ptz-related parts for now.
ok
max_env_step: Optional[int] = int(1e10),
) -> 'Policy':  # noqa
"""
Overview:
Why was a separate entry needed for ptz in the first place?
Because the encoder needs to be passed in separately.
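Illustratively, the dedicated entry only differs in building the structured-observation encoder and injecting it into the model (import paths and class names are assumptions; other model kwargs omitted for brevity):

from lzero.entry import train_muzero
from lzero.model.efficientzero_model_mlp import EfficientZeroModelMLP  # path assumed
from zoo.petting_zoo.model.model import PettingZooEncoder  # the PR's encoder; name assumed
from zoo.petting_zoo.config.ptz_simple_spread_ez_config import main_config, create_config

model = EfficientZeroModelMLP(state_encoder=PettingZooEncoder())
train_muzero([main_config, create_config], seed=0, model=model)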
lzero/entry/train_muzero.py
Outdated
@@ -47,12 +47,12 @@ def train_muzero(
    """
Please merge the main branch and add the MuZero/EfficientZero baseline results to the PR description. Once everything is polished, create a new multi-agent branch, push it to opendilab/lightzero, and note at the end of this PR that the latest stable code lives on the multi-agent branch.
No description provided.