How to solve reward dropping after reaching super-human level #97
Comments
Algo: sampled_efficientzero. This is my config:

    image_channel = 7
    collector_env_num = 16
    data_sampled_efficientzero_config = dict(
Hello, here are some recommendations for your configuration file, focusing mainly on the following settings:
collector_env_num = 8
n_episode = 8
evaluator_env_num = 5
num_simulations = 50
update_per_collect = 200
replay_buffer_size=int(1e6),
game_segment_length=400, # TODO: adjust according to your episode length
These suggestions aim to improve the model's performance while balancing efficiency and memory usage. I hope you find them helpful.
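A minimal sketch of how the suggested values might be arranged in a LightZero-style config dict. The nesting under `env` and `policy` is an assumption based on the snippet quoted above, not the exact structure of the original file:

```python
# Hypothetical layout of the suggested hyperparameters.
# The enclosing env/policy structure is assumed for illustration.
data_sampled_efficientzero_config = dict(
    env=dict(
        collector_env_num=8,      # fewer parallel collectors than the original 16
        evaluator_env_num=5,
    ),
    policy=dict(
        num_simulations=50,       # MCTS simulations per move
        update_per_collect=200,
        n_episode=8,              # episodes collected per iteration
        replay_buffer_size=int(1e6),
        game_segment_length=400,  # adjust to your episode length
    ),
)
```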
(7, 9, 9) means 7 images of size 9x9. I also ran into a problem here: I have to declare the shape as (7, 9, 9) but feed it to the model as (9, 9, 7); that is the only way I got it to work. I apply this code to change the shape without affecting the images:
Is this correct, or did I make a mistake? Also, what about the size of the neural network and the hidden layers? I think that is also important for handling more data, or am I wrong? Thank you so much @puyuan1996
Hello,
    import numpy as np

    def restack(self, gaf_images):
        """
        Restack the images along the last dimension.

        Args:
            gaf_images (np.array): array of images with shape (7, 9, 9).
        Returns:
            image_tensor (np.array): reshaped array of images with shape (9, 9, 7).
        """
        image_tensor = np.transpose(gaf_images, (1, 2, 0))
        return image_tensor

This function transposes the tensor from shape (7, 9, 9) to (9, 9, 7). However, in our implementation of the MuZero algorithm, the input to a conv-type model should indeed be images with a shape like (7, 9, 9): the first dimension is the number of channels, and the following two dimensions are the width and height of the image, respectively. You may refer to the existing Atari MuZero configuration as an example.
Best wishes for your experiments.
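To make the shape convention concrete, a short NumPy check showing what the `restack` transpose produces versus the channels-first layout the conv model expects (the array here is random dummy data for illustration):

```python
import numpy as np

# A stack of 7 single-channel 9x9 images in channels-first order,
# which is the layout a PyTorch-style conv model expects as input.
gaf_images = np.random.rand(7, 9, 9)

# restack() above moves channels last: (7, 9, 9) -> (9, 9, 7).
channels_last = np.transpose(gaf_images, (1, 2, 0))

print(gaf_images.shape)     # (7, 9, 9)  -- feed this to the model
print(channels_last.shape)  # (9, 9, 7)
```

Note that the transpose only reorders axes; each 9x9 image's pixel values are unchanged.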
Hello, Mr. @puyuan1996! I want to express my sincere gratitude for your kindness; this repository is truly an astonishing work of AI art. Your effort and dedication shine through in this project, and it is genuinely commendable. Great job! I am trying to teach the AI to observe only, taking no action until an expiration time, at which point it gets the reward and can then take another action. Is this possible? I am thinking about these parameters, but I am not sure; please can you guide me?
I tried: ... I also tried: ...
Hello,
Regarding your question about the special environment's MDP:
Regarding your question about
Best wishes.
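As a concrete illustration of the "observe only until a timer expires" idea asked about above, one common pattern is to wrap the environment so the agent's chosen action is overridden with a no-op until the expiration step. This is a hypothetical sketch: `NoOpUntilExpiry`, `NOOP_ACTION`, and the dummy environment are illustrative names, not part of LightZero's API:

```python
# Hypothetical sketch: force a no-op action until `expiry_step` steps have
# passed, so the agent can only observe during that window.
NOOP_ACTION = 0

class NoOpUntilExpiry:
    def __init__(self, env, expiry_step):
        self.env = env
        self.expiry_step = expiry_step
        self.t = 0

    def reset(self):
        self.t = 0
        return self.env.reset()

    def step(self, action):
        # Before expiry, ignore the agent's choice and take the no-op.
        if self.t < self.expiry_step:
            action = NOOP_ACTION
        self.t += 1
        return self.env.step(action)

class _DummyEnv:
    """Toy environment that simply echoes the action it received."""
    def reset(self):
        return 0
    def step(self, action):
        return action

env = NoOpUntilExpiry(_DummyEnv(), expiry_step=3)
env.reset()
taken = [env.step(5) for _ in range(5)]
print(taken)  # first 3 steps forced to no-op: [0, 0, 0, 5, 5]
```

An alternative design is to keep the agent acting every step but mask all actions except the no-op in the MCTS legal-action set during the lock-up window; which fits better depends on how the environment's MDP is defined.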
Hello, Mr. @puyuan1996, thank you so much for your help and kindness. I notice that ckpt_best.pth.tar is not saved on every new best evaluation during training. What factors decide when ckpt_best.pth.tar is saved? It saves only 1 to 3 times and then no more, even after reaching many better scores; sometimes it saves, sometimes not. I do not clearly understand the factors or parameters that control it.
I also still sometimes have spikes on my GPU and memory limitations, even though I am not feeding high-resolution data. I really wonder why ckpt_best.pth.tar is not saved; in my last training it was saved only the first time, even though learning kept improving.
Also, I get an error on eval after training finishes; the returns is a list of None: [None, None, None, ...]
Hello,
Regarding the storage frequency of model checkpoints (ckpt): LightZero's underlying implementation is based on DI-engine, which uses a hook mechanism to save the model's checkpoints. You can refer to the test file for more details. You can adjust the following settings under the policy configuration:

policy=dict(
...
learn=dict(
learner=dict(
hook=dict(
save_ckpt_after_iter=200,
save_ckpt_after_run=True,
log_show_after_iter=100,
),
),
),
...
),

In this configuration:
Regarding the return value error of
Good luck!
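Separately from the periodic `save_ckpt_after_iter` hook above, the "save on new best evaluation" behavior can be sketched in plain Python. This is an illustrative pattern explaining why ckpt_best is written far less often than the periodic checkpoints, not DI-engine's actual hook code (`save_fn` and the reward values are placeholders):

```python
# Hypothetical sketch of best-checkpoint logic: save only when the
# evaluation return strictly exceeds the best value seen so far.
def track_best(eval_returns, save_fn):
    best = float("-inf")
    saved_at = []
    for i, ret in enumerate(eval_returns):
        if ret > best:
            best = ret
            save_fn(i)        # e.g. write ckpt_best.pth.tar here
            saved_at.append(i)
    return saved_at

# Usage: with a noisy reward curve, only strict improvements trigger a save,
# so a run that plateaus or oscillates saves ckpt_best only a few times.
saves = track_best([1.0, 0.5, 2.0, 2.0, 3.0], save_fn=lambda i: None)
print(saves)  # [0, 2, 4]
```

Under this kind of logic, if the evaluator never reports a return above the earlier best (for example because evaluation is noisy or runs infrequently), ckpt_best.pth.tar is simply not rewritten even though training continues.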
How to solve reward dropping after reaching super-human level, or how to save the model at this top level before it starts dropping?