DQN: Action mask is not compatible in vectorized environments #186
Comments
Hey, how did you vectorize the environment? PZ doesn't offer any wrapper for vectorizing envs. We have one for parallel-API PZ envs, so yours could make a good contribution to the framework! As for the action mask not working with vectorized envs, that's because it wasn't originally designed to, but with more clarity on how your vectorization works I'm sure we can easily implement it.
Hi! I referenced your wrapper for PZ Parallel Env and implemented a wrapper for AEC API. It might not be fully vectorized because it waits until episodes in all environments are terminated or truncated (done) to start a new set of episodes. The main changes are:
Hey @nargizsentience, is there a particular reason the environments don't autoreset themselves? I remember writing the parallelization with autoresetting in mind, so this is a personal question. Going by your description, I don't think amending getAction to make it vectorization-compatible would be too hard.
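To make the two reset strategies discussed above concrete, here is a minimal sketch. ToyEnv and run are hypothetical names invented for illustration; they are not part of AgileRL or PettingZoo, and the toy environment simply ends an episode after a fixed number of steps. The sketch only counts how many environment steps do useful work under each strategy.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment: an episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self):
        self.t += 1
        return self.t, self.t >= self.length


def run(envs, total_steps, autoreset):
    """Loop `total_steps` times over all envs; return the number of env steps actually taken."""
    for env in envs:
        env.reset()
    done = [False] * len(envs)
    useful = 0
    for _ in range(total_steps):
        for i, env in enumerate(envs):
            if done[i]:
                continue  # idle: this env waits for the slower ones to finish
            _, done[i] = env.step()
            useful += 1
            if done[i] and autoreset:
                # Autoreset: start a new episode in this env immediately.
                env.reset()
                done[i] = False
        if not autoreset and all(done):
            # Synchronous reset: only once every episode has ended.
            for env in envs:
                env.reset()
            done = [False] * len(envs)
    return useful


steps_autoreset = run([ToyEnv(2), ToyEnv(4)], total_steps=8, autoreset=True)
steps_wait_all = run([ToyEnv(2), ToyEnv(4)], total_steps=8, autoreset=False)
```

With episodes of different lengths, the wait-for-all variant leaves the shorter environment idle until the longer one finishes, which is the cost the question about autoresetting is getting at.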
What version of AgileRL are you using?
v0.1.19
What operating system and processor architecture are you using?
Windows, 64-bit operating system, x64-based processor
What did you do?
I attempted to add vectorization to the self-play script to train a DQN agent in a PettingZoo AEC env. However, DQN's getAction seems to assume a single action mask shared by all environments. This results in a mismatch between the shapes of the mask and the data fed into np.ma.array.
Steps to reproduce the behaviour:
What did you expect to see?
A list of actions, [1, 0]. Each action corresponds to its respective action mask and state.
What did you see instead? Describe the bug.
numpy.ma.core.MaskError: Mask and data not compatible: data size is 2, mask size is 4.
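The error can be reproduced with NumPy alone. This is a minimal sketch rather than the AgileRL code itself: it assumes two environments with two actions each, a mask where 1 marks a legal action, and that the mask is inverted before being handed to np.ma.array (NumPy masks out entries where the mask is truthy). The mask values are chosen to match the expected actions [1, 0].

```python
import numpy as np

action_dim = 2  # assumed number of actions

# One mask row per environment (1 = legal, 0 = illegal), shape (2, 2).
action_mask = np.array([[0, 1], [1, 0]])

message = ""
try:
    # The data has action_dim elements, but the mask has num_envs * action_dim,
    # so NumPy can neither resize nor reshape the mask to fit the data.
    np.ma.array(np.arange(0, action_dim), mask=1 - action_mask)
except np.ma.MaskError as err:
    message = str(err)

print(message)
```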
Additional context
The current getAction() seems to assume that action_mask is a 1D array whose size equals action_dim. It then samples n actions, where n is the number of observations (state.size()[0]). However, when action_mask is not a 1D array, it does not have the same shape as np.arange(0, self.action_dim).
I fixed this issue locally by modifying getAction().
action_mask.ndim == 1
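One way such a change could look is sketched below. This is not the local fix referred to above, which is not shown here, and it is not AgileRL's implementation. The function name masked_actions, its parameters, and the epsilon-greedy structure are assumptions made for illustration. It accepts either a 1D mask shared by all observations or a 2D mask with one row per environment, and it assumes every row has at least one legal action.

```python
import numpy as np


def masked_actions(q_values, action_mask, epsilon=0.0, rng=None):
    """Pick one action per row of q_values, respecting a 1D or 2D mask (1 = legal)."""
    rng = np.random.default_rng() if rng is None else rng
    q_values = np.atleast_2d(np.asarray(q_values, dtype=float))
    num_envs, action_dim = q_values.shape

    mask = np.asarray(action_mask, dtype=bool)
    if mask.ndim == 1:
        # A single mask shared by every observation: repeat it for each row.
        mask = np.broadcast_to(mask, (num_envs, action_dim))

    # Greedy choice: illegal actions are set to -inf so argmax never selects them.
    greedy = np.where(mask, q_values, -np.inf).argmax(axis=1)

    # Random choice: sample uniformly among the legal actions of each row.
    # Assumes each row has at least one legal action (rng.choice fails otherwise).
    random = np.array([rng.choice(np.flatnonzero(row)) for row in mask])

    # Per-environment epsilon-greedy switch between the two choices.
    explore = rng.random(num_envs) < epsilon
    return np.where(explore, random, greedy)


# Two environments, each with its own mask.
actions = masked_actions([[0.9, 0.1], [0.2, 0.8]], [[0, 1], [1, 0]])
print(actions.tolist())
```

Masking the Q-values directly avoids building a masked array over np.arange(0, action_dim), so the mask only has to match the shape of the batch of Q-values.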