[Feature Request] Domain Randomization #180
Comments
Hello, I would also rather implement that on the environment side (so more of a Gym/Gymnasium concern).
I don't think all DR approaches can be implemented in the environments alone. Automatic Domain Randomization, for example, needs additional evaluation episodes. Thus, I see DR more as an extension to the RL algorithm than as something tied to the environments. I thought of extending the gym interface with reset parameters, which are set by the DR at each reset and used in the reset method. I'll implement a PoC and get back to you.
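To make the reset-parameter idea concrete, here is a minimal, dependency-free sketch. It assumes the parameters are passed through the `options` argument of a Gymnasium-style `reset`; the `ToyPendulumEnv` class, the `dr_params` key, and the `sample_uniform` helper are hypothetical names used only for illustration and are not part of any existing API.

```python
import random
from typing import Any, Dict, Optional, Tuple


class ToyPendulumEnv:
    """Hypothetical stand-in environment that mimics the Gymnasium reset signature."""

    def __init__(self) -> None:
        self.mass = 1.0
        self.length = 1.0

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[Dict[str, Any]] = None,
    ) -> Tuple[Tuple[float, float], Dict[str, Any]]:
        # Reset parameters chosen by the DR component arrive via `options`.
        # The "dr_params" key is an assumption made for this sketch.
        params = (options or {}).get("dr_params", {})
        self.mass = params.get("mass", self.mass)
        self.length = params.get("length", self.length)
        observation = (0.0, 0.0)
        info = {"mass": self.mass, "length": self.length}
        return observation, info


def sample_uniform(ranges: Dict[str, Tuple[float, float]], rng: random.Random) -> Dict[str, float]:
    """Draw one value per parameter, uniformly within its (low, high) range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


rng = random.Random(0)
ranges = {"mass": (0.5, 2.0), "length": (0.8, 1.2)}
env = ToyPendulumEnv()
obs, info = env.reset(options={"dr_params": sample_uniform(ranges, rng)})
```

With this shape, the code that decides the parameters stays outside the environment, and the environment author only has to read them in `reset`.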
🚀 Feature
The literature suggests different techniques for domain randomization. These include:
These are mostly independent of the RL algorithm used to train the policy. Thus, they could be implemented as callbacks in SB3.
Motivation
When using RL for continuous control tasks, the goal is often a more robust and general controller/agent. Domain Randomization is one technique for achieving this goal. Having different techniques available may help others compare what works best for their environments.
Pitch
I suggest implementing a base domain randomization callback. This allows interacting with the environments at each reset by setting domain randomization/reset parameters in the environment. Environments have to be controlled through, e.g., an adapter to support this interaction. Users of the callback are responsible for applying the parameters in their reset implementations.
Individual domain randomization techniques inherit from the base callback and provide their corresponding functionality. Some techniques like Automatic Domain Randomization may need an additional evaluation callback to adapt their parameter space.
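A rough sketch of what such a hierarchy could look like, written without any SB3 dependency so it runs on its own. All class names, method names, and thresholds here are hypothetical. `SimplifiedADR` is a deliberately reduced stand-in, not the published Automatic Domain Randomization algorithm: it widens or narrows every range at once based on a single evaluation score, whereas the original method works on individual range boundaries. Wiring these classes into an SB3 callback is not shown.

```python
import random
from typing import Dict, Tuple

Range = Tuple[float, float]


class DomainRandomizer:
    """Hypothetical base class: owns the parameter ranges and samples from them."""

    def __init__(self, ranges: Dict[str, Range], seed: int = 0) -> None:
        self.ranges = dict(ranges)
        self.rng = random.Random(seed)

    def sample(self) -> Dict[str, float]:
        # Called before each environment reset.
        return {n: self.rng.uniform(lo, hi) for n, (lo, hi) in self.ranges.items()}

    def update(self, mean_eval_return: float) -> None:
        # Hook for techniques that adapt their ranges; no-op by default.
        return None


class UniformDR(DomainRandomizer):
    """Uniform sampling over fixed ranges; inherits everything unchanged."""


class SimplifiedADR(DomainRandomizer):
    """Starts at nominal values and adapts all ranges from an evaluation score."""

    def __init__(self, nominal: Dict[str, float], limits: Dict[str, Range],
                 step: float, widen_above: float, narrow_below: float,
                 seed: int = 0) -> None:
        super().__init__({n: (v, v) for n, v in nominal.items()}, seed)
        self.nominal = dict(nominal)
        self.limits = dict(limits)
        self.step = step
        self.widen_above = widen_above
        self.narrow_below = narrow_below

    def update(self, mean_eval_return: float) -> None:
        if mean_eval_return >= self.widen_above:
            delta = self.step       # performing well: make the task harder
        elif mean_eval_return <= self.narrow_below:
            delta = -self.step      # performing badly: make the task easier
        else:
            return
        for name, (lo, hi) in self.ranges.items():
            min_lo, max_hi = self.limits[name]
            nom = self.nominal[name]
            # Clip to the hard limits and never shrink past the nominal value.
            new_lo = min(nom, max(min_lo, lo - delta))
            new_hi = max(nom, min(max_hi, hi + delta))
            self.ranges[name] = (new_lo, new_hi)


adr = SimplifiedADR(nominal={"mass": 1.0}, limits={"mass": (0.5, 2.0)},
                    step=0.25, widen_above=0.9, narrow_below=0.5)
adr.update(0.95)            # good evaluation score, ranges widen
params = adr.sample()       # would be handed to the environment at reset
```

Keeping `update` on the base class means a technique that needs evaluation feedback can be driven by a separate evaluation step, while simpler techniques ignore it.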
Additionally, one might want to add an adapted version of the EvalCallback to allow evaluating environments with a predetermined and constant set of parameters. I don't know if this is possible with the current EvalCallback.
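As an illustration of evaluating with a predetermined, constant parameter set, the following dependency-free sketch resets a toy environment with the same parameters before every evaluation episode. `ToyEnv`, the `dr_params` key, and `evaluate_fixed` are hypothetical; this does not show how SB3's `EvalCallback` behaves, only the idea being proposed.

```python
from typing import Any, Callable, Dict, Optional


class ToyEnv:
    """Hypothetical one-parameter env following the Gymnasium reset/step shape."""

    def __init__(self) -> None:
        self.friction = 0.1
        self.steps = 0

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None):
        params = (options or {}).get("dr_params", {})
        self.friction = params.get("friction", self.friction)
        self.steps = 0
        return 0.0, {"friction": self.friction}

    def step(self, action: float):
        self.steps += 1
        reward = action - self.friction   # toy reward: friction reduces the payoff
        terminated = self.steps >= 3      # fixed three-step episodes
        return 0.0, reward, terminated, False, {}


def evaluate_fixed(env: ToyEnv, policy: Callable[[float], float],
                   fixed_params: Dict[str, float], n_episodes: int = 5) -> float:
    """Return the mean episode return with the same parameters in every episode."""
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset(options={"dr_params": dict(fixed_params)})
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)


mean_return = evaluate_fixed(ToyEnv(), lambda obs: 1.0, {"friction": 0.25})
```

Because the evaluation parameters never change, scores from different training checkpoints stay comparable even while the training ranges are being randomized or adapted.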
To start, I would implement simpler techniques such as Uniform Domain Randomization and Automatic Domain Randomization.
Alternatives
No response
Additional context
What do you think of this suggestion? If you find this a suitable extension to this repo, I could implement it.
Checklist