[Feature Request] Domain Randomization #180

KonstantinRamthun · 2023-05-07T11:39:19Z

🚀 Feature

Literature suggests different techniques for domain randomization. This includes:

These are mostly independent of the RL algorithm used to train the policy. Thus, they could be implemented as callbacks in SB3.

Motivation

When using RL for continuous control tasks, the goal is often a more robust and general controller/agent. Domain Randomization is one technique for achieving this goal. Having different techniques available may help others compare what works best for their environments.

Pitch

I suggest implementing a base domain randomization callback. This allows interacting with the environments at each reset by setting domain randomization/reset parameters in the environment. Environments have to be controlled through, e.g., an adapter to support this interaction. Users of the callback are responsible for applying the parameters in their reset implementations.

Individual domain randomization techniques inherit from the base callback and provide their corresponding functionality. Some techniques like Automatic Domain Randomization may need an additional evaluation callback to adapt their parameter space.
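
A minimal sketch of how the base/derived split described above could look, written in plain Python so it runs without SB3. All names here (`ResettableEnv`, `set_reset_params`, `BaseDomainRandomizer`, `UniformDomainRandomizer`, `on_reset`) are hypothetical and only illustrate the idea; they are not an existing SB3 or Gym API.

```python
import random
from abc import ABC, abstractmethod


class ResettableEnv:
    """Hypothetical stand-in for an environment adapter that accepts reset parameters."""

    def __init__(self):
        self.reset_params = {}
        self.mass = 1.0

    def set_reset_params(self, params):
        # Store the parameters chosen by the randomizer for the next reset.
        self.reset_params = dict(params)

    def reset(self):
        # The user is responsible for applying the parameters here.
        self.mass = self.reset_params.get("mass", 1.0)
        return 0.0  # dummy observation


class BaseDomainRandomizer(ABC):
    """Hypothetical base class: pushes new parameters to the env before each reset."""

    def __init__(self, env):
        self.env = env

    @abstractmethod
    def sample_params(self):
        """Return a dict mapping parameter names to values."""

    def on_reset(self):
        params = self.sample_params()
        self.env.set_reset_params(params)
        return params


class UniformDomainRandomizer(BaseDomainRandomizer):
    """Samples every parameter independently from a fixed [low, high] range."""

    def __init__(self, env, ranges, seed=None):
        super().__init__(env)
        self.ranges = ranges
        self.rng = random.Random(seed)

    def sample_params(self):
        return {name: self.rng.uniform(lo, hi) for name, (lo, hi) in self.ranges.items()}


env = ResettableEnv()
randomizer = UniformDomainRandomizer(env, {"mass": (0.5, 2.0)}, seed=0)
params = randomizer.on_reset()  # choose parameters for the next episode
env.reset()                     # the env applies them in its own reset
```

In SB3 the `on_reset` hook would presumably live in a `BaseCallback` subclass and reach vectorized environments through `VecEnv.env_method`; that wiring is an assumption and is not shown here.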

Additionally, one might want to add an adapted version of the EvalCallback to allow evaluating environments with a predetermined and constant set of parameters. I don't know if this is possible with the current EvalCallback.

To start, I would implement simpler techniques like Uniform Domain Randomization and Automatic Domain Randomization.
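
To illustrate why Automatic Domain Randomization needs evaluation feedback, here is a heavily simplified sketch of an ADR-style range update for a single parameter. It is not the published algorithm: the class name `SimpleADRRange`, the thresholds, and the fixed step size are assumptions made for illustration. The idea shown is that a boundary is pushed outward when evaluation episodes at that boundary perform well, and pulled back when they perform poorly.

```python
class SimpleADRRange:
    """Simplified, hypothetical ADR-style range for one environment parameter."""

    def __init__(self, low, high, step, min_low, max_high):
        self.low = low
        self.high = high
        self.step = step
        self.min_low = min_low    # hard lower limit for the range
        self.max_high = max_high  # hard upper limit for the range

    def update(self, boundary, mean_return, expand_above, shrink_below):
        """Adapt one boundary from the mean return of episodes evaluated at it."""
        if boundary not in ("low", "high"):
            raise ValueError("boundary must be 'low' or 'high'")
        if mean_return >= expand_above:
            delta = self.step    # agent copes well: widen the range
        elif mean_return < shrink_below:
            delta = -self.step   # agent struggles: narrow the range
        else:
            return               # performance in between: leave the range alone
        if boundary == "high":
            self.high = min(self.max_high, max(self.low, self.high + delta))
        else:
            self.low = max(self.min_low, min(self.high, self.low - delta))


mass_range = SimpleADRRange(low=1.0, high=1.0, step=0.1, min_low=0.5, max_high=2.0)
# Evaluation at the upper boundary went well, so the upper boundary moves outward.
mass_range.update("high", mean_return=10.0, expand_above=8.0, shrink_below=2.0)
```

The `mean_return` values would have to come from extra evaluation episodes run with the parameter pinned to the boundary, which is the part that an additional evaluation callback would provide.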

Alternatives

No response

Additional context

What do you think of this suggestion? If you find this a suitable extension to this repo, I could implement it.

Checklist

I have checked that there is no similar issue in the repo

araffin · 2023-05-11T19:56:44Z

Hello,
thanks for the suggestion. It is true that domain randomization is independent of the RL algorithm, but in my mind it is highly dependent on the environment, so it would be hard to provide a common callback that works for many environments.

I would also rather implement that on the environment side (so it is more of a Gym/Gymnasium matter).
Or are you proposing something different, such as a common interface that could be re-used or adapted?
If so, do you have a working proof of concept that you can share?

KonstantinRamthun · 2023-05-14T10:06:19Z

I think you can't implement all DR approaches in the environments alone. For Automatic Domain Randomization, for example, you need additional evaluation episodes. Thus, I see DR more as an extension to the RL algorithm than something related to the environments.

I thought of using an extension to the gym interface with reset parameters, which are set by the DR at each reset and used in the reset method. I'll implement a PoC and get back to you.
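
One possible shape for such reset parameters, sketched under assumptions: Gymnasium's `Env.reset` already accepts `seed` and `options` keyword arguments, so the parameters could travel in `options`. The toy class below only mirrors that signature in plain Python and does not import Gymnasium; the `"reset_params"` key and the `ToyPendulumEnv` class are hypothetical.

```python
import random


class ToyPendulumEnv:
    """Hypothetical toy env mirroring the Gymnasium reset(seed=..., options=...) signature."""

    DEFAULTS = {"length": 1.0, "mass": 1.0}

    def __init__(self):
        self.params = dict(self.DEFAULTS)
        self.rng = random.Random()
        self.angle = 0.0

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng = random.Random(seed)
        # Read the domain randomization parameters, falling back to the defaults.
        overrides = (options or {}).get("reset_params", {})
        unknown = set(overrides) - set(self.DEFAULTS)
        if unknown:
            raise KeyError(f"unknown reset parameters: {sorted(unknown)}")
        self.params = {**self.DEFAULTS, **overrides}
        self.angle = self.rng.uniform(-0.1, 0.1)  # small random initial angle
        info = {"reset_params": dict(self.params)}
        return self.angle, info


env = ToyPendulumEnv()
obs, info = env.reset(seed=0, options={"reset_params": {"length": 1.5}})
```

Passing the same `reset_params` at every reset would also give the constant-parameter evaluation mentioned in the original request.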

KonstantinRamthun added the enhancement New feature or request label May 7, 2023
