PyTorch Categorical GRU Policy #2196

Open · wants to merge 3 commits into base: master

Conversation

@ManavR123 ManavR123 commented Dec 9, 2020

In this PR, I add a PyTorch version of the Categorical GRU Policy. I know there is some in-progress work on adding RNN support in #2172, but that PR has been in draft for a while, and I personally needed this policy, so I figured I would try to contribute it. One thing still missing is an example that uses this policy, but usage isn't too different from other similar policies.
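For reference, here is a rough sketch of the overall shape of the policy (illustrative only; the class name, signatures, and defaults below are simplified and may not match the actual diff):

```python
# Rough sketch only -- names and defaults are illustrative, not the PR's actual code.
import torch
from torch import nn
from torch.distributions import Categorical


class CategoricalGRUPolicySketch(nn.Module):
    """GRU encoder over observation sequences with a categorical action head."""

    def __init__(self, obs_dim, action_dim, hidden_dim=32):
        super().__init__()
        self._gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self._logits = nn.Linear(hidden_dim, action_dim)

    def forward(self, observations, hidden_state=None):
        # observations: (batch, time, obs_dim)
        outputs, hidden_state = self._gru(observations, hidden_state)
        dist = Categorical(logits=self._logits(outputs))
        return dist, hidden_state


# Example: sample an action for every timestep of a length-5 rollout.
policy = CategoricalGRUPolicySketch(obs_dim=4, action_dim=3)
dist, hidden = policy(torch.zeros(1, 5, 4))
actions = dist.sample()  # shape: (1, 5)
```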

I would appreciate any feedback on this!

@krzentner @avnishn

@ManavR123 ManavR123 requested a review from a team as a code owner December 9, 2020 08:45
@ManavR123 ManavR123 requested review from ahtsan and removed request for a team December 9, 2020 08:45
Member

@avnishn avnishn left a comment

This looks pretty good to me.

One thing these tests are missing, compared with the TensorFlow tests, is a test that builds an input which concatenates the action and the observation.

To be fair, I'm not sure why those exist or what use case they're testing. I think @krzentner can probably shed some light on this.

Otherwise this is pretty awesome, and we'll wait on @krzentner's comments to address this. I think she and another student have been working on adding PyTorch RNN support for the past few weeks, and this may be helpful to them.

@avnishn
Member

avnishn commented Dec 9, 2020

Oh wait, one more thing, @ManavR123.

You'll want to include a test similar to:

tests/garage/tf/models/test_categorical_gru_model.py

The test titled "test normalized outputs" checks that your module outputs the correct values via a golden-values test.

This will ensure correctness.

You can obtain these golden values by first using your module/policy in some benchmark to confirm that it works, then running the policy and module on a simplified environment like a grid-world env, capturing the weights of that policy/module, and finally writing a test that compares the values of a trained policy to those golden values after a few optimization iterations.
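For illustration, a golden-values test for the torch module might look roughly like this (a sketch only, not the actual garage test; the seed, shapes, and expected values are placeholders you would capture yourself):

```python
# Illustrative sketch only -- shapes, seed, and expected values are placeholders.
import torch
from torch import nn


def test_normalized_outputs_sketch():
    torch.manual_seed(0)
    gru = nn.GRU(input_size=2, hidden_size=3, batch_first=True)
    head = nn.Linear(3, 4)

    obs = torch.ones(1, 2, 2)  # (batch, time, obs_dim)
    out, _ = gru(obs)
    probs = torch.softmax(head(out), dim=-1)

    # Sanity check: action probabilities sum to 1 at every timestep.
    assert torch.allclose(probs.sum(dim=-1), torch.ones(1, 2))

    # Golden-values check: compare against outputs recorded from a trusted run.
    # expected = torch.tensor([[...], [...]])  # captured offline
    # assert torch.allclose(probs, expected, atol=1e-6)
```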

Does that make sense? Please ask me anything if that wasn't clear!

Thanks
@avnishn

Member

@avnishn avnishn left a comment

Just reviewed the test you wrote. I think there's a small bug; see my comments and let me know what you think.

tests/garage/torch/modules/test_categorical_gru_module.py (outdated; resolved)
@krzentner
Contributor

Wow, this PR looks good. The main feedback I have is that we don't actually yet have any algorithms implemented in PyTorch that can train an RNN. Unfortunately, our VPG (and PPO and TRPO) shuffle all timesteps individually in their batching step, which prevents training the RNN correctly. I'm currently working on fixing that, but probably won't have anything usable before sometime next week.

A more minor piece of feedback: I think we should streamline the garage/torch/modules directory to avoid modules that output distributions whenever possible, since they're just an extra layer to reason about when working with policies (which themselves have a superset of the equivalent module's API).
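To illustrate the batching issue (rough pseudocode, not garage's actual optimizer code): shuffling individual timesteps breaks the hidden-state chain a GRU needs, while batching whole trajectories keeps each sequence intact.

```python
# Rough illustration only -- not garage's actual batching code.
import torch


def shuffled_timestep_batches(obs, batch_size):
    """Roughly what the current VPG/PPO/TRPO batching does: shuffle timesteps.

    obs: (num_timesteps, obs_dim). A random permutation destroys the temporal
    ordering that the GRU's hidden state depends on.
    """
    perm = torch.randperm(obs.shape[0])
    return torch.split(obs[perm], batch_size)


def whole_trajectory_batches(trajectories, batch_size):
    """RNN-friendly alternative: batch whole trajectories, never split them.

    trajectories: list of (T_i, obs_dim) tensors, each kept intact so the
    recurrent policy can unroll its hidden state over the full sequence.
    """
    return [trajectories[i:i + batch_size]
            for i in range(0, len(trajectories), batch_size)]
```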

@ryanjulian
Member

@krzentner I'm curious: what is your implementation plan for RNNs in torch/VPG? Is it to make the optimization trajectory-oriented (as in tf/VPG) or something else?

@nikhilxb

nikhilxb commented Sep 9, 2021

> Wow, this PR looks good. The main feedback I have is that we don't actually yet have any algorithms implemented in PyTorch that can train an RNN. Unfortunately, our VPG (and PPO and TRPO) shuffle all timesteps individually in their batching step, which prevents training the RNN correctly. I'm currently working on fixing that, but probably won't have anything usable before sometime next week.
>
> A more minor piece of feedback: I think we should streamline the garage/torch/modules directory to avoid modules that output distributions whenever possible, since they're just an extra layer to reason about when working with policies (which themselves have a superset of the equivalent module's API).

@krzentner Did this end up getting done? I'm also trying to implement an RNN policy in PyTorch but can't seem to find any working examples.
