Max Entropy Soft Constraint Inverse Reinforcement Learning

In this project, we use the maximum entropy principle in inverse reinforcement learning (IRL) to learn soft constraints from demonstrations obtained from an agent interacting with a non-deterministic MDP. In the second part of the project, we implement several strategies (orchestrators) for mixing conflicting policies (e.g., pragmatic vs. ethical).

You can find grid-world examples in the notebooks folder.
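
For intuition, methods in this family follow the standard maximum entropy IRL recipe: ascend the demonstration log-likelihood, whose gradient is the gap between the empirical feature expectations of the demonstrations and those induced by the current soft-optimal policy. Below is a minimal sketch of that update; every name in it is hypothetical and does not mirror this package's API.

  import numpy as np

  def empirical_feature_expectations(trajectories, phi):
      # phi[s, a] is the feature vector of state-action pair (s, a);
      # average the features accumulated along each demonstration.
      counts = np.zeros(phi.shape[-1])
      for traj in trajectories:
          for s, a in traj:
              counts += phi[s, a]
      return counts / len(trajectories)

  def maxent_update(omega, demos, policy_expectations, phi, lr=0.1):
      # MaxEnt IRL gradient: empirical feature expectations from the
      # demonstrations minus the expectations induced by the current
      # soft-optimal policy (computed elsewhere, e.g. by soft value
      # iteration with backward/forward passes).
      grad = empirical_feature_expectations(demos, phi) - policy_expectations
      return omega + lr * grad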

Installing the requirements

  pip install -r requirements.txt

Running the experiments

Hard constraints

To learn the constraints, run:

  python -m max_ent.examples.learn_hard_constraints

After learning, run the following to generate the reports (in the ./reports/hard/ folder):

  python -m max_ent.examples.compare_hard_results
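
If you prefer launching the examples from Python instead of the shell, the standard library's runpy gives an equivalent of the `python -m` commands in this README (shown here for the hard-constraint comparison; the other example modules work the same way):

  # Programmatic equivalent of `python -m max_ent.examples.compare_hard_results`,
  # using only the Python standard library.
  import runpy

  runpy.run_module("max_ent.examples.compare_hard_results", run_name="__main__")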

Soft constraints

To learn the constraints, run:

  python -m max_ent.examples.learn_soft_constraints

After learning, run the following to generate the reports (in the ./reports/soft/ folder):

  python -m max_ent.examples.compare_soft_results

Transfer Learning

To run the transfer learning experiments and generate the results, use:

  python -m max_ent.examples.transfer

The generated reports can be found in the ./reports/transfer/ folder.

Orchestration

Run the notebook at ./notebooks/new_metrics.ipynb.

Alternatively, set learn = True in ./max_ent/examples/orchestrator_exp.py and then run:

  python -m max_ent.examples.orchestrator_exp

After that, set learn = False and run the above command again. The reports will be generated in the ./reports/orchestrator/ folder.
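
For intuition about what an orchestrator does, here is a minimal sketch of one simple strategy: a fixed-weight blend of two conflicting stochastic policies. The function name and blending rule are illustrative assumptions, not this repository's API.

  import numpy as np

  def mix_policies(pragmatic, ethical, w=0.5):
      # pragmatic, ethical: [n_states, n_actions] arrays whose rows are
      # action distributions. Blend them with weight w and renormalize
      # each row (a safeguard against floating-point drift).
      blended = w * pragmatic + (1.0 - w) * ethical
      return blended / blended.sum(axis=1, keepdims=True)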

Acknowledgement

This repository uses and modifies some code from the irl-maxent library.
