Hierarchical policies #253
Conversation
This review got a bit away from me and feels a bit jumbled: the comments range from minor low-level naming suggestions to high-level thoughts about API consistency. Sorry about that.

Overall, I think this direction is very cool and we definitely need something like your `DirectedGraphicalModel` class. That said, I'm a bit torn:
- As we do more of this directed graphical model stuff, it feels a bit like we're heading down the path of reinventing TensorFlow, i.e. defining a static computation graph, defining inputs, and then running it. Even with the best engineering, this gets quite complicated.
- At the same time, this `DirectedGraphicalModel` class is (correct me if I'm wrong) just the most general means a user has to enable this hierarchical functionality. The other approach would be for them to define their own `Distr` subclass which (effectively) hard-codes all of the inputs, outputs, and conditional distributions. Given this, maybe it doesn't matter that the `DirectedGraphicalModel` approach is a bit complex; people can choose to go another direction if that's what they prefer.

It might be nice to hear @jiasenlu's thoughts as the person who would likely be the first "customer" of this.
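To make the second alternative concrete, here is a minimal sketch (class and attribute names are hypothetical, not the actual `Distr` API) of a hand-coded two-level distribution whose structure — option first, then an action conditioned on it — is fixed in code rather than declared as a graph:

```python
# Hypothetical sketch of the "hand-coded Distr subclass" alternative to
# DirectedGraphicalModel. All names here are illustrative.
import math
import random

class TwoLevelDistr:
    def __init__(self, option_probs, action_probs_per_option):
        self.option_probs = option_probs              # p(option)
        self.action_probs = action_probs_per_option   # p(action | option)

    def sample(self):
        # Sample the high-level option, then the low-level action given it.
        option = random.choices(range(len(self.option_probs)),
                                weights=self.option_probs)[0]
        action = random.choices(range(len(self.action_probs[option])),
                                weights=self.action_probs[option])[0]
        return option, action

    def log_prob(self, option, action):
        # The joint factorizes along the hard-coded graph:
        # log p(option) + log p(action | option)
        return (math.log(self.option_probs[option])
                + math.log(self.action_probs[option][action]))

d = TwoLevelDistr([0.5, 0.5], [[0.25, 0.75], [0.9, 0.1]])
print(round(d.log_prob(0, 1), 4))  # log(0.5 * 0.75) ≈ -0.9808
```

The trade-off is exactly the one above: this is simple and transparent for a fixed hierarchy, but every new structure requires writing a new class.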
```python
def act(self, rollouts: RolloutStorage, dist_wrapper_class: Optional[type] = None):
    dist_wrapper_class = None
    if self.training_pipeline.current_stage.teacher_forcing is not None:
        dist_wrapper_class = TeacherForcingDistr
```
I really like how this `TeacherForcingDistr` streamlines so much of the below code. I'm still going through everything, but my one worry is that it might make calls to `.log_prob` inconsistent among `Distr` classes.

E.g. let's say the teacher forcing prob is 0.1, the expert action is `0`, and the agent's distribution is `[0.25, 0.25, 0.25, 0.25]` for the actions `[0, 1, 2, 3]`. Then the probability that I sample the 0 action is

0.1 * 1 + 0.9 * 0.25 = 0.325

and so if I call `teacher_forcing_distr.prob(0)` it should equal 0.325 (to be consistent with all other uses of `.prob`). But when we're doing IL I don't really care about the prob of the `teacher_forcing_distr`, I care only about the 0.25 prob of the model.
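The mixture probability in the example above can be written out in a few lines (function and parameter names are illustrative, not part of the actual API):

```python
# Probability of sampling `action` under teacher forcing: with prob tf_prob the
# expert's action is taken deterministically; otherwise the model samples.
def teacher_forced_prob(action, expert_action, model_probs, tf_prob):
    forced = 1.0 if action == expert_action else 0.0
    return tf_prob * forced + (1 - tf_prob) * model_probs[action]

print(round(teacher_forced_prob(0, 0, [0.25, 0.25, 0.25, 0.25], 0.1), 3))  # 0.325
```

The inconsistency worry is then: a faithful `.prob(0)` returns 0.325, while the imitation-learning loss wants the model's own 0.25.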
Yes, I see. Currently (if I'm not wrong), I'm still outputting the 0.25 of the model for the chosen action while enforcing actions during sampling (so `TeacherForcingDistr` does not fully behave as a distribution). Maybe we want to call it by a different name? It's used as a distribution, but maybe we need a superclass for `Distr` called `Sampleable`/`Sampler`, which could specialize either as a `Distr` (with all its expected properties) or as a `Teacher`, which kind of hacks some details since it's no longer a `Distr`?
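A rough sketch of what that split could look like (the names `Sampler`/`Distr`/`Teacher` come from the discussion above; everything else is hypothetical):

```python
# Hypothetical class split: everything can sample, but only Distr promises
# full distribution semantics (log_prob etc.). Teacher deliberately omits them.
from abc import ABC, abstractmethod
import math
import random

class Sampler(ABC):
    @abstractmethod
    def sample(self): ...

class Distr(Sampler):
    """A Sampler with full distribution semantics."""
    def __init__(self, probs):
        self.probs = probs
    def sample(self):
        return random.choices(range(len(self.probs)), weights=self.probs)[0]
    def log_prob(self, action):
        return math.log(self.probs[action])

class Teacher(Sampler):
    """Enforces the expert action during sampling; intentionally exposes no
    log_prob, since reporting either the mixture prob or the model's prob
    would be inconsistent (see the discussion above)."""
    def __init__(self, model_distr, expert_action, tf_prob):
        self.model_distr = model_distr
        self.expert_action = expert_action
        self.tf_prob = tf_prob
    def sample(self):
        if random.random() < self.tf_prob:
            return self.expert_action
        return self.model_distr.sample()
```

With this shape, code that only needs `.sample()` can accept any `Sampler`, while losses that call `.log_prob` would require a true `Distr`.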
This pull request introduces 3 alerts when merging 2d5fd70 into 1882ff1 - view on LGTM.com

new alerts:
Looking much more streamlined. This is definitely a big update so I've added a bunch of additional comments. To me it seems like the one missing thing is the ability to not pass all actions to the task. Generally I suspect that, when using hierarchical models, most of the actions are going to be "internal", i.e. they're needed to choose the final action but don't actually mean anything to the task. This might happen if, for example, the top-level agent's action space was `[f"search_for_{x}" for x in object_types]` and the lower-level agent's actions were the movement actions.
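A tiny illustration of the internal-actions point (the `object_types` values and action names below are made up for the example): only the low-level movement action would be forwarded to the task, while the high-level `search_for_*` choice stays internal to the policy.

```python
# Hypothetical two-level action space: the high-level choice conditions the
# low-level policy but is never executed by the environment itself.
object_types = ["mug", "apple"]
high_level_actions = [f"search_for_{x}" for x in object_types]
low_level_actions = ["move_ahead", "rotate_left", "rotate_right"]

def action_to_pass_to_task(high_idx, low_idx):
    # high_idx is "internal": it selects which skill is active, but only the
    # low-level action is meaningful to the task.
    return low_level_actions[low_idx]

print(action_to_pass_to_task(0, 1))  # -> "rotate_left"
```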
- Old Vision Sensors tombstones + capitalized constants in ExpertSensor
- `self.num_active_samplers` from vector tasks in engine
- `dist_wrapper_class` argument in engine's `act`
- Explicit call to `condition_on_input` in `ConditionalDistr`
- Bug fix in ConditionedMiniGridTask stub
…classing abstract expert sensor
…ake it compatible with cross_entropy implementation in imitation
…entropy() in ppo loss)
I went ahead and added a new test (using the MiniGrid hacks I used while developing hierarchical policy support) + a command line option to collect validation results + a command line option to enforce the expert during inference.
Looks great. I left a few comments here and there, mostly asking for some additional documentation in places. As the hierarchical code is pretty involved, it would be great to have a tutorial that really goes through all of the details; we could even reference this tutorial in the code so that people could learn things by example.
```diff
@@ -1416,6 +1374,7 @@ def __init__(
     worker_id: int = 0,
     num_workers: int = 1,
     distributed_port: int = 0,
+    enforce_expert: bool = False,
```
I'd prefer that we didn't offer this option. I know it's off by default, but I want to make it as hard as possible to accidentally run the expert in the inference phase (especially as the tensorboard log would associate these results with the tag from the config file). Perhaps we should instead add an option to run the expert in a "stand-alone" mode (with its own tag) where we don't even require the "ExpertConfig" to have defined a model.
I see what you mean. I still think this is worth it, though. For example, at some development point in Jiasen's project, we wanted to check what happens when we enable the expert for only one/some of the skills in a hierarchical policy. For that, we needed to run the expert and the model during inference. Let me know your thoughts.
I provide an example in MiniGrid of how it can be used. I can see the interface being too hard to use, but I think it's good to iterate after we've all taken a look.