
Hierarchical policies #253

Merged
merged 36 commits into from
Jun 15, 2021
Conversation

jordis-ai2
Collaborator

I provide an example in minigrid of how it can be used. I can see the interface being too hard to use, but I think it's good to iterate after we've all taken a look.

Collaborator

@Lucaweihs Lucaweihs left a comment


This review got a bit away from me and feels a bit jumbled: the comments range from minor low-level naming suggestions to high-level thoughts about API consistency. Sorry about that.

Overall, I think this direction is very cool and we definitely need something like your DirectedGraphicalModel class. That said I'm a bit torn:

  • As we do more of this directed graphical model stuff, it feels a bit like we're heading down the path of reinventing TensorFlow, i.e. defining a static computation graph, defining inputs, and then running it. Even with the best engineering, this gets quite complicated.
  • At the same time, this DirectedGraphicalModel class is (correct me if I'm wrong) just the most general means a user has to enable this hierarchical functionality. The other approach would be for them to define their own Distr subclass which just (effectively) hard-codes all of the inputs, outputs, and conditional distributions. Given this, maybe it doesn't matter that the DirectedGraphicalModel approach is a bit complex; people can choose to go another direction if that's what they prefer.
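For concreteness, the "hard-coded" alternative might look something like the following standalone sketch. The class name and method signatures here are purely illustrative (this is not the actual allenact Distr API): the point is that the factorization p(skill) * p(action | skill) is written out by hand instead of declared as a graph.

```python
import random

class HardCodedHierarchicalDistr:
    """Illustrative hard-coded two-level policy distribution:
    a categorical over skills, then a categorical over primitive
    actions conditioned on the sampled skill."""

    def __init__(self, skill_probs, action_probs_given_skill):
        self.skill_probs = skill_probs                # p(skill)
        self.action_probs = action_probs_given_skill  # p(action | skill)

    def sample(self, rng=random):
        # Ancestral sampling: first the skill, then the action given it.
        skill = rng.choices(range(len(self.skill_probs)), weights=self.skill_probs)[0]
        probs = self.action_probs[skill]
        action = rng.choices(range(len(probs)), weights=probs)[0]
        return skill, action

    def prob(self, skill, action):
        # Chain rule: p(skill, action) = p(skill) * p(action | skill)
        return self.skill_probs[skill] * self.action_probs[skill][action]

d = HardCodedHierarchicalDistr([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]])
d.prob(0, 0)  # -> 0.5 * 0.9 = 0.45
```

A DirectedGraphicalModel would generalize this pattern to arbitrary factorizations, at the cost of the graph-declaration machinery discussed above.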

It might be nice to hear @jiasenlu's thoughts as the person who would likely be the first "customer" of this.

allenact/algorithms/onpolicy_sync/engine.py (3 outdated review threads, resolved)
def act(self, rollouts: RolloutStorage, dist_wrapper_class: Optional[type] = None):
    dist_wrapper_class = None
    if self.training_pipeline.current_stage.teacher_forcing is not None:
        dist_wrapper_class = TeacherForcingDistr
Collaborator


I really like how this TeacherForcingDistr streamlines so much of the below code. I'm still going through everything but my one worry is that it might make calls to .log_prob inconsistent among Distr classes:

E.g. let's say the teacher forcing prob is 0.1, the expert action is 0, and the agent's distribution is [0.25, 0.25, 0.25, 0.25] for the actions [0, 1, 2, 3]. Then the probability that I sample the 0 action is

0.1 * 1 + 0.9 * 0.25 = 0.325

and so if I call teacher_forcing_distr.prob(0) it should equal 0.325 (to be consistent with all other uses of .prob). But when we're doing IL I don't really care about the prob of the teacher_forcing_distr, I care only about the 0.25 prob of the model.
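The mixture above is easy to check numerically. Here is a standalone sketch (`mixture_prob` is an illustrative helper, not part of the allenact API):

```python
# Probability of sampling `action` under teacher forcing: with
# probability tf the expert's action is forced; otherwise the
# model's own distribution is sampled.
def mixture_prob(tf, model_probs, expert_action, action):
    expert_term = tf * (1.0 if action == expert_action else 0.0)
    model_term = (1.0 - tf) * model_probs[action]
    return expert_term + model_term

# tf = 0.1, expert action 0, uniform model over 4 actions:
mixture_prob(0.1, [0.25, 0.25, 0.25, 0.25], 0, 0)  # -> 0.1 * 1 + 0.9 * 0.25 = 0.325
```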

Collaborator Author

@jordis-ai2 jordis-ai2 Apr 15, 2021


Yes, I see. Currently (if I'm not wrong), I'm still outputting the model's 0.25 for the chosen action while enforcing actions during sampling, so TeacherForcingDistr does not fully behave as a distribution. Maybe we want to call it by a different name? It's used as a distribution, but perhaps we need a superclass of Distr called Sampleable/Sampler, which can specialize either as a Distr (with all its expected properties) or as a Teacher, which hacks some details since it's no longer a Distr.
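The split being proposed could be sketched like this (hypothetical class names; this is not the existing allenact class hierarchy):

```python
from abc import ABC, abstractmethod

class Sampleable(ABC):
    """Anything that can produce actions."""
    @abstractmethod
    def sample(self): ...

class Distr(Sampleable):
    """A proper distribution: sampling plus consistent probabilities."""
    @abstractmethod
    def log_prob(self, actions): ...

class Teacher(Sampleable):
    """Forces expert actions during sampling. Deliberately *not* a Distr,
    so callers can never ask it for a log_prob that would be misleading."""
    def __init__(self, expert_actions):
        self.expert_actions = expert_actions

    def sample(self):
        return self.expert_actions
```

Under this split, code that only needs to sample would accept any Sampleable, while losses that need .log_prob would require a true Distr, making the inconsistency discussed above a type error rather than a silent surprise.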

allenact/base_abstractions/distributions.py (6 review threads, resolved; 5 outdated)
@lgtm-com

lgtm-com bot commented Apr 29, 2021

This pull request introduces 3 alerts when merging 2d5fd70 into 1882ff1 - view on LGTM.com

new alerts:

  • 3 for Unused import

Collaborator

@Lucaweihs Lucaweihs left a comment


Looking much more streamlined. This is definitely a big update so I've added a bunch of additional comments. To me it seems like the one missing thing is the ability to not pass all actions to the task. Generally I suspect that, when using hierarchical models, most of the actions are going to be "internal", i.e. they're needed to choose the final action but don't actually mean anything to the task. This might happen if, for example, the top-level agent's action space was [f"search_for_{x}" for x in object_types] and the lower-level agent's actions were the movement actions.
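A minimal sketch of what "not passing all actions to the task" could mean (illustrative names only; this is not how allenact's engine is written):

```python
# A hierarchical step emits several action heads; heads marked internal
# (e.g. the chosen skill) are consumed by the policy itself and never
# forwarded to the task's step function.
INTERNAL_HEADS = {"search_for"}  # hypothetical top-level skill head

def actions_for_task(all_actions: dict) -> dict:
    """Drop internal action heads before the task sees the action."""
    return {k: v for k, v in all_actions.items() if k not in INTERNAL_HEADS}

step_output = {"search_for": "mug", "move": "MoveAhead"}
actions_for_task(step_output)  # -> {"move": "MoveAhead"}
```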

allenact/algorithms/onpolicy_sync/engine.py (5 outdated review threads, resolved)
allenact/base_abstractions/sensor.py (3 outdated review threads, resolved)
allenact/embodiedai/sensors/vision_sensors.py (2 review threads, resolved; 1 outdated)
- Old Vision Sensors tombstones + capitalized constants in ExpertSensor
- `self.num_active_samplers` from vector tasks in engine
- `dist_wrapper_class` argument in engine's `act`
- Explicit call to `condition_on_input` in `ConditionalDistr`
- Bug fix in ConditionedMiniGridTask stub
@jordis-ai2 jordis-ai2 marked this pull request as ready for review May 12, 2021 01:03
@jordis-ai2
Collaborator Author

I went ahead and added a new test (using the MiniGrid hacks I used while developing hierarchical policy support), a command line option to collect validation results, and a command line option to enforce the expert during inference.

Collaborator

@Lucaweihs Lucaweihs left a comment


Looks great. I left a few comments here and there, mostly asking for some additional documentation in places. As the hierarchical code is pretty involved, it would be great to have a tutorial that really goes through all of the details; we could even reference this tutorial in the code so that people can learn by example.

@@ -1416,6 +1374,7 @@ def __init__(
worker_id: int = 0,
num_workers: int = 1,
distributed_port: int = 0,
enforce_expert: bool = False,
Collaborator


I'd prefer that we didn't offer this option. I know it's off by default, but I want to make it as hard as possible to accidentally run the expert in the inference phase (especially as the tensorboard log would associate these results with the tag from the config file). Perhaps we should instead add an option to run the expert in a "stand-alone" mode (with its own tag) where we don't even require the "ExpertConfig" to have defined a model.

Collaborator Author


I see what you mean. I still think this is worth it, though. For example, at some development point in Jiasen's project, we wanted to check what happens when we enable the expert for only one/some of the skills in a hierarchical policy. For that, we needed to run the expert and the model during inference. Let me know your thoughts.

allenact/algorithms/onpolicy_sync/engine.py (outdated review thread, resolved)
allenact/algorithms/onpolicy_sync/losses/imitation.py (outdated review thread, resolved)
allenact/algorithms/onpolicy_sync/runner.py (outdated review thread, resolved)
allenact/base_abstractions/sensor.py (2 review threads, resolved; 1 outdated)
allenact/base_abstractions/task.py (review thread, resolved)
allenact/main.py (outdated review thread, resolved)
projects/tutorials/minigrid_tutorial_conds.py (review thread, resolved)