Yes, each step-by-step instruction has a corresponding subgoal in the training and validation trajectories. If you exploit this alignment during training, please follow the submission guidelines when submitting to the leaderboard.
You should be able to achieve a >99% success rate on training and validation tasks with the ground-truth actions and masks from the dataset. Occasionally, non-deterministic behaviors in THOR can lead to failures, but they are extremely rare.
Mask prediction is an important part of the ALFRED challenge. Unlike in non-interactive environments (e.g., vision-and-language navigation), here the agent must specify exactly what it wants to interact with.
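As a rough illustration, an interaction mask predicted over the current frame can be resolved to a concrete simulator object by comparing it against per-object instance segmentation masks. The sketch below is a minimal example under the assumption that the predicted mask is a binary (H, W) NumPy array and that your environment wrapper exposes a dict of per-object masks (as THOR's event.instance_masks does when instance segmentation is enabled); it is not the exact resolution logic used by ALFRED.

```python
import numpy as np

def resolve_interaction_object(pred_mask, instance_masks):
    """Pick the object whose instance mask best overlaps the predicted mask.

    pred_mask: binary (H, W) NumPy array predicted by the agent.
    instance_masks: dict mapping objectId -> binary (H, W) mask, e.g. from a
        THOR event with instance segmentation enabled (assumed wiring).
    Returns the objectId with the highest IoU, or None if nothing overlaps.
    """
    best_id, best_iou = None, 0.0
    for obj_id, obj_mask in instance_masks.items():
        inter = np.logical_and(pred_mask, obj_mask).sum()
        union = np.logical_or(pred_mask, obj_mask).sum()
        iou = inter / union if union > 0 else 0.0
        if iou > best_iou:
            best_id, best_iou = obj_id, iou
    return best_id
```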
Why does feat_conv.pt in the Full Dataset have 10 more frames than the number of images?
The last 10 frames are copies of the features from the last image frame.
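If you want the features to align one-to-one with the images on disk, you can simply drop the trailing copies. The snippet below is a minimal sketch; the trajectory path and image directory name are placeholders, and the assumption is only that feat_conv.pt holds one feature row per frame followed by the 10 repeated rows.

```python
import glob
import torch

# Placeholder path into one Full Dataset trajectory.
traj_dir = "data/full_2.1.0/train/<task>/<trial>"
feats = torch.load(f"{traj_dir}/feat_conv.pt")               # one feature row per frame
num_images = len(glob.glob(f"{traj_dir}/raw_images/*.jpg"))  # image directory name is an assumption

# The last 10 rows just repeat the final image's features, so trimming them
# restores a 1:1 alignment between features and saved images.
assert feats.shape[0] == num_images + 10
feats = feats[:num_images]
```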
Yes. Run the training script with --use_templated_goals.
You can use augment_trajectories.py to replay all the trajectories and augment the visual observations. At each step, use the THOR API to look around and take 6-12 shots of the surroundings, then stitch these shots together into a panoramic image for that frame. You might have to set 'forceAction': True for smooth MoveAhead/Rotate/Look actions. Note that getting panoramic images at test time would incur the additional cost of having the agent look around. A minimal sketch of the capture-and-stitch step is shown below.
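The sketch below assumes an ai2thor-style controller whose step() takes an action dict and returns an event with a .frame RGB array; the number of views, the reliance on the controller's configured rotation step (90° by default in ALFRED, so you would need a smaller rotation increment to get 6-12 shots), and the naive horizontal stitching are illustrative choices, not part of augment_trajectories.py.

```python
import numpy as np

def capture_panorama(env, num_views=4):
    """Rotate the agent in place and stitch the captured views into one panorama.

    env: ai2thor-style controller; env.step(dict) returning an event with a
         .frame (H, W, 3) RGB array is an assumption about your wrapper.
    num_views: number of rotations; with the default 90-degree rotation step,
         4 views cover a full turn (use a smaller step for more views).
    """
    views = []
    for _ in range(num_views):
        event = env.step(dict(action="RotateLeft", forceAction=True))
        views.append(event.frame)
    # Simple horizontal concatenation; a true panorama may need warping/blending.
    return np.concatenate(views, axis=1)
```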
Why does feat_conv.pt in Modeling Quickstart contain fewer frames than in the Full Dataset?
The Full Dataset contains extracted ResNet features for every frame in ['images'], which includes filler frames in between low-level actions (used to generate smooth videos), whereas Modeling Quickstart only contains features for each low_idx, i.e., the frames observed after taking each low-level action.
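To go from the Full Dataset features to one feature per low-level action yourself, you can subsample by low_idx. The sketch below assumes the ALFRED traj_data.json layout, where ['images'] lists one entry per frame with a 'low_idx' field; the paths are placeholders, and whether you keep the first or last frame for each index depends on how you define the post-action observation.

```python
import json
import torch

# Placeholder paths into one Full Dataset trajectory.
traj_dir = "data/full_2.1.0/train/<task>/<trial>"
with open(f"{traj_dir}/traj_data.json") as f:
    traj = json.load(f)

full_feats = torch.load(f"{traj_dir}/feat_conv.pt")  # one row per frame, incl. filler frames

# Keep one frame index per low-level action (here: the first frame recorded
# for each low_idx), which approximates the Modeling Quickstart features.
seen, keep = set(), []
for i, im in enumerate(traj["images"]):
    if im["low_idx"] not in seen:
        seen.add(im["low_idx"])
        keep.append(i)
quickstart_like_feats = full_feats[keep]
```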
Yes, run the training script with --fast_epoch.