UnboundLocalError when evaluating the Episodic Transformer baselines #9

yingShen-ys · 2022-01-29T14:47:22Z

Hello,

I have been trying to evaluate the Episodic Transformer baselines for the TEACh Benchmark Challenge, and I keep getting the following error message when I run the evaluation script provided in the ET directory. I have also tried running the evaluation via "teach_inference"; the error is the same.

Traceback (most recent call last):
  File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 221, in _run_edh_instance
    traj_steps_taken,
UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment

I am running inference on an AWS instance. I have started the X server and installed all requirements and prerequisites without errors.

Here's the script I used for evaluation.

#!/bin/sh

export AWS_ROOT=/home/ubuntu/workplace
export ET_DATA=$AWS_ROOT/data
export TEACH_ROOT_DIR=$AWS_ROOT/teach
export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src
export ET_ROOT=$TEACH_SRC_DIR/teach/modeling/ET
export ET_LOGS=$TEACH_ROOT_DIR/src/teach/modeling/ET/checkpoints
export INFERENCE_OUTPUT_PATH=$TEACH_ROOT_DIR/inference_output
export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH
export SPLIT=valid_seen

cd $TEACH_ROOT_DIR
python src/teach/cli/inference.py \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --data_dir $ET_DATA \
    --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial_$SPLIT \
    --split $SPLIT \
    --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial_$SPLIT.json \
    --seed 4 \
    --model_dir $ET_DATA/baseline_models/et \
    --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
    --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
    --device "cpu" \
    --images_dir $INFERENCE_OUTPUT_PATH/images

Could you help me with this? Thanks!


aishwaryap · 2022-02-01T00:17:04Z

Hi @yingShen-ys,

It looks like you have caught an interesting corner case and I will push a fix to prevent this error shortly. However, it is very likely that you are running into this error because this line failed. Your terminal output should include the stack trace of the actual error that you are running into (although the try-catch we have there will ensure that the process does not get killed due to the error). Can you provide a larger section of the terminal output of the above run (at least 20-30 lines before the above error)? If you are able to share a larger amount of the output, I recommend pasting it in a .txt file, uploading it somewhere and providing a link in your response.

Best,
Aishwarya

P.S.: I am on leave for most of this week so responses to issues will be slower than usual this week.
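The corner case described above can be illustrated with a simplified sketch. It assumes a hypothetical runner (the names run_instance and failing_start are illustrative, not the actual TEACh inference_runner code) in which the trajectory counter is only assigned when the model starts successfully. If start-up fails and the exception is caught, the later reference to the counter raises UnboundLocalError, which hides the original error; the real traceback only appears earlier in the log.

```python
import logging

logger = logging.getLogger("edh_runner_sketch")


def failing_start():
    """Hypothetical stand-in for a model start-up call that raises."""
    raise RuntimeError("model failed to start")


def run_instance(start_fn, max_steps=3, initialize_first=False):
    """Hypothetical, simplified runner; NOT the actual TEACh code."""
    if initialize_first:
        traj_steps_taken = 0  # bind the name on every code path
    started = False
    try:
        started = start_fn()
    except Exception:
        # The original error is logged (to stderr here) and then swallowed.
        logger.error("start failed", exc_info=True)
    if started:
        traj_steps_taken = 0
        for _ in range(max_steps):
            traj_steps_taken += 1
    # This line runs even when start-up failed.
    return traj_steps_taken


try:
    run_instance(failing_start)
except UnboundLocalError as err:
    print("masked by:", type(err).__name__)  # masked by: UnboundLocalError

print(run_instance(failing_start, initialize_first=True))  # 0
print(run_instance(lambda: True))  # 3
```

Binding the variable before the branch keeps the runner's own bookkeeping from failing, so the only error reported for a failed instance is the real one.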

PeixinC · 2022-02-01T06:04:57Z

I have the same error. I am trying to evaluate the ET model locally. The terminal output is shown below.

[MainThread-17308-DEBUG] teach.utils: Creating task from state diff ...
DEBUG:teach.utils:Creating task from state diff ...
[MainThread-17308-DEBUG] teach.inference.inference_runner: Processing instance 04b613ff7dfa1bea_8b8d.edh1
DEBUG:teach.inference.inference_runner:Processing instance 04b613ff7dfa1bea_8b8d.edh1
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_base: Resetting dataset object and removing previously stored episodes...
[MainThread-17308-INFO] teach.inference.inference_runner: Started episode replay with timeout: 500 sec
INFO:teach.simulators.simulator_base:Resetting dataset object and removing previously stored episodes...
INFO:teach.inference.inference_runner:Started episode replay with timeout: 500 sec
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: Starting episode...
INFO:teach.replay.episode_replay:Starting episode...
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In simulator_THOR.start_new_episode, world = FloorPlan17_physics world_type = None
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In simulator_THOR.start_new_episode, world = FloorPlan17_physics world_type = None
INFO:teach.simulators.simulator_THOR:In simulator_THOR.start_new_episode, world = FloorPlan17_physics world_type = None
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.start_new_episode, before __launch_simulator
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.start_new_episode, before __launch_simulator
INFO:teach.simulators.simulator_THOR:In SimulatorTHOR.start_new_episode, before __launch_simulator
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.__launch_simulator, creating ai2thor controller (unity process)
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.__launch_simulator, creating ai2thor controller (unity process)
INFO:teach.simulators.simulator_THOR:In SimulatorTHOR.__launch_simulator, creating ai2thor controller (unity process)
DEBUG:ai2thor.build:/home/peixin/.ai2thor/releases/thor-Linux64-fdc047690ee0ab7a91ede50d286bd387d379713a/thor-Linux64-fdc047690ee0ab7a91ede50d286bd387d379713a exists - skipping download
INFO:root:Initialize return: {'cameraNearPlane': 0.10000000149011612, 'cameraFarPlane': 20.0}
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: Time to create controller: 3.959545850753784 sec
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: Time to create controller: 3.959545850753784 sec
INFO:teach.simulators.simulator_THOR:Time to create controller: 3.959545850753784 sec
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: Time to launch simulator: 5.067909240722656 sec
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: Time to launch simulator: 5.067909240722656 sec
INFO:teach.simulators.simulator_THOR:Time to launch simulator: 5.067909240722656 sec
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Launched world: FloorPlan17_physics; commander embodied: False
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Launched world: FloorPlan17_physics; commander embodied: False
DEBUG:teach.simulators.simulator_THOR:Launched world: FloorPlan17_physics; commander embodied: False
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.start_new_episode, completed __launch_simulator
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.start_new_episode, completed __launch_simulator
INFO:teach.simulators.simulator_THOR:In SimulatorTHOR.start_new_episode, completed __launch_simulator
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: ... done
INFO:teach.replay.episode_replay:... done
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: Loading initial scene state...
INFO:teach.replay.episode_replay:Loading initial scene state...
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Loaded from supplied init state arg
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Loaded from supplied init state arg
DEBUG:teach.simulators.simulator_THOR:Loaded from supplied init state arg
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: ... done
INFO:teach.replay.episode_replay:... done
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: Setting to custom task <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
INFO:teach.replay.episode_replay:Setting to custom task <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Setting task = <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Setting task = <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
DEBUG:teach.simulators.simulator_THOR:Setting task = <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: New task: 0, edh_custom, , []
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: New task: 0, edh_custom, , []
DEBUG:teach.simulators.simulator_THOR:New task: 0, edh_custom, , []
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: SimulatorTHOR set_task done New task: 0, edh_custom,
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: SimulatorTHOR set_task done New task: 0, edh_custom,
INFO:teach.simulators.simulator_THOR:SimulatorTHOR set_task done New task: 0, edh_custom,
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: ... done
INFO:teach.replay.episode_replay:... done
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Keyboard, Text>>
DEBUG:teach.replay.episode_replay:taking action <<Keyboard, Text>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Driver - Keyboard: How can I help? ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Driver - Keyboard: How can I help? ***
DEBUG:teach.simulators.simulator_THOR:*** Driver - Keyboard: How can I help? ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ProgressCheck, OpenProgressCheck>>
DEBUG:teach.replay.episode_replay:taking action <<ProgressCheck, OpenProgressCheck>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Turn Right>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Turn Right>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ProgressCheck, SelectOid>>
DEBUG:teach.replay.episode_replay:taking action <<ProgressCheck, SelectOid>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Keyboard, Text>>
DEBUG:teach.replay.episode_replay:taking action <<Keyboard, Text>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Commander - Keyboard: boil some potato please ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Commander - Keyboard: boil some potato please ***
DEBUG:teach.simulators.simulator_THOR:*** Commander - Keyboard: boil some potato please ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ProgressCheck, SelectOid>>
DEBUG:teach.replay.episode_replay:taking action <<ProgressCheck, SelectOid>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Pan Left>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Pan Left>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Pan Left>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Pan Left>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Turn Right>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Turn Right>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Pan Left>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Pan Left>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Pan Left>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Pan Left>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Forward>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Forward>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Forward>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Forward>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ProgressCheck, SelectOid>>
DEBUG:teach.replay.episode_replay:taking action <<ProgressCheck, SelectOid>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ObjectInteraction, Pickup>>
DEBUG:teach.replay.episode_replay:taking action <<ObjectInteraction, Pickup>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Keyboard, Text>>
DEBUG:teach.replay.episode_replay:taking action <<Keyboard, Text>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Commander - Keyboard: potato is on the white shelf ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Commander - Keyboard: potato is on the white shelf ***
DEBUG:teach.simulators.simulator_THOR:*** Commander - Keyboard: potato is on the white shelf ***
[MainThread-17308-INFO] teach.inference.inference_runner: Elapsed time for episode replay: 7.0206011610571295
INFO:teach.inference.inference_runner:Elapsed time for episode replay: 7.0206011610571295
[MainThread-17308-ERROR] teach.inference.inference_runner: exception happened for instance=/media/peixin/DATA/Research/teach/subset/edh_instances/valid_seen/04b613ff7dfa1bea_8b8d.edh1.json, continue with the rest
Traceback (most recent call last):
  File "/media/peixin/DATA/Research/teach/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/media/peixin/DATA/Research/teach/src/teach/inference/inference_runner.py", line 219, in _run_edh_instance
    traj_steps_taken,
UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment
ERROR:teach.inference.inference_runner:exception happened for instance=/media/peixin/DATA/Research/teach/subset/edh_instances/valid_seen/04b613ff7dfa1bea_8b8d.edh1.json, continue with the rest
Traceback (most recent call last):
  File "/media/peixin/DATA/Research/teach/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/media/peixin/DATA/Research/teach/src/teach/inference/inference_runner.py", line 219, in _run_edh_instance
    traj_steps_taken,
UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment

yingShen-ys · 2022-02-01T14:09:13Z

Hi @aishwaryap,

Here is the error.log. It is similar to what @PeixinC has pasted.

I think this error happens for every instance rather than just some of them (although I didn't run the entire inference).

yingShen-ys · 2022-02-01T19:48:36Z

@aishwaryap

Alright, I think I might have found the problem. model_started_success becomes None after this line.

And I think the problem is that the function start_new_edh_instance() (https://github.com/alexa/teach/blob/main/src/teach/inference/et_model.py#L90) does not return anything, so the call evaluates to None.
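Why a missing return value looks like a failure can be shown with a minimal sketch. HypotheticalModel below is an illustrative stand-in, not the real ETModel: a Python method with no return statement evaluates to None, and None is falsy, so a caller that branches on the result takes the "failed" path even though initialization succeeded.

```python
class HypotheticalModel:
    """Hypothetical stand-in for a model class; this is not the real ETModel."""

    def start_new_edh_instance(self):
        self.ready = True  # initialization succeeds...
        # ...but there is no return statement, so the call evaluates to None


model = HypotheticalModel()
model_started_success = model.start_new_edh_instance()
print(model_started_success)        # None
print(bool(model_started_success))  # False
if not model_started_success:
    # A caller that branches on the return value takes the "failed" path
    # even though initialization actually succeeded.
    print("treated as a failed start")
```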

So my workaround is to set model_started_success = True before the call in this line, instead of assigning it the call's return value. Now model_started_success is False only when an exception is raised.

model_started_success = True  # assume success unless the call below raises
try:
    model.start_new_edh_instance(edh_instance, edh_history_images, instance_file)
except Exception:
    # Start-up failed: record the error and log the real traceback
    model_started_success = False
    metrics["error"] = 1
    logger.error(f"Failed to start_new_edh_instance for {instance_id}", exc_info=True)

So far, I have been able to run inference without errors after this modification.

Please let me know if I missed anything. Thanks!

aishwaryap · 2022-02-02T18:21:40Z

Hi @yingShen-ys

You did identify the problem correctly but I feel a preferred solution would be to correct ETModel.start_new_edh_instance to return True on successful initialization. I have pushed this change.
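The preferred fix can be sketched as a return-value contract: the model method returns True once initialization completes, and the caller treats a raised exception as failure. SketchModel and start_model are hypothetical names; only the start_new_edh_instance call shape is taken from the snippet in this thread, and the body is not the actual ETModel implementation.

```python
import logging

logger = logging.getLogger("edh_start_sketch")


class SketchModel:
    """Hypothetical model; only the return-True contract mirrors the described fix."""

    def __init__(self, fail=False):
        self.fail = fail
        self.history_images = None

    def start_new_edh_instance(self, edh_instance, edh_history_images, instance_file):
        if self.fail:
            raise ValueError(f"could not initialize {instance_file}")
        self.history_images = list(edh_history_images)
        return True  # explicit success signal for the caller


def start_model(model, edh_instance, edh_history_images, instance_file):
    """Hypothetical caller: success if the model returns a truthy value and nothing raised."""
    try:
        return bool(model.start_new_edh_instance(edh_instance, edh_history_images, instance_file))
    except Exception:
        logger.error("Failed to start_new_edh_instance for %s", instance_file, exc_info=True)
        return False


print(start_model(SketchModel(), {}, ["frame0", "frame1"], "a.edh0.json"))  # True
print(start_model(SketchModel(fail=True), {}, [], "b.edh0.json"))  # False
```

Compared with the caller-side workaround, this keeps the runner unchanged and puts the responsibility on each model implementation; the trade-off is that a model which forgets to return True is still reported as a failed start.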

Thanks a lot for catching this!

Aishwarya

yingShen-ys · 2022-02-03T14:18:09Z

Great, thank you.

aishwaryap closed this as completed Feb 2, 2022

yingShen-ys mentioned this issue Feb 3, 2022

Much higher scores when evaluating Episodic Transformer baselines for EDH instances #10

Open
