UnboundLocalError when evaluating the Episodic Transformer baselines #9

Closed
yingShen-ys opened this issue Jan 29, 2022 · 6 comments

@yingShen-ys

Hello,

I have been trying to evaluate the Episodic Transformer baselines for the TEACh Benchmark Challenge, but I keep getting the following error when I run the evaluation script provided in the ET directory. I also tried running the evaluation via "teach_inference", and the error is the same.

Traceback (most recent call last):
  File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 221, in _run_edh_instance
    traj_steps_taken,
UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment

I am running inference on an AWS instance. I have started the X server and installed all requirements and prerequisites without errors.

Here's the script I used for evaluation.

#!/bin/sh

export AWS_ROOT=/home/ubuntu/workplace
export ET_DATA=$AWS_ROOT/data
export TEACH_ROOT_DIR=$AWS_ROOT/teach
export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src
export ET_ROOT=$TEACH_SRC_DIR/teach/modeling/ET
export ET_LOGS=$TEACH_ROOT_DIR/src/teach/modeling/ET/checkpoints
export INFERENCE_OUTPUT_PATH=$TEACH_ROOT_DIR/inference_output
export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH
export SPLIT=valid_seen

cd $TEACH_ROOT_DIR
python src/teach/cli/inference.py \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --data_dir $ET_DATA \
    --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial_$SPLIT \
    --split $SPLIT \
    --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial_$SPLIT.json \
    --seed 4 \
    --model_dir $ET_DATA/baseline_models/et \
    --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
    --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
    --device "cpu" \
    --images_dir $INFERENCE_OUTPUT_PATH/images

Could you help me with this? Thanks!

@aishwaryap
Contributor

Hi @yingShen-ys,

It looks like you have caught an interesting corner case and I will push a fix to prevent this error shortly. However, it is very likely that you are running into this error because this line failed. Your terminal output should include the stack trace of the actual error that you are running into (although the try-catch we have there will ensure that the process does not get killed due to the error). Can you provide a larger section of the terminal output of the above run (at least 20-30 lines before the above error)? If you are able to share a larger amount of the output, I recommend pasting it in a .txt file, uploading it somewhere and providing a link in your response.

Best,
Aishwarya

P.S.: I am on leave for most of this week so responses to issues will be slower than usual this week.

@PeixinC

PeixinC commented Feb 1, 2022

I have the same error. I am trying to evaluate the ET model locally. The terminal output is shown below.

[MainThread-17308-DEBUG] teach.utils: Creating task from state diff ...
DEBUG:teach.utils:Creating task from state diff ...
[MainThread-17308-DEBUG] teach.inference.inference_runner: Processing instance 04b613ff7dfa1bea_8b8d.edh1
DEBUG:teach.inference.inference_runner:Processing instance 04b613ff7dfa1bea_8b8d.edh1
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_base: Resetting dataset object and removing previously stored episodes...
[MainThread-17308-INFO] teach.inference.inference_runner: Started episode replay with timeout: 500 sec
INFO:teach.simulators.simulator_base:Resetting dataset object and removing previously stored episodes...
INFO:teach.inference.inference_runner:Started episode replay with timeout: 500 sec
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: Starting episode...
INFO:teach.replay.episode_replay:Starting episode...
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In simulator_THOR.start_new_episode, world = FloorPlan17_physics world_type = None
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In simulator_THOR.start_new_episode, world = FloorPlan17_physics world_type = None
INFO:teach.simulators.simulator_THOR:In simulator_THOR.start_new_episode, world = FloorPlan17_physics world_type = None
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.start_new_episode, before __launch_simulator
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.start_new_episode, before __launch_simulator
INFO:teach.simulators.simulator_THOR:In SimulatorTHOR.start_new_episode, before __launch_simulator
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.__launch_simulator, creating ai2thor controller (unity process)
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.__launch_simulator, creating ai2thor controller (unity process)
INFO:teach.simulators.simulator_THOR:In SimulatorTHOR.__launch_simulator, creating ai2thor controller (unity process)
DEBUG:ai2thor.build:/home/peixin/.ai2thor/releases/thor-Linux64-fdc047690ee0ab7a91ede50d286bd387d379713a/thor-Linux64-fdc047690ee0ab7a91ede50d286bd387d379713a exists - skipping download
INFO:root:Initialize return: {'cameraNearPlane': 0.10000000149011612, 'cameraFarPlane': 20.0}
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: Time to create controller: 3.959545850753784 sec
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: Time to create controller: 3.959545850753784 sec
INFO:teach.simulators.simulator_THOR:Time to create controller: 3.959545850753784 sec
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: Time to launch simulator: 5.067909240722656 sec
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: Time to launch simulator: 5.067909240722656 sec
INFO:teach.simulators.simulator_THOR:Time to launch simulator: 5.067909240722656 sec
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Launched world: FloorPlan17_physics; commander embodied: False
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Launched world: FloorPlan17_physics; commander embodied: False
DEBUG:teach.simulators.simulator_THOR:Launched world: FloorPlan17_physics; commander embodied: False
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.start_new_episode, completed __launch_simulator
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: In SimulatorTHOR.start_new_episode, completed __launch_simulator
INFO:teach.simulators.simulator_THOR:In SimulatorTHOR.start_new_episode, completed __launch_simulator
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: ... done
INFO:teach.replay.episode_replay:... done
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: Loading initial scene state...
INFO:teach.replay.episode_replay:Loading initial scene state...
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Loaded from supplied init state arg
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Loaded from supplied init state arg
DEBUG:teach.simulators.simulator_THOR:Loaded from supplied init state arg
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: ... done
INFO:teach.replay.episode_replay:... done
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: Setting to custom task <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
INFO:teach.replay.episode_replay:Setting to custom task <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Setting task = <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: Setting task = <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
DEBUG:teach.simulators.simulator_THOR:Setting task = <teach.dataset.task_THOR.Task_THOR object at 0x7ff5dc62d4f0>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: New task: 0, edh_custom, , []
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: New task: 0, edh_custom, , []
DEBUG:teach.simulators.simulator_THOR:New task: 0, edh_custom, , []
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: SimulatorTHOR set_task done New task: 0, edh_custom,
[ThreadPoolExecutor-1_0-17308-INFO] teach.simulators.simulator_THOR: SimulatorTHOR set_task done New task: 0, edh_custom,
INFO:teach.simulators.simulator_THOR:SimulatorTHOR set_task done New task: 0, edh_custom,
[ThreadPoolExecutor-1_0-17308-INFO] teach.replay.episode_replay: ... done
INFO:teach.replay.episode_replay:... done
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Keyboard, Text>>
DEBUG:teach.replay.episode_replay:taking action <<Keyboard, Text>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Driver - Keyboard: How can I help? ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Driver - Keyboard: How can I help? ***
DEBUG:teach.simulators.simulator_THOR:*** Driver - Keyboard: How can I help? ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ProgressCheck, OpenProgressCheck>>
DEBUG:teach.replay.episode_replay:taking action <<ProgressCheck, OpenProgressCheck>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Turn Right>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Turn Right>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ProgressCheck, SelectOid>>
DEBUG:teach.replay.episode_replay:taking action <<ProgressCheck, SelectOid>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Keyboard, Text>>
DEBUG:teach.replay.episode_replay:taking action <<Keyboard, Text>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Commander - Keyboard: boil some potato please ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Commander - Keyboard: boil some potato please ***
DEBUG:teach.simulators.simulator_THOR:*** Commander - Keyboard: boil some potato please ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ProgressCheck, SelectOid>>
DEBUG:teach.replay.episode_replay:taking action <<ProgressCheck, SelectOid>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Pan Left>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Pan Left>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Pan Left>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Pan Left>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Turn Right>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Turn Right>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Pan Left>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Pan Left>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Pan Left>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Pan Left>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Forward>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Forward>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Motion, Forward>>
DEBUG:teach.replay.episode_replay:taking action <<Motion, Forward>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ProgressCheck, SelectOid>>
DEBUG:teach.replay.episode_replay:taking action <<ProgressCheck, SelectOid>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<ObjectInteraction, Pickup>>
DEBUG:teach.replay.episode_replay:taking action <<ObjectInteraction, Pickup>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.replay.episode_replay: taking action <<Keyboard, Text>>
DEBUG:teach.replay.episode_replay:taking action <<Keyboard, Text>>
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Commander - Keyboard: potato is on the white shelf ***
[ThreadPoolExecutor-1_0-17308-DEBUG] teach.simulators.simulator_THOR: *** Commander - Keyboard: potato is on the white shelf ***
DEBUG:teach.simulators.simulator_THOR:*** Commander - Keyboard: potato is on the white shelf ***
[MainThread-17308-INFO] teach.inference.inference_runner: Elapsed time for episode replay: 7.0206011610571295
INFO:teach.inference.inference_runner:Elapsed time for episode replay: 7.0206011610571295
[MainThread-17308-ERROR] teach.inference.inference_runner: exception happened for instance=/media/peixin/DATA/Research/teach/subset/edh_instances/valid_seen/04b613ff7dfa1bea_8b8d.edh1.json, continue with the rest
Traceback (most recent call last):
  File "/media/peixin/DATA/Research/teach/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/media/peixin/DATA/Research/teach/src/teach/inference/inference_runner.py", line 219, in _run_edh_instance
    traj_steps_taken,
UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment
ERROR:teach.inference.inference_runner:exception happened for instance=/media/peixin/DATA/Research/teach/subset/edh_instances/valid_seen/04b613ff7dfa1bea_8b8d.edh1.json, continue with the rest
Traceback (most recent call last):
  File "/media/peixin/DATA/Research/teach/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/media/peixin/DATA/Research/teach/src/teach/inference/inference_runner.py", line 219, in _run_edh_instance
    traj_steps_taken,
UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment

@yingShen-ys
Author

yingShen-ys commented Feb 1, 2022

Hi @aishwaryap,

Here is the error.log. It is similar to what @PeixinC has pasted.

I think this error happens for every instance (I didn't run the entire inference) rather than just some of them.

@yingShen-ys
Author

@aishwaryap

Alright, I think I might have found the problem. model_started_success becomes None after this line.

I think the cause is that the function start_new_edh_instance() (https://github.com/alexa/teach/blob/main/src/teach/inference/et_model.py#L90) has no return statement, so the call evaluates to None.
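To illustrate the failure mode described above, here is a minimal sketch. It is not the actual inference_runner code: `SilentModel` and `run_instance` are hypothetical stand-ins, and the control flow is only an assumption about how a falsy start result can leave `traj_steps_taken` unassigned.

```python
class SilentModel:
    """Hypothetical stand-in for a model whose start method has no return statement."""

    def start_new_edh_instance(self, edh_instance, edh_history_images, instance_file):
        self.started = True  # does its setup, but implicitly returns None


def run_instance(model):
    """Simplified, hypothetical runner; not the real inference_runner logic."""
    model_started_success = False
    try:
        model_started_success = model.start_new_edh_instance({}, [], "dummy.json")
    except Exception:
        model_started_success = False

    if model_started_success:  # None is falsy, so this branch is skipped
        traj_steps_taken = 0

    return traj_steps_taken  # never assigned when the branch above was skipped


try:
    run_instance(SilentModel())
except UnboundLocalError as err:
    caught = type(err).__name__
    print(caught)  # prints "UnboundLocalError"
```

The point of the sketch is that no exception is raised inside the try block, so the only symptom is the later reference to a variable that was never bound.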

My workaround is to initialize model_started_success = True before the call instead of assigning the return value to it. That way model_started_success is False only when there is an exception.

model_started_success = True
try:
    model.start_new_edh_instance(edh_instance, edh_history_images, instance_file)
except Exception:
    model_started_success = False
    metrics["error"] = 1
    logger.error(f"Failed to start_new_edh_instance for {instance_id}", exc_info=True)

So far, I am able to run inference without errors after this modification.

Please let me know if I missed anything. Thanks!

@aishwaryap
Contributor

Hi @yingShen-ys

You did identify the problem correctly but I feel a preferred solution would be to correct ETModel.start_new_edh_instance to return True on successful initialization. I have pushed this change.
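As a rough sketch of that return-value contract (not the actual ETModel code; `SketchModel` and its body are hypothetical), a start method that signals success explicitly could look like this:

```python
class SketchModel:
    """Hypothetical model; only the return-value contract matters here."""

    def start_new_edh_instance(self, edh_instance, edh_history_images, instance_file):
        # Placeholder setup; a real model would load inputs and reset state here.
        self.history_length = len(edh_history_images)
        return True  # explicit success signal for the caller


started = SketchModel().start_new_edh_instance({}, ["frame0", "frame1"], "dummy.json")
assert started is True
```

With an explicit return value, a caller that stores the result in model_started_success sees True on success rather than None.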

Thanks a lot for catching this!

Aishwarya

@yingShen-ys
Author

Great, thank you.
