Evaluating Episodic Transformer baselines for EDH instances gives zero successes #8
Comments
Hi, While we do get some variance in our performance, I don't think I have ever gotten a success rate of 0. Could you upload the saved metrics file somewhere and share the URL? Thanks!
Hi @aishwaryap, The metrics file can be found here. Please let me know if you need any other information from my side! Thanks,
Hi @dv-fenix, I've noticed a few things from your command and metrics file.

First of all, I have a feeling that 50 processes is too many for most machines to handle. In our testing, we have always set the number of processes equal to the number of GPUs.

I also noticed by checking your metrics file that only 64 EDH instances have been evaluated (the metrics file is a JSON dictionary with one entry per evaluated instance). Since we have far more EDH instances than that in each validation split, it looks like your run stopped well before covering the full split.

Hope this helps,

P.S.: I am on leave for most of this week, so responses to issues may be slower than usual.
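A quick way to run the same check, assuming the JSON-dictionary layout described above (the file name is a placeholder):

```bash
# Count evaluated EDH instances, assuming the metrics file is a JSON dict
# with one entry per instance (the file name here is a placeholder).
python -c "import json; print(len(json.load(open('metrics_valid_seen.json'))))"
```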
Cool! I'll try it out and let you know how it goes. |
Hi @dv-fenix, In addition to using fewer processes, I recommend pulling the latest version of the code. There was a small bug that would cause non-Docker inference to error out; it has been fixed now. Best,
Hi @aishwaryap, I ran inference on a pretrained ET baseline model using …
There were no errors during execution. You can find the metrics file and the terminal output file in this folder. Your initial thought about there being too many processes may be correct. I am thinking about rerunning inference on the entire dataset with … Thanks,
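(A convenience note for anyone following the processes-equal-to-GPUs advice above: the GPU count can be queried directly.)

```bash
# Number of visible GPUs; per the advice above, use this as the number of
# inference processes.
nvidia-smi --list-gpus | wc -l
```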
Hi @aishwaryap, I tried running the inference on 4 GPUs using …
I am using the updated version of the repository. This error is only thrown when … Thanks,
@dv-fenix How many EDH instances were successfully processed in this run? And did any of the GPUs run out of memory during inference?
Hi @hangjieshi, A total of 68 instances were successfully processed. I checked the GPU memory usage multiple times during the run; none of the GPUs ran out of memory at any stage. Apart from the manual checks, I also analysed the terminal output file. Had the GPUs run out of memory, an error to that effect would have appeared there, and I found none.
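Both kinds of checks described here can be scripted with standard tools; the log file name below is a placeholder:

```bash
# Refresh GPU memory usage every few seconds while inference runs.
watch -n 5 nvidia-smi

# Afterwards, scan the captured terminal output for out-of-memory messages.
grep -i "out of memory" terminal_output.log
```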
@dv-fenix …
Hi @dv-fenix, Just wanted to elaborate on the above response. It's hard to be entirely sure, but I think you have run into a thorny issue with AI2-THOR that we have struggled with throughout this project. The problem is related to issues 903, 745, and 711 on AI2-THOR (there are probably more), but the behavior we see is not exactly the same as what is listed in those issues. Essentially, during data collection, episode replay, or inference (anything that requires interaction with the AI2-THOR simulator), the process sometimes just hangs. Specifically, we can trace it to the call to …

One step towards verifying that the issue is indeed from AI2-THOR is to increase …

In either case, it is not strictly necessary to finish evaluating all EDH instances in a single run. If you rerun the inference command keeping the same metrics file, instances that already have results are skipped, so repeated runs will gradually cover the full split.
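A minimal sketch of that resume workflow, assuming the skip-completed behavior described above; $INFERENCE_CMD and the metrics file name are placeholders:

```bash
# Rerun the same inference command, keeping the same metrics file, so that
# already-evaluated EDH instances are skipped (resume behavior assumed from
# the comment above). $INFERENCE_CMD stands for the original command.
for attempt in 1 2 3; do
    $INFERENCE_CMD
    # Report progress: the metrics file is a JSON dict with one entry
    # per evaluated instance.
    python -c "import json; print(len(json.load(open('metrics_valid_seen.json'))), 'instances evaluated')"
done
```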
Hi @aishwaryap, Thank you for your insights. I tried using …
Hi!
I have been trying to replicate the results of the Episodic Transformer (ET) baselines for the EDH benchmark. The inference script runs without any errors, but the ET baselines provided with this repository give zero successes on all the validation EDH splits (both ['valid-seen', 'valid-unseen']).
This behavior can be replicated using the instructions in the ET root directory (found here), specifically the following script: …
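As a rough, unverified sketch, such an invocation typically has this shape (the entry point, flags, and ET module path are assumptions based on this thread, not confirmed from the repository):

```bash
# Hypothetical ET inference invocation; entry point, flags, and module path
# are assumptions, not verified against the repository.
teach_inference \
    --data_dir /path/to/teach_data \
    --output_dir /path/to/output \
    --split valid_seen \
    --metrics_file metrics_valid_seen.json \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --num_processes 4
```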
I also tried training the basic ET baseline from scratch. Running the evaluation script on this model also leads to zero successes.