Much higher scores when evaluating Episodic Transformer baselines for EDH instances #10
Hi @yingShen-ys, that sounds like a reasonable result. I will leave the issue open, however, so that we can see whether others are able to reproduce it. Best,
Got it. Thank you for the clarification.
We got a similar result; in fact, the results can differ significantly when training on different machines.
I believe the differences are less likely due to the machine used for training and more likely an effect of the random seed. We have also seen this behavior when training the ET model on ALFRED.
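To check whether run-to-run variance comes from seeding rather than hardware, one can pin every RNG before training. This is a minimal sketch, not code from the ET repository; the `seed_everything` helper name and the seed value are hypothetical, and the commented-out `torch` calls are the usual additions for a PyTorch model like ET:

```python
import random
import numpy as np

def seed_everything(seed: int) -> None:
    """Pin all relevant RNGs so repeated runs draw the same random numbers."""
    random.seed(seed)
    np.random.seed(seed)
    # For a PyTorch-based model such as ET, also pin the torch RNGs:
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)

# Two runs with the same seed produce identical draws ...
seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]

# ... while a different seed diverges immediately.
seed_everything(7)
c = [random.random() for _ in range(3)]

print(a == b, a == c)
```

If results still differ across machines with seeds pinned, the remaining variance usually comes from nondeterministic GPU kernels or library-version differences rather than the seed itself.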
Hello,
I have finished evaluating the Episodic Transformer baselines for the TEACh Benchmark Challenge on the valid_seen split.
However, one odd thing I found is that our reproduced result is much higher than the one reported in the paper. The results are shown below (all values are percentages). There are a total of 608 EDH instances (valid_seen) in the metrics file, which matches the number in the paper.
I believe I am using the correct checkpoints, and the only change I made to the code is the one mentioned in #9.
I am running on an AWS instance. I started the X-server and installed all requirements and prerequisites without issues, and the inference process ran without errors.
Here is the script I used for evaluation.
I wonder whether the data split provided in the dataset is the same as the one used in the paper, and if so, what could explain the discrepancy.
Please let me know if someone else is getting similar results. Thank you!
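As a sanity check on numbers like these, the percentages can be recomputed directly from the per-instance metrics file. The sketch below uses a hypothetical structure (a mapping from EDH instance IDs to per-instance metrics) and hypothetical field names (`success`, `goal_condition_success`); the real TEACh metrics file may differ:

```python
# Hypothetical metrics structure: instance ID -> per-instance metrics.
# Field names are assumptions, not confirmed from the TEACh code.
metrics = {
    "edh_instance_0": {"success": 1, "goal_condition_success": 0.50},
    "edh_instance_1": {"success": 0, "goal_condition_success": 0.25},
}

n = len(metrics)  # should match the expected instance count (608 for valid_seen)
success_rate = 100.0 * sum(m["success"] for m in metrics.values()) / n
gc_rate = 100.0 * sum(m["goal_condition_success"] for m in metrics.values()) / n

print(f"{n} instances, SR={success_rate:.1f}%, GC={gc_rate:.1f}%")
```

Checking that `n` equals the expected instance count also rules out the possibility that the higher numbers come from evaluating a subset of the split.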