Hi @aishwaryap,

I was reading about the recent code changes reported in #10, and unfortunately we get results that differ substantially from yours. I started dissecting the code to understand the reason for such discrepancies. From my understanding of the inference_runner.py script, you spawn several processes, each handling a portion of the tasks. However, I can see that the exception handling logic simply ignores an instance that raises an error: https://github.com/emma-simbot/teach/blob/speaker_tokens/src/teach/inference/inference_runner.py#L130

This is detrimental: if a dataset instance errors for whatever reason, its contribution to the overall metrics is silently dropped. The proper way to deal with this would be to skip that trajectory but still record the failure in the metrics. Ideally, such faulty trajectories would also be reported in the metrics file for future debugging, along the lines of the sketch below.
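A minimal sketch of what I mean; `run_instance`, `instances`, and the metrics layout are hypothetical placeholders, not the actual TEACh API:

```python
def run_instances(instances, run_instance):
    """Run every instance; count failures instead of dropping them."""
    metrics = {"success": {}, "failed": {}}
    for instance_id, instance in instances.items():
        try:
            metrics["success"][instance_id] = run_instance(instance)
        except Exception as exc:
            # The trajectory is skipped, but the failure still counts
            # toward the overall metrics and is kept for debugging.
            metrics["failed"][instance_id] = repr(exc)
    return metrics
```

This way the denominator of the success rate stays the full task set, and the metrics file lists exactly which trajectories errored.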
Am I missing something?
There is a separation here between possible errors from the model and errors from the rest of the inference code. Most of the error handling is inside InferenceRunner._run_edh_instance (here), where exceptions from invocations of the model should get logged in the metrics. Are you seeing specific errors at other places that are not getting caught?
That said, we should probably return the list of failed files from InferenceRunner._run() for debugging. I will work on this on Monday.
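Roughly like this (a sketch only; the argument names and surrounding class code are assumptions, not the current implementation):

```python
class InferenceRunner:
    # ... existing setup and _run_edh_instance elided ...

    def _run(self, instance_files):
        """Run all instances, returning the files that failed."""
        failed_files = []
        for instance_file in instance_files:
            try:
                self._run_edh_instance(instance_file)
            except Exception:
                # Collect errored instances instead of dropping them,
                # so they can be reported for debugging and re-runs.
                failed_files.append(instance_file)
        return failed_files
```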
For leaderboard evaluations, we are separately confirming that all inference files did get saved.