multi-value slot values #2
Hi Dzmitry, thanks for your interest in our work! For MultiWOZ 2.1 and 2.2 this makes little difference: multi-value slots are not annotated well there, and in most cases only one value is present. I think that's why most prior work simply ignores the issue. For MultiWOZ 2.4 it matters more, because the annotators found that many slots actually have multiple values. Currently, DST work assumes that each slot corresponds to exactly one value. I completely agree that we should carefully rethink this assumption.
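To get a rough sense of how often multi-value slots occur in a given MultiWOZ version, one can count the filled slots whose value contains "|". This is a minimal sketch that assumes belief states have already been extracted into plain Python dicts mapping a slot name to a value string (a hypothetical layout, not the raw MultiWOZ JSON) and that only filled slots are present. The toy states are illustrative and not taken from any MultiWOZ release.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical layout: slot name -> value string, multiple values joined by "|".
BeliefState = Dict[str, str]


def multi_value_stats(states: Iterable[BeliefState]) -> Tuple[int, int]:
    """Return (multi_value_count, total_filled_slots) across all belief states."""
    multi, total = 0, 0
    for state in states:
        for value in state.values():
            total += 1
            if "|" in value:  # treat any "|" as a multi-value annotation
                multi += 1
    return multi, total


if __name__ == "__main__":
    # Toy data for illustration only.
    states = [
        {"restaurant-pricerange": "cheap|moderate", "restaurant-area": "centre"},
        {"hotel-stars": "4"},
    ]
    multi, total = multi_value_stats(states)
    print(f"{multi}/{total} filled slots carry multiple values")
```

Dividing the two counts gives the percentage of multi-value slots; running the same function over each version's extracted states would allow the comparison between 2.1, 2.2, and 2.4 discussed in this thread.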
Thanks, Yushi, for the fast response! Yes, the SimpleTOD evaluation code indeed compares multi-values the same way yours does. Do you think some other implementations (including ASSIST-DST and the repositories linked in my original question) are effectively stricter, requiring the entire multi-value literal to be predicted correctly? Thanks for the explanation about 2.1 and 2.4; I will check the exact percentage of multi-values in the different MultiWOZ versions. As for the right evaluation approach, that depends on the exact semantics of the "|" operator. My understanding is that if "|" is a logical OR, then all values should be predicted correctly. But if the dataset sometimes uses it to indicate alternative spellings, then the "one-of" evaluation approach would be more appropriate. As usual, it all boils down to having a consistent, well-documented annotation approach, something that MultiWOZ still seems to lack.
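The two readings of "|" lead to two different per-slot comparison rules. The sketch below is illustrative only: the function names are hypothetical, and it assumes gold values are "|"-joined strings and that, under the "alternative spellings" reading, the model predicts a single value.

```python
def split_values(value: str) -> frozenset:
    """Split a '|'-joined slot value into a set of stripped values."""
    return frozenset(v.strip() for v in value.split("|"))


def match_all_of(pred: str, gold: str) -> bool:
    """'|' as logical OR over accepted values: every gold value must be predicted.

    Compares value sets, so the order of the listed values does not matter.
    """
    return split_values(pred) == split_values(gold)


def match_one_of(pred: str, gold: str) -> bool:
    """'|' as alternative spellings: any single gold alternative is enough.

    Assumes the prediction is a single value (no '|').
    """
    return pred.strip() in split_values(gold)


if __name__ == "__main__":
    print(match_all_of("cheap", "cheap|moderate"))           # False
    print(match_one_of("cheap", "cheap|moderate"))           # True
    print(match_all_of("moderate|cheap", "cheap|moderate"))  # True
```

The only difference between the two rules is whether a partial prediction such as "cheap" for the gold "cheap|moderate" counts as correct, which is exactly the ambiguity the annotation guidelines would need to resolve.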
Thanks, rizar, for your insights! As for the first question, I checked some implementations, and in most cases they don't handle the multi-label scenario carefully. Some just use the first listed value as the gold answer. I agree with the way ASSIST-DST handles the problem: it normalizes the labels by sorting the possible values, which effectively gives a stricter evaluation. As for your second comment, I completely agree! It boils down to the need for a well-documented annotation approach.
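The two label-handling strategies mentioned here can be contrasted in a few lines. This is a minimal sketch under the assumption that labels are "|"-joined strings; it is not ASSIST-DST's actual code, and `normalize_sorted`, `first_value`, and `exact_match` are hypothetical names chosen for illustration.

```python
from typing import Callable


def normalize_sorted(value: str) -> str:
    """Canonicalize a multi-value label by sorting its '|'-separated parts."""
    return "|".join(sorted(v.strip() for v in value.split("|")))


def first_value(value: str) -> str:
    """The shortcut some implementations take: keep only the first listed value."""
    return value.split("|")[0].strip()


def exact_match(pred: str, gold: str, normalize: Callable[[str], str]) -> bool:
    """Exact string match after applying the same normalization to both sides."""
    return normalize(pred) == normalize(gold)


if __name__ == "__main__":
    gold = "cheap|moderate"
    # Sorting makes the comparison order-insensitive but still requires all values.
    print(exact_match("moderate|cheap", gold, normalize_sorted))  # True
    print(exact_match("cheap", gold, normalize_sorted))           # False
    # The first-value shortcut depends on the order the annotator listed values in.
    print(exact_match("cheap", gold, first_value))                # True
    print(exact_match("moderate", gold, first_value))             # False
```

With sorting, a prediction is correct only if it contains every gold value, which is why that normalization yields the stricter evaluation. The first-value shortcut, by contrast, silently rewards or penalizes a prediction depending on annotation order.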
Thanks for releasing the code!
I'm trying to understand how this implementation handles the multi-value slot values introduced in MultiWOZ 2.1 (e.g., cheap|moderate). It appears that this implementation considers a slot correctly predicted if at least one of the gold slot values is predicted (see the code [here](https://github.com/Yushi-Hu/IC-DST/blob/main/evaluate_metrics.py#L58)). I did not see similar handling of multi-values in the code releases of prior work (e.g., https://github.com/jshin49/ds2, https://github.com/chiahsuan156/DST-as-Prompting, or https://github.com/facebookresearch/Zero-Shot-DST/tree/main/TransferQA). Can you please comment on that?
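To see how much the slot-level rule described in the question can move joint goal accuracy (JGA), the following sketch computes JGA once with an "at least one gold value" rule and once with plain exact string matching, i.e. no multi-value handling. It only mirrors the rule as described in the question and is not copied from `evaluate_metrics.py`; the function names and the dict-based belief-state layout are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical layout: slot name -> value string, multiple values joined by "|".
BeliefState = Dict[str, str]


def lenient_slot_match(pred: str, gold: str) -> bool:
    """Correct if the prediction equals at least one of the gold values."""
    return pred.strip() in {v.strip() for v in gold.split("|")}


def exact_slot_match(pred: str, gold: str) -> bool:
    """Correct only if the prediction equals the full gold literal."""
    return pred == gold


def joint_goal_accuracy(preds: List[BeliefState], golds: List[BeliefState],
                        slot_match: Callable[[str, str], bool]) -> float:
    """Fraction of turns whose predicted state matches the gold state on every slot."""
    correct = 0
    for pred, gold in zip(preds, golds):
        # The turn fails if any slot is missing or extra, or any value mismatches.
        if pred.keys() == gold.keys() and all(slot_match(pred[s], gold[s]) for s in gold):
            correct += 1
    return correct / len(golds) if golds else 0.0


if __name__ == "__main__":
    preds = [{"restaurant-pricerange": "cheap"}, {"hotel-stars": "3"}]
    golds = [{"restaurant-pricerange": "cheap|moderate"}, {"hotel-stars": "4"}]
    print(joint_goal_accuracy(preds, golds, lenient_slot_match))  # 0.5
    print(joint_goal_accuracy(preds, golds, exact_slot_match))    # 0.0
```

Scores produced under different slot-level rules are therefore not directly comparable, especially on versions where multi-value labels are frequent.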