multi-value slot values #2

Open
rizar opened this issue Aug 8, 2022 · 3 comments

Comments


rizar commented Aug 8, 2022

Thanks for releasing code!

I'm trying to understand how this implementation handles the multi-value slot values introduced in MultiWOZ 2.1 (e.g. cheap|moderate). It appears that this implementation considers a slot to be correctly predicted if at least one of the gold slot values is predicted (see the code [here](https://github.com/Yushi-Hu/IC-DST/blob/main/evaluate_metrics.py#L58)). I did not see similar handling of multi-values in the code releases for prior work (e.g. https://github.com/jshin49/ds2, https://github.com/chiahsuan156/DST-as-Prompting, or https://github.com/facebookresearch/Zero-Shot-DST/tree/main/TransferQA). Can you please comment on that?

Owner

Yushi-Hu commented Aug 8, 2022

Hi Dzmitry,

Thanks for your interest in our work!
Yes, you are correct: in this implementation, a slot is considered correctly predicted if at least one of the gold slot values is predicted. I followed the evaluation pipeline from a popular prior work, TODBERT (the evaluation is implemented in its "evaluate" function). The prior work on MultiWOZ 2.4, ASSIST-DST, also follows this.

For MultiWOZ 2.1 and 2.2 this does not make much difference, because multi-value slots are not annotated well there and in most cases only one value is present. I think that's why most prior works just ignore this issue. For MultiWOZ 2.4 it makes a bigger difference, because the annotators found that many slots actually have multiple values. Currently, DST work assumes that each slot corresponds to only one value. I totally agree that we should carefully rethink this assumption.
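
To make the "at least one gold value" rule concrete, here is a minimal sketch. It is not the code from evaluate_metrics.py: the function name `slot_matches_any` is hypothetical, and it assumes gold alternatives are joined with "|" as in the cheap|moderate example above.

```python
def slot_matches_any(pred: str, gold: str, sep: str = "|") -> bool:
    """Lenient ("one-of") check for a single slot.

    Returns True if the predicted value equals any of the gold
    alternatives. Assumes (hypothetically) that alternatives are
    separated by `sep` in the gold annotation.
    """
    alternatives = {v.strip() for v in gold.split(sep)}
    return pred.strip() in alternatives


# A single-value gold label behaves like a plain equality check.
print(slot_matches_any("cheap", "cheap|moderate"))
print(slot_matches_any("expensive", "cheap|moderate"))
```

Under this rule, predicting only cheap for a gold label of cheap|moderate counts as correct, which is why the choice matters more when many slots carry several values.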

Author

rizar commented Aug 9, 2022

Thanks Yushi for your fast response!

Yes, indeed SimpleTOD evaluation code compares multi-values in the same way as yours. Do you think some other implementations (including ASSIST-DST and the links I posted above) are effectively more strict and require the entire multi-value literal to be predicted correctly?

Thanks for the explanation about 2.1 and 2.4, I will take a look at the exact percentage of multi-values in different MultiWOZ versions.
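
One way to measure that percentage is sketched below. The input format is an assumption: each dialogue state is taken to be a plain dict from slot name to value string, which is not necessarily how any MultiWOZ release stores its annotations, and `multi_value_fraction` is a hypothetical helper.

```python
def multi_value_fraction(states, sep="|"):
    """Fraction of slot values that contain the multi-value separator.

    `states` is an iterable of dicts mapping slot name -> value string
    (an assumed format, chosen for illustration only).
    """
    total = 0
    multi = 0
    for state in states:
        for value in state.values():
            total += 1
            if sep in value:
                multi += 1
    # Avoid dividing by zero when there are no slot values at all.
    return multi / total if total else 0.0


# Toy data, not taken from MultiWOZ.
toy_states = [
    {"hotel-pricerange": "cheap|moderate", "hotel-area": "north"},
    {"train-day": "monday", "train-departure": "cambridge"},
]
print(multi_value_fraction(toy_states))
```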

As for what the right evaluation approach should be, that depends on the exact semantics of the "|" operator. My understanding is that if "|" is logical OR, then all values should be predicted correctly. But if somewhere in the dataset it is used to indicate alternative spellings, then the "one-of" evaluation approach would be more appropriate. As usual it all boils down to there being a consistent and well-documented annotation approach, something that MultiWOZ still seems to be lacking.

Owner

Yushi-Hu commented Aug 9, 2022

Thanks rizar for your insights!

As for the first question, I checked some implementations, and in most cases they don't handle the multi-label scenario carefully. Some implementations just use the first possible value as the gold answer. I agree with the way ASSIST-DST handles the problem: it normalizes the labels by sorting the possible values, which effectively gives a stricter evaluation.
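
The two behaviours mentioned here can be sketched as follows. This is an illustration under assumptions, not code from ASSIST-DST or any other repository: the function names are hypothetical, and "|" is assumed to be the separator.

```python
def normalize_multi_value(value: str, sep: str = "|") -> str:
    """Sort the alternatives so that ordering differences do not matter."""
    parts = sorted(v.strip() for v in value.split(sep) if v.strip())
    return sep.join(parts)


def strict_match(pred: str, gold: str) -> bool:
    """Strict check: the whole normalized multi-value literal must match."""
    return normalize_multi_value(pred) == normalize_multi_value(gold)


def first_value_match(pred: str, gold: str, sep: str = "|") -> bool:
    """Check against only the first gold alternative, ignoring the rest."""
    return pred.strip() == gold.split(sep)[0].strip()


print(normalize_multi_value("moderate|cheap"))
print(strict_match("cheap", "cheap|moderate"))
print(first_value_match("moderate", "cheap|moderate"))
```

With sorting, moderate|cheap and cheap|moderate are treated as the same label, but a prediction of cheap alone no longer matches. The first-value variant accepts cheap and rejects moderate, so its result depends on annotation order.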

For your second comment, I totally agree! It boils down to the need for a well-documented annotation approach.
