"Official" train/dev/test? #1
Hi @hbredin. Because Hitachi folks started (and others followed) using part1 for fine-tuning and reporting results on part2 with EEND models, we did the same in our latest end-to-end work and continue with this setup. This is mainly because the community seems to have adopted it and we wanted to be able to compare against existing results.
Thanks. That's very helpful. So all papers by Hitachi use part1 for fine-tuning and part2 for testing? What about updating the README with your answer? This would definitely help the community (in the same way …)
Yes, we used this setup.
Thanks for sharing. FYI, in our previous work we did 5-fold evaluation.
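For concreteness, the 5-fold evaluation mentioned above can be sketched as follows: the recordings are partitioned into five folds, each fold is scored by a model tuned on the remaining four, and the per-fold results are pooled. This is only an illustrative sketch of the general protocol, not the exact BUT recipe; the recording IDs are placeholders.

```python
def five_fold_splits(recordings, n_folds=5):
    """Yield (train, test) partitions of a recording list for k-fold evaluation."""
    # Deal recordings into n_folds round-robin folds.
    folds = [recordings[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        # Train/adaptation pool is everything outside fold k.
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        yield train, test

# Placeholder recording IDs; in practice these would be CALLHOME file IDs.
recordings = [f"rec{i:03d}" for i in range(10)]
for train, test in five_fold_splits(recordings):
    assert not set(train) & set(test)  # folds are disjoint
```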
Yes, we used the same setup recently (cc @popcornell), where part1 was used for adaptation.
Thanks everyone for the comments.
Thanks everyone for your feedback!
There's one more thing that needs to be checked before our results really are comparable: the reference labels. Would it be possible to share them here as well?
The ones I used are shared here: https://github.com/google/speaker-id/tree/master/publications/LstmDiarization/evaluation/NIST_SRE2000 Disk 8 is CALLHOME, and Disk 6 is Switchboard.
Thanks @wq2012. That is what I started using as well.
Hi Herve, CALLHOME is LDC proprietary data that can only be obtained after purchase, and we believe we might violate copyright if we publish the reference files from it. We will consult with LDC about whether we can share our RTTM files directly here; it would be good to have it all together in the repository, but we prefer to be on the safe side and get approval first.
Hmm, are you sure? Is that the same version as the LDC CALLHOME? IIRC we simply searched Google and downloaded them from other publicly available domains, and thought they had already been publicly circulated.
Totally makes sense. Thanks!
@wq2012, there are several CALLHOME LDC datasets. That is why CALLHOME can refer to so many different sets in publications. We are waiting for a response from LDC and will post an update after we hear from them.
Thanks! But I don't think the references are included in any of the LDC Catalogs. |
For future reference, the RTTMs are also here: http://www.openslr.org/resources/10/sre2000-key.tar.gz |
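The keys shared above are RTTM files. For readers unfamiliar with the format, a minimal parser for the SPEAKER lines might look like the sketch below (fields follow the NIST RTTM convention: type, file, channel, onset, duration, two `<NA>` placeholders, speaker name, confidence); this is a simplified reader, not a full RTTM implementation.

```python
def parse_rttm(lines):
    """Return {file_id: [(onset, offset, speaker), ...]} from RTTM SPEAKER lines."""
    segments = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip non-speaker rows and blank lines
        file_id = fields[1]
        onset, dur = float(fields[3]), float(fields[4])
        speaker = fields[7]
        segments.setdefault(file_id, []).append((onset, onset + dur, speaker))
    return segments

# Hypothetical example lines in RTTM format:
example = [
    "SPEAKER iaaa 1 0.00 2.50 <NA> <NA> A <NA>",
    "SPEAKER iaaa 1 2.50 1.75 <NA> <NA> B <NA>",
]
```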
Hi Herve, is it right?
I guess this is for the Hitachi people to answer. Here is what I do, on my side:
I don't think the actual split of part 1 (into train and dev) is really critical.
I guess a good scenario is what Hervé described, where Part 1 is split. However, it can have the issue that the same speaker appears in the 75% used for training AND in the 25% used for validation, and that can lead to over-optimistic results on the validation set. Still, it is certainly correct in that the test set (Part 2) is never used for developing the model.

If I can add, I am afraid that many people are making decisions on Part 2 (which is the test set), and that should not be the case. Very few works report results on Part 2 without fine-tuning, or comparisons on Part 1 (without fine-tuning). I would not mind …
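The speaker-overlap concern above can be avoided with a speaker-disjoint split: group recordings that share any speaker (connected components over shared speakers), then assign whole groups to train or dev so no speaker appears on both sides. A minimal sketch, assuming a hypothetical mapping from recording IDs to speaker labels:

```python
from collections import defaultdict

def speaker_disjoint_split(rec2spk, dev_fraction=0.25):
    """Split recordings into (train, dev) with no speaker shared across sides.

    rec2spk: dict mapping recording id -> set of speaker ids (hypothetical labels).
    """
    # Union-find over recordings; recordings sharing a speaker get one root.
    parent = {r: r for r in rec2spk}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    spk2recs = defaultdict(list)
    for rec, spks in rec2spk.items():
        for s in spks:
            spk2recs[s].append(rec)
    for recs in spk2recs.values():
        for other in recs[1:]:
            parent[find(other)] = find(recs[0])
    groups = defaultdict(list)
    for rec in rec2spk:
        groups[find(rec)].append(rec)
    # Greedily fill the dev set with whole groups up to the target size.
    target = dev_fraction * len(rec2spk)
    train, dev = [], []
    for grp in sorted(groups.values(), key=len):
        (dev if len(dev) < target else train).extend(grp)
    return train, dev
```

Note that because CALLHOME conversations each involve several speakers, a purely random recording-level split cannot guarantee disjoint speakers; assigning whole connected components is what makes the guarantee hold, at the cost of a dev set whose size only approximates the requested fraction.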
@hbredin @fnlandini Thank you for your quick and thoughtful responses!
I have never reported results on CALLHOME because of the (apparent) lack of an official train/validation/test split (or at least validation/test split).
What experimental protocol does BUT use for reporting results?
Validation on part1, test on part2?
Validation on part2, test on part1?
Both?
cc @fnlandini