cut_id does not match original utt_id #620

Closed
wgb14 opened this issue Oct 14, 2022 · 2 comments

Comments

@wgb14
Contributor

wgb14 commented Oct 14, 2022

In #522 (comment), we tried to add the utt_id to the recognition results in the decoding script so that they are easier to compare:

cut_ids = [cut.id for cut in batch["supervisions"]["cut"]]

However, cut_id is not the original utt_id in the dataset. In particular, after cut_set.trim_to_supervisions(), cut_id becomes a random value. In my experiments, it should instead be:

utt_ids = [cut.supervisions[0].id for cut in batch["supervisions"]["cut"]]
@pzelasko
Collaborator

Good point. Maybe we should change the behavior of trim_to_supervisions in Lhotse so that it adopts the supervision ID instead. Could you make a PR?

@wgb14
Contributor Author

wgb14 commented Oct 15, 2022

> Good point, maybe we should change trim to supervisions behavior in Lhotse to adopt the supervision ID instead. Could you make a PR?

Agreed. I'll open a PR for this.

However, this doesn't work for recipes like librispeech, since they do not call cut_set.trim_to_supervisions().
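
One possible way to cover both kinds of recipe, as a minimal sketch rather than what the decoding scripts actually do: prefer the supervision ID when a cut carries exactly one supervision, and fall back to the cut ID otherwise. `Supervision`, `Cut`, and `utt_ids_from_cuts` are hypothetical stand-ins so the snippet runs without Lhotse; real cuts would only need to expose `.id` and `.supervisions` in the same way. Whether the cut ID equals the original utt_id in untrimmed recipes is an assumption that should be checked per dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Supervision:
    # Hypothetical stand-in for a Lhotse supervision; only the ID is modeled.
    id: str


@dataclass
class Cut:
    # Hypothetical stand-in for a Lhotse cut; only `id` and `supervisions` are modeled.
    id: str
    supervisions: List[Supervision] = field(default_factory=list)


def utt_ids_from_cuts(cuts):
    """Return one ID per cut, preferring the supervision ID when it is unambiguous."""
    ids = []
    for cut in cuts:
        if len(cut.supervisions) == 1:
            # Typical after trimming to supervisions: one supervision per cut.
            ids.append(cut.supervisions[0].id)
        else:
            # Zero or several supervisions: fall back to the cut ID
            # (assumed, not verified, to be meaningful in untrimmed recipes).
            ids.append(cut.id)
    return ids


# Made-up IDs for illustration only.
trimmed = Cut(id="a1b2c3-random", supervisions=[Supervision("utt-0001")])
untrimmed = Cut(
    id="rec-0002",
    supervisions=[Supervision("utt-0002a"), Supervision("utt-0002b")],
)
print(utt_ids_from_cuts([trimmed, untrimmed]))
```

In a decoding script, the same helper could be applied to batch["supervisions"]["cut"] in place of the cut.id list comprehension.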
