`cut_id` does not match original `utt_id` #620

wgb14 · 2022-10-14T22:34:49Z

Following up on #522 (comment): in the decoding script, we tried to add utt_id to the recognition results so they are easier to compare:

icefall/egs/librispeech/ASR/pruned_transducer_stateless5/decode.py

Line 567 in a66e74b

cut_ids = [cut.id for cut in batch["supervisions"]["cut"]]

But cut_id is not actually the original utt_id in the dataset. In particular, after cut_set.trim_to_supervisions(), cut_id becomes a random value. In my experiments, the line should instead be:

utt_ids = [cut.supervisions[0].id for cut in batch["supervisions"]["cut"]]
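To make the difference concrete, here is a minimal sketch of the lookup with a fallback for cuts that do not have exactly one supervision. `Supervision`, `Cut`, and `utt_ids_from_cuts` are hypothetical stand-ins written for this illustration (they are not Lhotse classes and model only the `id` and `supervisions` attributes used above), and the fallback to `cut.id` is an assumption, not something the decoding script does.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Supervision:
    # Hypothetical stand-in for a supervision segment; only `id` is modeled.
    id: str


@dataclass
class Cut:
    # Hypothetical stand-in for a cut; only `id` and `supervisions` are modeled.
    id: str
    supervisions: List[Supervision] = field(default_factory=list)


def utt_ids_from_cuts(cuts: List[Cut]) -> List[str]:
    """Return one ID per cut, preferring the supervision ID when it is unambiguous."""
    ids = []
    for cut in cuts:
        if len(cut.supervisions) == 1:
            # Exactly one supervision: its ID is the original utterance ID.
            ids.append(cut.supervisions[0].id)
        else:
            # Zero or several supervisions: no single utt_id exists (assumed fallback).
            ids.append(cut.id)
    return ids


cuts = [
    Cut(id="random-cut-id", supervisions=[Supervision(id="utt-0001")]),
    Cut(
        id="multi-cut",
        supervisions=[Supervision(id="utt-0002"), Supervision(id="utt-0003")],
    ),
]
print(utt_ids_from_cuts(cuts))
```

In the decoding script, the same check would be applied to the cuts in batch["supervisions"]["cut"]; whether a fallback is needed depends on how the recipe prepares its cuts.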


pzelasko · 2022-10-14T22:46:07Z

Good point, maybe we should change trim to supervisions behavior in Lhotse to adopt the supervision ID instead. Could you make a PR?

wgb14 · 2022-10-15T00:14:08Z

> Good point, maybe we should change trim to supervisions behavior in Lhotse to adopt the supervision ID instead. Could you make a PR?

Agreed. I'll open a PR for this.

But this doesn't help recipes like librispeech, since they do not call cut_set.trim_to_supervisions().

wgb14 mentioned this issue Oct 15, 2022

Match cut_id to utt_id if there is exactly one supervision per cut lhotse-speech/lhotse#853

Merged

JinZr closed this as completed Dec 6, 2023
