Add jobs to study alignments #558

Icemole · 2024-11-19T11:39:39Z

PlotViterbiAlignmentJob: job to plot the Viterbi alignment. The point of this job is to check for a specific issue in the final alignments.

DumpSegmentTextAlignmentJob: job to dump text/alignment pairs for a specific corpus and set of alignment files. The point of this job is to quickly review the text against the corresponding alignments.

The output is a compressed text file of the form:

<segment-id>
<segment-text>
<id-0> <timestamp-start-0> <timestamp-end-0> <alignment-0> <weight-0>
...

Feel free to propose any other format in the comments.
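For anyone who wants to post-process the dump, here is a minimal sketch of a reader for the layout described above. It is not part of this PR. It assumes that segments are separated by a blank line, that the segment text is non-empty, that the alignment label contains no whitespace, that timestamps and weights parse as floats, and that the file is gzip-compressed. The names `AlignmentItem`, `SegmentDump`, `parse_dump` and `read_dump` are hypothetical.

```python
import gzip
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class AlignmentItem:
    item_id: str
    start: float
    end: float
    label: str
    weight: float


@dataclass
class SegmentDump:
    segment_id: str
    text: str
    items: List[AlignmentItem] = field(default_factory=list)


def _parse_block(block: List[str]) -> SegmentDump:
    # First line: segment id, second line: segment text, rest: alignment rows.
    segment = SegmentDump(segment_id=block[0], text=block[1] if len(block) > 1 else "")
    for line in block[2:]:
        # Assumption: exactly five whitespace-separated fields per row.
        item_id, start, end, label, weight = line.split()
        segment.items.append(
            AlignmentItem(item_id, float(start), float(end), label, float(weight))
        )
    return segment


def parse_dump(lines: Iterable[str]) -> Iterator[SegmentDump]:
    # Assumption: a blank line separates consecutive segments.
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            yield _parse_block(block)
            block = []
    if block:
        yield _parse_block(block)


def read_dump(path: str) -> List[SegmentDump]:
    # Assumption: the output file is gzip-compressed UTF-8 text.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return list(parse_dump(f))
```

With the segments parsed this way, the duration of an aligned item is simply `item.end - item.start`.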


DanEnergetics · 2024-11-19T16:48:03Z

I prefer the second to last or last dumping format that you suggested. They would make it easiest to get a quick grasp on the duration of certain phonemes for me.


Icemole · 2024-11-20T09:34:41Z

> I prefer the second to last or last dumping format that you suggested. They would make it easiest to get a quick grasp on the duration of certain phonemes for me.

Maybe we can specify a default mapping (csv file?) and an optional custom mapping to be provided by the user.

Edit/update: I tried to do this but ran into issues with detailed print schemas. If we want to print timestamps and so on, the alignments have to be provided as a tuple (timestamp, allophone_id, hmm_state, weight); a string containing only the triphone + HMM state isn't enough. Of course, this could easily be solved by adding the specific triphone to the tuple, but that tuple format wouldn't be consistent with the rest, and I'd like the user to know what they're doing.

Let me know if you still want this custom printing approach.
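To make the trade-off concrete, here is a rough sketch of one way a custom print schema could look while keeping the tuple format unchanged. It is only an illustration, not what the job currently does. The names `AlignmentEntry`, `default_format` and `dump_entries` are hypothetical, and the timestamp is assumed to be a frame index. The formatter receives the unchanged tuple plus the already-resolved allophone string as a second argument.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence


class AlignmentEntry(NamedTuple):
    timestamp: int  # assumed to be a frame index
    allophone_id: int
    hmm_state: int
    weight: float


# A formatter gets the raw tuple and the allophone string looked up from its id.
Formatter = Callable[[AlignmentEntry, str], str]


def default_format(entry: AlignmentEntry, allophone: str) -> str:
    return f"{entry.timestamp} {allophone}.{entry.hmm_state} {entry.weight:.2f}"


def dump_entries(
    entries: Sequence[AlignmentEntry],
    allophones: Sequence[str],
    formatter: Optional[Formatter] = None,
) -> List[str]:
    # Fall back to the default schema when the user provides no formatter.
    fmt = formatter or default_format
    return [fmt(e, allophones[e.allophone_id]) for e in entries]
```

A user who only cares about the triphone could then pass `formatter=lambda entry, allophone: allophone`, while the default keeps timestamps and weights.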


Icemole added 2 commits November 19, 2024 09:33

Fix DumpAlignmentJob flow

9a7506c

The flow needed the parameters 'id' and 'TASK'.

Add job to dump text/alignment pairs for all segments

e07caff

Icemole requested review from albertz, curufinwe, christophmluscher, sarahberanek, JackTemaki, moothiringote, michelwi, hannah220, Atticus1806 and kuacakuaca November 19, 2024 11:39

Icemole added 7 commits November 19, 2024 11:40

Revert changes in branch to main's

57db395

Black

258b31d

Remove job from wrong location

528bb29

It has nothing to do with returnn-hdf

Add job to correct location, add PlotViterbiAlignmentJob

dd5bec0

Fixes

8217e5d

More fixes

597dfa4

Fix when alignment is empty

464ebc8

Icemole changed the title ~~Job to dump text/alignment pairs~~ Add jobs to view alignments Nov 19, 2024

Icemole requested a review from SimBe195 November 19, 2024 15:51

Icemole changed the title ~~Add jobs to view alignments~~ Add jobs to study alignments Nov 19, 2024

DanEnergetics self-requested a review November 19, 2024 15:52

Icemole added 2 commits November 19, 2024 15:56

Black

cbdab57

Add file for faulty/empty alignment seqtags

b61e063

DanEnergetics reviewed Nov 19, 2024


NeoLegends requested a review from Marvin84 November 19, 2024 17:23

Icemole added 2 commits November 20, 2024 09:13

Work

b26a326

General: abstracted function to get alignment data to a public function, reformatted code, added docstrings. PlotViterbiAlignmentJob: added title in plot with segment text.

Remove original author from docstring

15df7fe

Icemole added 3 commits November 20, 2024 09:21

PlotViterbiAlignmentJob: add functionality to plot subset of seq tags

4a703f7

DumpSegmentTextAlignmentJob: always compress output csv

cc07c0c

DumpSegmentTextAlignmentJob: add functionality to plot subset of seq tags

dbc4b09

Icemole requested a review from DanEnergetics November 20, 2024 09:33

michelwi reviewed Nov 20, 2024


Icemole and others added 3 commits November 20, 2024 10:54

More work

3c85bdd

General: job fixes. DumpText...: updated output format (now more verbose).

Fix uopen call

6ed4814

Don't interpolate plot

d155c35

Co-authored-by: michelwi <[email protected]>

Icemole requested a review from michelwi November 20, 2024 11:33

alignment_files -> alignment_caches

7bb0009
