Draft of RNN-T decoding method #905

danpovey · 2022-01-22T06:01:43Z

Guys (especially @pkufool),

This is a draft of the core parts of the RNN-T decoding method. It supports streams having different
graphs, and aggregation and disaggregation of streams (to cope with asynchronous input).
As we discussed, I am limiting it to max_sym_per_frame=1, which substantially simplifies the
decoder.
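To show why max_sym_per_frame=1 simplifies things, and what aggregating and disaggregating streams means in practice, here is a minimal pure-Python sketch. It is not the k2 API. `StreamState`, `decode_one_frame` and the `joiner` callable are hypothetical names. The per-stream decoder context is reduced to the last emitted symbol, and graph constraints are ignored. Each call advances every participating stream by exactly one frame.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical joiner signature: (per-stream frames, per-stream contexts) -> per-stream logits.
Joiner = Callable[[List[Sequence[float]], List[int]], List[Sequence[float]]]


@dataclass
class StreamState:
    """Hypothetical per-stream decoding state."""
    context: int = 0  # last emitted symbol; starts at blank (assumed to be id 0)
    hyp: List[int] = field(default_factory=list)


def decode_one_frame(
    streams: List[StreamState],
    frames: Sequence[Sequence[float]],
    joiner: Joiner,
    blank_id: int = 0,
) -> None:
    """Advance each stream in `streams` by exactly one frame (max_sym_per_frame=1)."""
    if len(frames) != len(streams):
        raise ValueError("need exactly one frame per stream")
    # Aggregate: gather the state of whichever streams have a frame ready.
    contexts = [s.context for s in streams]
    logits = joiner(list(frames), contexts)
    # Disaggregate: scatter results back. At most one non-blank symbol is emitted
    # per frame, after which the stream simply moves on to its next frame.
    for stream, row in zip(streams, logits):
        best = max(range(len(row)), key=lambda k: row[k])
        if best != blank_id:
            stream.hyp.append(best)
            stream.context = best
```

Asynchronous input fits this shape naturally. A stream whose next frame has not arrived yet is left out of the `streams` list for that call and rejoins the batch later.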

This code is far from being able to compile or run, but all the nontrivial parts are drafted, so I am reasonably confident that nothing major is missing. It will need the Unstack() function.
Please note that, versus #900, I have slightly changed (simplified) the extended interface of SubsampleRaggedShape() so that it optionally outputs a new2old array rather than a Renumbering object.
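For readers unfamiliar with the term: a new2old array maps each index in the subsampled output back to its index in the input. The pure-Python sketch below illustrates this for the last axis of a two-axis ragged shape stored as row_splits. `subsample_last_axis` is a hypothetical helper and does not reflect the actual signature of SubsampleRaggedShape().

```python
from typing import List, Sequence, Tuple


def subsample_last_axis(
    row_splits: Sequence[int], keep: Sequence[bool]
) -> Tuple[List[int], List[int]]:
    """Drop elements where keep is False; return (new_row_splits, new2old)."""
    if len(keep) != row_splits[-1]:
        raise ValueError("keep must have one entry per element")
    # new2old[i] is the original index of the i-th kept element.
    new2old = [i for i, k in enumerate(keep) if k]
    # Recount how many elements survive in each row to rebuild row_splits.
    new_row_splits = [0]
    for r in range(len(row_splits) - 1):
        kept_in_row = sum(1 for i in range(row_splits[r], row_splits[r + 1]) if keep[i])
        new_row_splits.append(new_row_splits[-1] + kept_in_row)
    return new_row_splits, new2old
```

For example, `subsample_last_axis([0, 2, 5], [True, False, True, True, False])` returns `([0, 1, 3], [0, 2, 3])`. A caller can then gather per-element data with `[values[i] for i in new2old]`.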

The code in array_of_ragged.h (only the interface is drafted) is general-purpose utility code that can be used in many low-level places. It substantially simplifies the interfaces of the decoding code drafted here, so I thought it was worth adding. Much of its functionality is not actually needed for this PR; it would be OK to write just the needed parts and leave the rest as TODOs.

Some thought will also be needed to decide how to design the Python interfaces. I hope
that @csukuangfj might be able to contribute here.

The overall vision is to be able to create RNN-T acoustic models that can be decoded in real time with very
high concurrency (perhaps hundreds of streams). This would probably require a model topology that
is memory-efficient for decoding, e.g. replacing the transformer encoder with an LSTM encoder. (I hope that
some of the work we are separately doing on teacher-student ideas might make it possible to
train the LSTM as a student, with a better-generalizing transformer as the teacher.)

To decode without a graph, I propose simply creating a "trivial" graph with one state that has a self-loop for
each symbol. I don't think this will cause a substantial slow-down, because the work it adds is tiny compared
with the model's forward().
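Below is a minimal sketch of such a trivial graph. It is written as text in the arc-per-line acceptor format that k2 uses (`src dest label score`, with arcs into the final state labelled -1 and the final state listed on its own line). The helper name is hypothetical. It also assumes, for illustration only and not as a detail of this PR, that blank has id 0 and is left out of the graph because blank is assumed not to advance the graph state.

```python
def trivial_graph_str(max_token_id: int) -> str:
    """Hypothetical helper: state 0 has a self-loop for every non-blank token
    1..max_token_id, plus the -1-labelled arc into final state 1 that k2 expects."""
    if max_token_id < 1:
        raise ValueError("need at least one non-blank token")
    lines = [f"0 0 {token} 0.0" for token in range(1, max_token_id + 1)]
    lines.append("0 1 -1 0.0")  # arc into the final state
    lines.append("1")  # final state
    return "\n".join(lines)
```

With k2 installed, a string like this could be loaded with `k2.Fsa.from_str`; that call is not exercised in the sketch.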

I am hoping that you guys will be able to do most of the work from this point.

pkufool · 2022-01-22T13:56:38Z

@danpovey Did you miss some commits? I don't see any difference from #900.

danpovey · 2022-01-22T14:26:25Z

Fixed.

pkufool · 2022-03-16T02:19:28Z

closed via #926

danpovey added 2 commits January 12, 2022 17:21

Extend interface of SubsampleRagged. (6afb7e7)

Add interface for pruning ragged tensor. (7341450)

Draft of new RNN-T decoding method (280e2c2)

danpovey added 3 commits February 22, 2022 17:28

Add previously missing files (31cdd73)

Some cleanup (a5b974b)

Slight cleanup (87b3466)

pkufool mentioned this pull request Feb 24, 2022: Extend interface of SubsampleRagged. #900 (Closed)

pkufool mentioned this pull request Mar 6, 2022: Implement Rnnt decoding #926 (Merged)

pkufool closed this Mar 16, 2022
