WIP: Add timestamps for streaming ASR #119

csukuangfj · 2022-09-19T05:09:28Z

Use the model from k2-fsa/icefall#558 for testing.

git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03

Start the server

export CUDA_VISIBLE_DEVICES=0

nn_encoder_filename=./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-iter-468000-avg-16.pt
nn_decoder_filename=./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-iter-468000-avg-16.pt
nn_joiner_filename=./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-iter-468000-avg-16.pt

bpe_model_filename=./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/data/lang_bpe_500/bpe.model

./sherpa/bin/lstm_transducer_stateless/streaming_server.py \
  --endpoint.rule1.must-contain-nonsilence=false \
  --endpoint.rule1.min-trailing-silence=5.0 \
  --endpoint.rule2.min-trailing-silence=2.0 \
  --endpoint.rule3.min-utterance-length=50.0 \
  --port 6006 \
  --decoding-method greedy_search \
  --max-batch-size 50 \
  --max-wait-ms 5 \
  --nn-pool-size 1 \
  --max-active-connections 10 \
  --nn-encoder-filename $nn_encoder_filename \
  --nn-decoder-filename $nn_decoder_filename \
  --nn-joiner-filename $nn_joiner_filename \
  --bpe-model-filename $bpe_model_filename
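The `--endpoint.*` flags above decide when the server treats the current segment as finished. The sketch below shows one plausible reading of them. It assumes Kaldi-style endpoint rules:

- A rule fires once the trailing silence and the utterance length both reach the rule's thresholds.
- If `must_contain_nonsilence` is set, the rule fires only after some non-silence has been decoded.

This is not sherpa's implementation. `EndpointRule`, `rule_fires`, and `is_endpoint` are hypothetical names, and every field the command line does not set is a placeholder value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EndpointRule:
    must_contain_nonsilence: bool
    min_trailing_silence: float  # seconds
    min_utterance_length: float  # seconds


def rule_fires(rule: EndpointRule, trailing_silence: float, utterance_length: float) -> bool:
    # Assumed semantics: non-silence has been seen if the utterance is
    # longer than its trailing silence.
    contains_nonsilence = utterance_length > trailing_silence
    return (
        (contains_nonsilence or not rule.must_contain_nonsilence)
        and trailing_silence >= rule.min_trailing_silence
        and utterance_length >= rule.min_utterance_length
    )


def is_endpoint(rules: List[EndpointRule], trailing_silence: float, utterance_length: float) -> bool:
    # An endpoint is detected as soon as any single rule fires.
    return any(rule_fires(r, trailing_silence, utterance_length) for r in rules)


# Thresholds set on the command line above. The remaining fields
# (rule2/rule3 must_contain_nonsilence, rule1/rule2 min_utterance_length,
# rule3 min_trailing_silence) are hypothetical placeholders.
RULES = [
    EndpointRule(must_contain_nonsilence=False, min_trailing_silence=5.0, min_utterance_length=0.0),   # rule1
    EndpointRule(must_contain_nonsilence=True, min_trailing_silence=2.0, min_utterance_length=0.0),    # rule2
    EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0.0, min_utterance_length=50.0),  # rule3
]

print(is_endpoint(RULES, trailing_silence=2.5, utterance_length=10.0))  # True (rule2)
print(is_endpoint(RULES, trailing_silence=1.0, utterance_length=10.0))  # False
```

Under these assumptions, each rule ends a segment in a different situation:

- rule2 ends it after 2 s of trailing silence once something has been said.
- rule1 ends it after 5 s of silence even if nothing has been said.
- rule3 forces a cut once a segment reaches 50 s.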

Start the client

wave=./test_wavs/1089-134686-0001.wav
# The second assignment overrides the first; the output below is for this file.
wave=./test_wavs/1221-135766-0002.wav

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py  \
  --server-port 6006 \
  $wave

Output from the client:

2022-09-19 13:08:23,757 INFO [streaming_client.py:93] Final result of segment 0: YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
2022-09-19 13:08:23,758 INFO [streaming_client.py:142] ./test_wavs/1221-135766-0002.wav
segment: 0
text: YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
timestamps: [0.56, 0.76, 1.04, 1.12, 1.36, 1.52, 1.76, 1.84, 1.8800000000000001, 
1.92, 2.0, 2.12, 2.24, 2.36, 2.56, 2.6, 2.72, 2.7600000000000002, 2.96, 3.0, 3.24, 
3.4, 3.44, 3.88, 4.2, 4.28, 4.32, 4.4, 4.48, 4.5600000000000005, 4.6000000000000005]
(token, time): [('_YE', 0.56), ('T', 0.76), ('_THE', 1.04), ('SE', 1.12), ('_THOUGHT', 1.36), ('S', 1.52),
 ('_A', 1.76), ('FF', 1.84), ('E', 1.8800000000000001), ('C', 1.92), ('TED', 2.0), ('_HE', 2.12), ('S', 2.24), 
('TER', 2.36), ('_P', 2.56), ('RY', 2.6), ('N', 2.72), ('NE', 2.7600000000000002), ('_', 2.96), ('LESS', 3.0), 
('_WITH', 3.24), ('_HO', 3.4), ('PE', 3.44), ('_THAN', 3.88), ('_A', 4.2), ('PP', 4.28), ('RE', 4.32), ('HE', 4.4), 
('N', 4.48), ('S', 4.5600000000000005), ('ION', 4.6000000000000005)]
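Every timestamp in the output above is a multiple of 0.04 s. That is consistent with a 10 ms feature frame shift combined with an encoder subsampling factor of 4, though the output alone does not prove it. Values such as 1.8800000000000001 are ordinary binary floating-point noise from multiplying a frame index by a step that has no exact binary representation.

Below is a minimal sketch that converts output-frame indices to seconds and rounds the noise away. It assumes the two hypothetical constants `FRAME_SHIFT_S` and `SUBSAMPLING`.

```python
from typing import List

FRAME_SHIFT_S = 0.01  # assumed 10 ms feature frame shift
SUBSAMPLING = 4       # assumed encoder subsampling factor


def frames_to_seconds(frames: List[int], ndigits: int = 2) -> List[float]:
    # Each encoder output frame covers SUBSAMPLING feature frames.
    # round() removes float artifacts such as 1.8800000000000001.
    return [round(f * SUBSAMPLING * FRAME_SHIFT_S, ndigits) for f in frames]


def seconds_to_frames(times: List[float]) -> List[int]:
    # Inverse mapping, useful for checking that timestamps lie on the frame grid.
    step = SUBSAMPLING * FRAME_SHIFT_S
    return [round(t / step) for t in times]


print(seconds_to_frames([0.56, 0.76, 1.8800000000000001, 4.6000000000000005]))
# [14, 19, 47, 115]
print(frames_to_seconds([14, 19, 47, 115]))
# [0.56, 0.76, 1.88, 4.6]
```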

csukuangfj · 2022-09-19T05:20:40Z

Comparing the word timestamps from greedy_search with the alignments from https://github.com/CorentinJ/librispeech-alignments:

word	CorentinJ/librispeech-alignments (s)	greedy_search (s)	delay (s)
AFTER	0.36	0.56	0.56 - 0.36 = 0.20
EARLY	0.73	1.16	0.43
NIGHTFALL	1.04	1.60	0.56
THE	1.77	2.16	0.39
YELLOW	1.90	2.32	0.42
LAMPS	2.16	2.68	0.52
WOULD	2.59	3.12	0.53
LIGHT	2.76	3.28	0.52
UP	3.07	3.52	0.45
HERE	3.27	3.76	0.49
AND	3.52	3.96	0.44
THERE	3.66	4.24	0.58
THE	4.09	4.56	0.47
SQUALID	4.21	4.76	0.55
QUARTER	4.78	5.28	0.50
OF	5.31	5.72	0.41
THE	5.42	5.84	0.42
BROTHELS	5.50	6.00	0.50
silence	6.16-6.625	N/A	N/A
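The delay column can be summarized in a few lines of Python. This minimal sketch uses the (reference, greedy_search) pairs copied from the table above. The silence row is skipped because it has no greedy_search time. `word_delays` is a hypothetical helper, not part of sherpa.

```python
from statistics import mean
from typing import List, Tuple

# (word, CorentinJ/librispeech-alignments time, greedy_search time) in seconds,
# copied from the table above (the silence row is omitted).
ROWS: List[Tuple[str, float, float]] = [
    ("AFTER", 0.36, 0.56), ("EARLY", 0.73, 1.16), ("NIGHTFALL", 1.04, 1.60),
    ("THE", 1.77, 2.16), ("YELLOW", 1.90, 2.32), ("LAMPS", 2.16, 2.68),
    ("WOULD", 2.59, 3.12), ("LIGHT", 2.76, 3.28), ("UP", 3.07, 3.52),
    ("HERE", 3.27, 3.76), ("AND", 3.52, 3.96), ("THERE", 3.66, 4.24),
    ("THE", 4.09, 4.56), ("SQUALID", 4.21, 4.76), ("QUARTER", 4.78, 5.28),
    ("OF", 5.31, 5.72), ("THE", 5.42, 5.84), ("BROTHELS", 5.50, 6.00),
]


def word_delays(rows: List[Tuple[str, float, float]]) -> List[float]:
    # delay = greedy_search time - reference time, rounded to hide float noise
    return [round(hyp - ref, 2) for _, ref, hyp in rows]


delays = word_delays(ROWS)
print(f"mean={mean(delays):.3f}s min={min(delays):.2f}s max={max(delays):.2f}s")
# mean=0.466s min=0.20s max=0.58s
```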

csukuangfj · 2022-09-19T05:27:05Z

A second comparison, using ./test_wavs/1221-135766-0002.wav (the utterance whose client output is shown above):

word	CorentinJ/librispeech-alignments (s)	greedy_search (s)	delay (s)
YET	0.42	0.56	0.56 - 0.42 = 0.14
THESE	0.65	1.04	0.39
THOUGHTS	0.93	1.36	0.43
AFFECTED	1.26	1.76	0.50
HESTER	1.66	2.12	0.46
PRYNNE	2.02	2.56	0.54
LESS	2.46	2.96	0.50
WITH	2.83	3.24	0.41
HOPE	3.03	3.40	0.37
silence	3.48	N/A	N/A
THAN	3.55	3.88	0.33
APPREHENSION	3.76	4.20	0.44
silence	4.56	N/A	N/A
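The greedy_search column in this table matches the time of the first BPE token of each word in the client output above. That first token carries the word-boundary marker, printed as `_` in the log; SentencePiece itself uses `▁` (U+2581).

Below is a minimal sketch, assuming that convention. It rebuilds word start times from the `(token, time)` pairs and compares them with the reference column. `tokens_to_words` is a hypothetical helper, not a sherpa API.

```python
from statistics import mean
from typing import List, Tuple

# (token, time) pairs from the client output above, with float noise rounded off.
TOKENS: List[Tuple[str, float]] = [
    ("_YE", 0.56), ("T", 0.76), ("_THE", 1.04), ("SE", 1.12), ("_THOUGHT", 1.36),
    ("S", 1.52), ("_A", 1.76), ("FF", 1.84), ("E", 1.88), ("C", 1.92), ("TED", 2.0),
    ("_HE", 2.12), ("S", 2.24), ("TER", 2.36), ("_P", 2.56), ("RY", 2.6), ("N", 2.72),
    ("NE", 2.76), ("_", 2.96), ("LESS", 3.0), ("_WITH", 3.24), ("_HO", 3.4),
    ("PE", 3.44), ("_THAN", 3.88), ("_A", 4.2), ("PP", 4.28), ("RE", 4.32),
    ("HE", 4.4), ("N", 4.48), ("S", 4.56), ("ION", 4.6),
]

# Reference times from the table above (silence rows omitted).
REFERENCE = [0.42, 0.65, 0.93, 1.26, 1.66, 2.02, 2.46, 2.83, 3.03, 3.55, 3.76]


def tokens_to_words(tokens, boundary=("_", "\u2581")) -> List[Tuple[str, float]]:
    """Merge BPE tokens into (word, start_time) pairs.

    A token that starts with a boundary marker opens a new word, and its
    time is used as the word's start time.
    """
    words = []
    for tok, t in tokens:
        if tok.startswith(boundary) or not words:
            words.append([tok.lstrip("".join(boundary)), t])
        else:
            words[-1][0] += tok
    return [(w, t) for w, t in words]


words = tokens_to_words(TOKENS)
print(" ".join(w for w, _ in words))
# YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
delays = [round(t - ref, 2) for (_, t), ref in zip(words, REFERENCE)]
print(f"mean delay = {mean(delays):.2f}s")  # mean delay = 0.41s
```

The standalone `_` token before `LESS` is why LESS gets 2.96 rather than 3.0 in the table.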

csukuangfj · 2022-09-19T05:30:36Z

Unlike #52, which uses a Conformer encoder, the model in this PR uses an LSTM encoder.

Also, the first token is no longer emitted on the first frame.

danpovey · 2022-09-19T07:43:26Z

Cool!!
It might be nice at some point to have a way of computing average delays, as would be experienced by the user.
[e.g., between the times printed in our alignment and the time each token was actually output.] That way, if we compute the delay from the reference alignment to our alignment, we can add the delay caused by the algorithm's latency to get the total delay.
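The decomposition suggested above can be sketched as follows. The sketch assumes that, for each word, three times are known, all measured from the start of the audio:

- the reference alignment time;
- the time in our alignment;
- a hypothetical emission time, i.e. how much audio had been streamed in real time when the client first received the word.

This PR does not record emission times. The `DelayReport` and `delay_report` names and the numbers in the usage line are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class DelayReport:
    alignment_delay: float   # mean of (our alignment time - reference time), seconds
    emission_latency: float  # mean of (emission time - our alignment time), seconds

    @property
    def total(self) -> float:
        # The two parts add up to the mean of (emission time - reference time).
        return self.alignment_delay + self.emission_latency


def delay_report(reference: Sequence[float], aligned: Sequence[float],
                 emitted: Sequence[float]) -> DelayReport:
    if not (len(reference) == len(aligned) == len(emitted)):
        raise ValueError("all three sequences must have one entry per word")
    return DelayReport(
        alignment_delay=mean(a - r for r, a in zip(reference, aligned)),
        emission_latency=mean(e - a for a, e in zip(aligned, emitted)),
    )


# Illustrative numbers only (not measured).
report = delay_report(reference=[0.4, 1.0], aligned=[0.6, 1.4], emitted=[0.9, 1.6])
print(f"{report.alignment_delay:.2f} + {report.emission_latency:.2f} = {report.total:.2f}")
# 0.30 + 0.25 = 0.55
```

Because the mean is linear, averaging the two parts separately and summing gives the same result as averaging the per-word total delays.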

ezerhouni · 2022-09-30T05:50:22Z

@csukuangfj What is missing in this PR?

csukuangfj · 2022-09-30T05:57:47Z

I think I only made changes to lstm_transducer_stateless.

Other folders for streaming models have not been updated yet.

ezerhouni · 2022-09-30T06:03:20Z

OK! Let me try to take care of it today.

csukuangfj · 2022-09-30T06:12:34Z

You can use the changes from this PR. I am closing it now.

Thanks again!

ezerhouni · 2022-09-30T06:28:47Z

@csukuangfj You mean I should create my own branch with your changes, right?

csukuangfj · 2022-09-30T06:39:32Z

> @csukuangfj You mean I should create my own branch with your changes, right?

Yes, you can use whichever approach you think works best.

Commits:

e788a6b  WIP: Add timestamps for streaming ASR
07c4faf  Add timestamps for streaming modified beam search
c593acd  Add timestamps for streaming fast_beam_search

csukuangfj mentioned this pull request on Sep 29, 2022 in Add streaming modified beam search #142 (merged)

csukuangfj closed this on Sep 30, 2022

ezerhouni mentioned this pull request on Sep 30, 2022 in Streaming asr time stamp #146 (merged)

yaozengwei mentioned this pull request on Oct 3, 2022 in Get timestamps during decoding k2-fsa/icefall#598 (merged)
