[Efficient Conformer] Support ONNX GPU export, add librispeech results, and fix V2 streaming decode issue #1701
Conversation
…e changes. Completed the causal and non-causal convolution model tests for the EfficientConformer, as well as JIT runtime tests. Modified yaml files for Aishell-1
…, and fixed a bug in V2 streaming decode.
wenet/efficient_conformer/encoder.py
for i, layer in enumerate(self.encoders):
    factor = self.calculate_downsampling_factor(i)
    # NOTE(xcsong): Before layer.forward
    #   shape(att_cache[i:i + 1]) is (1, head, cache_t1, d_k * 2),
    #   shape(cnn_cache[i]) is (b=1, hidden-dim, cache_t2)
    #   shape(new_att_cache) = [batch, head, time2, outdim // head * 2]
    att_cache_trunc = 0
    if xs.size(1) + att_cache.size(2) / factor > pos_emb.size(1):
        # The time step is not divisible by the downsampling multiple
        # We propose to double the chunk_size.
        att_cache_trunc = xs.size(1) + \
            att_cache.size(2) // factor - pos_emb.size(1) + 1
    xs, _, new_att_cache, new_cnn_cache = layer(
        xs, att_mask, pos_emb,
        mask_pad=mask_pad,
        att_cache=att_cache[i:i + 1, :, ::factor, :][:, :, att_cache_trunc:, :],
        cnn_cache=cnn_cache[i, :, :, :]
        if cnn_cache.size(0) > 0 else cnn_cache
    )
Q1: Should the condition be
    xs.size(1) + att_cache.size(2) / factor > pos_emb.size(1)
or
    (xs.size(1) + att_cache.size(2)) / factor > pos_emb.size(1)?
The results of these two expressions are not equal.

Q2: What do you mean by "double the chunk_size"? I think [:, :, att_cache_trunc:, :] simply drops any unnecessary attention cache at the beginning, so where is the "double"?
Q1: It is xs.size(1) + att_cache.size(2) / factor > pos_emb.size(1): only the cache length needs to be divided by factor, because xs was already downsampled in the previous block.

Q2: For train_u2++_efficonformer_v2.yaml, the downsample rate is 1/2 (conv2d2) plus 1/4 (efficonformer block), so enlarging the decoding chunk size from 18 to 36 reduces the downsampling loss. The description "double" is indeed inaccurate here; let me adjust it.
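To make the precedence point in Q1 concrete, here is a small self-contained sketch; all sizes are hypothetical and chosen only so the truncation branch fires:

```python
import torch

# Hypothetical sizes for illustration only.
chunk = 16      # xs.size(1): chunk length, already downsampled by earlier blocks
cache_t1 = 64   # att_cache.size(2): cached frames at the original resolution
pos_len = 25    # pos_emb.size(1): positional embeddings available
factor = 4      # downsampling factor of this layer

att_cache = torch.zeros(1, 8, cache_t1, 128)  # (1, head, cache_t1, d_k * 2)

att_cache_trunc = 0
# Only the cache length is divided by factor; xs is already downsampled.
if chunk + cache_t1 / factor > pos_len:  # 16 + 16 > 25
    att_cache_trunc = chunk + cache_t1 // factor - pos_len + 1  # = 8

strided = att_cache[0:1, :, ::factor, :]      # subsample the cache: 64 -> 16 frames
trimmed = strided[:, :, att_cache_trunc:, :]  # drop the 8 oldest frames
print(trimmed.shape)  # torch.Size([1, 8, 8, 128]); 16 + 8 frames now fit in pos_len
```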
examples/librispeech/s0/README.md
## Efficient Conformer V1 Result

* Feature info: using fbank feature, cmvn, speed perturb, dither
* Training info: train_u2++_efficonformer_v1.yaml, 8 gpu
* Decoding info: ctc_weight 0.5, reverse_weight 0.3, average_num 20

test clean

| decoding mode          | full | 18   | 16   |
|------------------------|------|------|------|
| attention decoder      | 3.65 | 3.88 | 3.87 |
| ctc_greedy_search      | 3.46 | 3.79 | 3.77 |
| ctc prefix beam search | 3.44 | 3.75 | 3.74 |
| attention rescoring    | 3.17 | 3.44 | 3.41 |

test other

| decoding mode          | full | 18    | 16    |
|------------------------|------|-------|-------|
| attention decoder      | 8.51 | 9.24  | 9.25  |
| ctc_greedy_search      | 8.94 | 10.04 | 10.06 |
| ctc prefix beam search | 8.91 | 10.00 | 10.01 |
| attention rescoring    | 8.21 | 9.25  | 9.25  |
Thx, the results are much better than the standard Conformer! I would suggest adding the model parameter count to the README for a clearer comparison.
OK, no problem
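For reference, a minimal sketch of how such a parameter count is typically computed in PyTorch (the Linear module here is just a stand-in for the trained encoder):

```python
import torch

model = torch.nn.Linear(256, 256)  # stand-in for the real model
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.2f}M parameters")
```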
if self.global_chunk_size > 0:
    # for ONNX decode simulation, padding xs to chunk_size
    real_len = xs.size(1)
    pad_len = self.chunk_feature_map - real_len
    xs = F.pad(xs, (0, 0, 0, pad_len), value=0.0)
    chunk_masks = F.pad(chunk_masks, (0, pad_len), value=0.0)
Out of curiosity, will padding only be applied to the last chunk, given that previous chunks always have a valid chunk size?
Yes, padding is only applied to the last chunk. Also, this part currently takes effect only if you manually specify use_onnx=True, to simulate the CER after exporting to ONNX.
Many thx!
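As an aside, a minimal sketch of the behavior discussed above, with made-up sizes (chunk_feature_map and the 260-frame input are hypothetical): only the final, partial chunk gets a nonzero pad_len.

```python
import torch
import torch.nn.functional as F

chunk_feature_map = 67           # hypothetical fixed per-chunk feature length
feats = torch.randn(1, 260, 80)  # (batch, time, feat_dim); 260 is not a multiple of 67

for start in range(0, feats.size(1), chunk_feature_map):
    xs = feats[:, start:start + chunk_feature_map, :]
    real_len = xs.size(1)
    pad_len = chunk_feature_map - real_len  # 0 for full chunks, > 0 for the last one
    if pad_len > 0:
        xs = F.pad(xs, (0, 0, 0, pad_len), value=0.0)
    print(start, real_len, pad_len, xs.shape)
# Only the final chunk (real_len = 260 - 3 * 67 = 59) is padded, by 8 frames.
```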
* Support ONNX GPU export.
* Add librispeech results and conf.
* Add streaming conf in aishell: train_u2++_efficonformer_v1_stream.yaml with causal: true; the CER increases from 9.30% to 9.33% on our dataset.
* Fix bug of V2 streaming decode.
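For context on the export flow itself, here is a minimal, hypothetical sketch of the generic torch.onnx.export plus onnxruntime-gpu round trip; the Toy module and file name are illustrative, and the actual wenet export script (its cache inputs and options) differs:

```python
import torch
import onnxruntime as ort

class Toy(torch.nn.Module):
    def forward(self, xs):
        return xs * 2

model = Toy().eval()
dummy = torch.randn(1, 67, 80)
torch.onnx.export(
    model, (dummy,), "toy.onnx",
    input_names=["xs"], output_names=["ys"],
    dynamic_axes={"xs": {1: "time"}, "ys": {1: "time"}},
    opset_version=13,
)

# Run on GPU via onnxruntime-gpu; falls back to CPU if CUDA is unavailable.
sess = ort.InferenceSession(
    "toy.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
out = sess.run(["ys"], {"xs": dummy.numpy()})[0]
print(out.shape)
```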