GigaSpeech RNN-T experiments #318

Merged 12 commits on May 13, 2022
1 change: 1 addition & 0 deletions .flake8
@@ -9,6 +9,7 @@ per-file-ignores =
egs/tedlium3/ASR/*/conformer.py: E501,
egs/gigaspeech/ASR/*/conformer.py: E501,
egs/librispeech/ASR/pruned_transducer_stateless2/*.py: E501,
egs/gigaspeech/ASR/pruned_transducer_stateless2/*.py: E501,
egs/librispeech/ASR/pruned_transducer_stateless4/*.py: E501,
egs/librispeech/ASR/*/optim.py: E501,
egs/librispeech/ASR/*/scaling.py: E501,
7 changes: 4 additions & 3 deletions egs/gigaspeech/ASR/README.md
@@ -13,8 +13,9 @@ ln -sfv /path/to/GigaSpeech download/GigaSpeech
```

## Performance Record
|                                | Dev   | Test  |
|--------------------------------|-------|-------|
| `conformer_ctc`                | 10.47 | 10.58 |
| `pruned_transducer_stateless2` | 10.52 | 10.62 |

See [RESULTS](/egs/gigaspeech/ASR/RESULTS.md) for details.
80 changes: 77 additions & 3 deletions egs/gigaspeech/ASR/RESULTS.md
@@ -1,4 +1,78 @@
## Results
### GigaSpeech BPE training results (Pruned Transducer 2)

#### 2022-05-12

#### Conformer encoder + embedding decoder

Conformer encoder + non-recurrent decoder. The encoder is a
reworked version of the conformer encoder, with many changes. The
decoder contains only an embedding layer, a Conv1d (with kernel
size 2), and a linear layer (to transform the tensor dimension).
The k2 pruned RNN-T loss is used.
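
The decoder described above can be sketched in PyTorch. This is an
illustrative sketch, not the icefall implementation: the class name
`StatelessDecoder`, the depthwise grouping, and the left-padding choice
are assumptions made here for a self-contained example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StatelessDecoder(nn.Module):
    """Sketch of an embedding-only RNN-T decoder: an embedding layer,
    a Conv1d with kernel size 2 over the label sequence, and a linear
    layer. Hypothetical names; details differ from the icefall code."""

    def __init__(self, vocab_size: int, embed_dim: int) -> None:
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Kernel size 2: each output position sees the current and
        # previous label embedding (a 2-label left context).
        self.conv = nn.Conv1d(
            embed_dim, embed_dim, kernel_size=2,
            groups=embed_dim, bias=False,
        )
        self.output_linear = nn.Linear(embed_dim, embed_dim)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, num_labels) of label IDs.
        emb = self.embedding(y).permute(0, 2, 1)  # (B, D, U)
        emb = F.pad(emb, (1, 0))                  # left-pad one step
        out = self.conv(emb).permute(0, 2, 1)     # (B, U, D)
        return self.output_linear(out)


decoder = StatelessDecoder(vocab_size=500, embed_dim=512)
out = decoder(torch.randint(0, 500, (4, 7)))
print(out.shape)  # torch.Size([4, 7, 512])
```

Because the decoder has no recurrent state, its output at each position
depends only on a fixed, short label context, which is what makes the
pruned RNN-T training and the beam searches below cheap.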

Results are:

| | Dev | Test |
|----------------------|-------|-------|
| greedy search | 10.59 | 10.87 |
| fast beam search | 10.56 | 10.80 |
| modified beam search | 10.52 | 10.62 |

To reproduce the above result, use the following commands for training:

```bash
cd egs/gigaspeech/ASR
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
./pruned_transducer_stateless2/train.py \
--max-duration 120 \
--num-workers 1 \
--world-size 8 \
--exp-dir pruned_transducer_stateless2/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--use-fp16 True
```

and the following commands for decoding:

```bash
# greedy search
./pruned_transducer_stateless2/decode.py \
--epoch 29 \
--avg 11 \
--decoding-method greedy_search \
--exp-dir pruned_transducer_stateless2/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--max-duration 20 \
--num-workers 1

# fast beam search
./pruned_transducer_stateless2/decode.py \
--epoch 29 \
--avg 9 \
--decoding-method fast_beam_search \
--exp-dir pruned_transducer_stateless2/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--max-duration 20 \
--num-workers 1

# modified beam search
./pruned_transducer_stateless2/decode.py \
--epoch 29 \
--avg 8 \
--decoding-method modified_beam_search \
--exp-dir pruned_transducer_stateless2/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--max-duration 20 \
--num-workers 1
```
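
The `--epoch`/`--avg` flags select a final checkpoint by averaging the
parameters of the last `--avg` epoch checkpoints ending at `--epoch`.
A minimal sketch of that averaging, assuming each checkpoint file stores
its parameters under a `"model"` key (`average_checkpoints` here is a
hypothetical helper; icefall ships its own implementation):

```python
import torch


def average_checkpoints(filenames):
    """Element-wise average of model parameters across checkpoint
    files. Sketch only: assumes float parameters stored under the
    "model" key of each saved dict."""
    avg = torch.load(filenames[0], map_location="cpu")["model"]
    for f in filenames[1:]:
        state = torch.load(f, map_location="cpu")["model"]
        for k in avg:
            avg[k] = avg[k] + state[k]
    for k in avg:
        avg[k] = avg[k] / len(filenames)
    return avg
```

Averaging several adjacent epochs smooths out per-epoch noise in the
weights, which is why the best `--avg` value differs slightly between
decoding methods above.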

A pretrained model is available at
<https://huggingface.co/wgb14/icefall-asr-gigaspeech-pruned-transducer-stateless2>

The tensorboard log for training is available at
<https://tensorboard.dev/experiment/zmmM0MLASnG1N2RmJ4MZBw/>

### GigaSpeech BPE training results (Conformer-CTC)

@@ -20,7 +94,7 @@ Scale values used in n-gram LM rescoring and attention rescoring for the best WE

To reproduce the above result, use the following commands for training:

```bash
cd egs/gigaspeech/ASR
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
@@ -34,7 +108,7 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

and the following command for decoding:

```bash
./conformer_ctc/decode.py \
--epoch 18 \
--avg 6 \
@@ -59,7 +133,7 @@ Scale values used in n-gram LM rescoring and attention rescoring for the best WE

To reproduce the above result, use the training commands above, and the following command for decoding:

```bash
./conformer_ctc/decode.py \
--epoch 18 \
--avg 6 \