Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test #15

Open
wants to merge 77 commits into
base: master
Choose a base branch
from
Open

test #15

wants to merge 77 commits into from

Conversation

csukuangfj
Copy link
Owner

No description provided.

SarahSmitho and others added 30 commits June 7, 2023 11:17
* utils: add symlink_or_copyfile

* pruned_transducer_stateless7: use symlinks (when possible) to output best epochs

* Rename function

---------

Co-authored-by: Yifan Yang <[email protected]>
* add CTC loss option in zipformer recipe

* add ctc_decode.py

* support CTC model export, add jit_pretrained_ctc.py, pretrained_ctc.py

* update README.md and RESULTS.md

* add CI test
* Update model.py

* Update train.py

* Update decoder.py
* CTC loss return tensor

* Update model.py
* copy files

* update train.py

* small fixes

* Add decode.py

* Fix dataloader in decode.py

* add blank penalty

* Add blank-penalty to other decoding method

* Minor fixes

* add zipformer2 recipe

* Minor fixes

* Remove pruned7

* export and test models

* Replace bpe with tokens in export.py and pretrain.py

* Minor fixes

* Minor fixes

* Minor fixes

* Fix export

* Update results

* Fix zipformer-ctc

* Fix ci

* Fix ci

* Fix CI

* Fix CI

---------

Co-authored-by: Fangjun Kuang <[email protected]>
* initial commit for zipformer tedlium

* fix unk decoding

* add pretrained model and logs

* update for new AsrModel

* add option for choosing rnnt type

* add results with modified rnnt
* support testing onnx exported model on the test sets

* use token_table instead
* Add start-batch option for RNNLM training

* Also set epoch

* Skip batches on load
* merge upstream

* add SURT model and training

* add libricss decoding

* add chunk width randomization

* decode SURT with libricss

* initial commit for zipformer_ctc

* remove unwanted changes

* remove changes to other recipe

* fix zipformer softlink

* fix for JIT export

* add missing file

* fix symbolic links

* update results

* clean commit for SURT recipe

* training libricss surt model

* remove unwanted files

* remove unwanted changes

* remove changes in librispeech

* change some files to symlinks

* remove unwanted changes in utils

* add export script

* add README

* minor fix in README

* add assets for README

* replace some files with symlinks

* remove unused decoding methods

* fix symlink

* address comments from @csukuangfj
* add shallow fusion documentation

* add documentation for LODR

* upload docs for LM rescoring
* Fix for ci

* Fix frame_reducer
* merge upstream

* add SURT model and training

* add libricss decoding

* add chunk width randomization

* decode SURT with libricss

* initial commit for zipformer_ctc

* remove unwanted changes

* remove changes to other recipe

* fix zipformer softlink

* fix for JIT export

* add missing file

* fix symbolic links

* update results

* clean commit for SURT recipe

* training libricss surt model

* remove unwanted files

* remove unwanted changes

* remove changes in librispeech

* change some files to symlinks

* remove unwanted changes in utils

* add export script

* add README

* minor fix in README

* add assets for README

* replace some files with symlinks

* remove unused decoding methods

* initial commit for SURT AMI recipe

* fix symlink

* add train + decode scripts

* add missing symlink

* change files to symlink

* change file type
JinZr and others added 25 commits September 4, 2023 17:56
* Init commit for recipes trained on multiple zh datasets.

* fbank extraction for thchs30

* added support for aishell1

* added support for aishell-2

* fixes

* fixes

* fixes

* added support for stcmds and primewords

* fixes

* added support for magicdata

script for fbank computation not done yet

* added script for magicdata fbank computation

* file permission fixed

* updated for the wenetspeech recipe

* updated

* Update preprocess_kespeech.py

* updated

* updated

* updated

* updated

* file permission fixed

* updated paths

* fixes

* added support for kespeech dev/test set fbank computation

* fixes for file permission

* refined support for KeSpeech

* added scripts for BPE model training

* updated

* init commit for the multi_zh-cn zipformer recipe

* disable speed perturbation by default

* updated

* updated

* added necessary files for the zipformer recipe

* removed redundant wenetspeech M and S sets

* updates for multi dataset decoding

* refined

* formatting issues fixed

* updated

* minor fixes

* this commit finalize the recipe (hopefully)

* fixed formatting issues

* minor fixes

* updated

* using soft links to reduce redundancy

* minor updates

* using soft links to reduce redundancy

* minor updates

* minor updates

* using soft links to reduce redundancy

* minor updates

* Update README.md

* minor updates

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <[email protected]>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <[email protected]>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <[email protected]>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <[email protected]>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <[email protected]>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <[email protected]>

* minor updates

* minor fixes

* fixed a formatting issue

* Update preprocess_kespeech.py

* Update prepare.sh

* Update egs/multi_zh-hans/ASR/local/compute_fbank_kespeech_splits.py

Co-authored-by: Fangjun Kuang <[email protected]>

* Update egs/multi_zh-hans/ASR/local/preprocess_kespeech.py

Co-authored-by: Fangjun Kuang <[email protected]>

* removed redundant files

* symlinks added

* minor updates

* added CI tests for `multi_zh-hans`

* minor fixes

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

---------

Co-authored-by: Fangjun Kuang <[email protected]>
* Update conformer.py
* Update zipformer.py

fix bug in get_dynamic_dropout_rate
* Use torch.jit.script() to export the decoder model

See also k2-fsa/sherpa-onnx#327
…#1269)

* fixes for `diagnostics`

Replace `2 ** 22` with `512` as the default value of `diagnostics.TensorDiagnosticOptions`

also black formatted some scripts

* fixed formatting issues
* formatted the entire librispeech recipe

* minor updates
* add documentation for training an RNNLM
@csukuangfj csukuangfj added the ctc label Sep 27, 2023
@csukuangfj csukuangfj added ctc and removed ctc labels Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.