How to train or optimize the sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01 model for my own voice? #1371
Comments
Please see our icefall doc.

I found some Jupyter notebooks in https://github.com/k2-fsa/colab/tree/master/sherpa-onnx, and this doc: https://k2-fsa.github.io/icefall/recipes/Finetune/from_supervised/finetune_zipformer.html

Please see this comment. You need to spend some time reading our docs.
After reading https://k2-fsa.github.io/icefall/recipes/Non-streaming-ASR/yesno/tdnn.html#colab-notebook, I couldn't find an ipynb file for the wenetspeech-kws recipe, so I tried to modify my-icefall-yes-no-dataset-recipe.ipynb for wenetspeech-kws (https://github.com/diyism/colab_kaldi2/blob/main/my_icefall_wenetspeech_kws_dataset_recipe.ipynb). But I found it downloads 500 GB of dataset files, which I don't think will work in Colab.

I want to build a web UI to record my own voice speaking Mandarin syllables, to replace the wenetspeech-kws dataset (without downloading the 500 GB of files), just like the YonaVox project does (https://github.com/diyism/YonaVox/blob/master/training/recorder.ipynb), and then train the KWS model only on these recordings. Is that feasible? (A rough sketch of the recording step is below.)

I found another 2 ipynb files for creating recipes, but it seems they are not specifically about creating voice dataset files for wenetspeech-kws.
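For illustration, here is a minimal local (non-web) stand-in for the recording step, assuming a Python environment with the `sounddevice` and `soundfile` packages; the syllable list, take count, and output paths are hypothetical:

```python
# Minimal sketch: record each Mandarin syllable N times, with a 1-second
# pause between takes, saving one wav per take.
# Assumes: pip install sounddevice soundfile
import time
from pathlib import Path

import sounddevice as sd
import soundfile as sf

SYLLABLES = ["bo2", "guo2"]  # hypothetical keyword list
TAKES = 50                   # repetitions per syllable
SR = 16000                   # the sherpa-onnx kws models expect 16 kHz mono
SECONDS = 1.0                # length of each recording window

out_dir = Path("my_voice")
out_dir.mkdir(exist_ok=True)

for syl in SYLLABLES:
    for i in range(TAKES):
        print(f"say '{syl}' ({i + 1}/{TAKES}) ...")
        audio = sd.rec(int(SR * SECONDS), samplerate=SR, channels=1)
        sd.wait()  # block until the recording window ends
        sf.write(out_dir / f"{syl}_{i:03d}.wav", audio, SR)
        time.sleep(1.0)  # 1-second mute between takes
```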
Are there any differences between the dataset you want to build and the other dataset examples in icefall, e.g., the yesno dataset? The principle is the same.
I have some wav files of my own voice and corresponding transcription txt files. I wrote create_dataset.py (https://github.com/diyism/colab_kaldi2/blob/main/create_dataset.py), which successfully generates a my_dataset.jsonl file.

Now I want to plug my_dataset.jsonl into egs/wenetspeech/KWS/prepare.sh, but this prepare.sh is much more complex than the one for yesno. It also calls egs/wenetspeech/ASR/prepare.sh, which is itself very complex, containing 23 stages, and it requires more than just the my_dataset.jsonl file. I'm completely lost and don't know where to start. Is it feasible to train a KWS model using only my voice files and my_dataset.jsonl?
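For reference, a minimal sketch of what such a create_dataset.py can look like, assuming my_dataset.jsonl is meant to be a lhotse CutSet manifest (the format icefall recipes consume); the wav directory and the `<utt_id> <text>` transcript layout are assumptions:

```python
# Minimal sketch: build a lhotse CutSet manifest from a directory of wavs
# plus a transcript file with one "<utt_id> <text>" line per utterance.
from pathlib import Path

from lhotse import CutSet, Recording, RecordingSet, SupervisionSegment, SupervisionSet

wav_dir = Path("my_voice")
transcripts = dict(
    line.split(maxsplit=1)
    for line in Path("transcripts.txt").read_text().splitlines()
)

recordings, supervisions = [], []
for wav in sorted(wav_dir.glob("*.wav")):
    rec = Recording.from_file(wav, recording_id=wav.stem)
    recordings.append(rec)
    supervisions.append(
        SupervisionSegment(
            id=wav.stem,
            recording_id=wav.stem,
            start=0.0,
            duration=rec.duration,
            text=transcripts[wav.stem],
            language="Chinese",
        )
    )

cuts = CutSet.from_manifests(
    recordings=RecordingSet.from_recordings(recordings),
    supervisions=SupervisionSet.from_segments(supervisions),
)
cuts.to_file("my_dataset.jsonl")
```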
After following the doc, you need to prepare Kaldi-style data files:
(a) wav.scp — it should contain something like below
(b) text
(c) utt2spk
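A typical layout for these three files looks like the following; the utterance IDs, speaker ID, and paths are hypothetical:

```
# wav.scp: <utterance-id> <path-to-wav>
bo2_000 /path/to/my_voice/bo2_000.wav
guo2_000 /path/to/my_voice/guo2_000.wav

# text: <utterance-id> <transcript>
bo2_000 bo2
guo2_000 guo2

# utt2spk: <utterance-id> <speaker-id>
bo2_000 diyism
guo2_000 diyism
```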
Again, I suggest that you spend time, maybe several days, reading our existing examples.
I've modified my_icefall_wenetspeech_asr_dataset_recipe.ipynb (https://github.com/diyism/colab_kaldi2/blob/main/my_icefall_wenetspeech_asr_dataset_recipe.ipynb), and I've also modified icefall_egs_wenetspeech_ASR_prepare.sh (https://github.com/diyism/colab_kaldi2/blob/main/icefall_egs_wenetspeech_ASR_prepare.sh) and icefall_egs_wenetspeech_ASR_local_preprocess_wenetspeech.py (https://github.com/diyism/colab_kaldi2/blob/main/icefall_egs_wenetspeech_ASR_local_preprocess_wenetspeech.py). It seems to work.

But when I run the 5th step, it shows errors. I guess I missed something.
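(Assuming the modified prepare.sh keeps the usual icefall convention of accepting stage flags, a single stage can be re-run in isolation, e.g. `./prepare.sh --stage 5 --stop-stage 5`, which helps pin down which stage's inputs are missing.)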
I've tested the latest KWS model (sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2 from https://github.com/k2-fsa/sherpa-onnx/releases/tag/kws-models) against my own voice, but both models in it (encoder-epoch-99-avg-1 and encoder-epoch-12-avg-2) wrongly recognized my "bo2" as "guo2".
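For reference, a minimal sketch of such a test with the sherpa-onnx Python API; the file names below assume the layout of the extracted release tarball, and bo2.wav is a hypothetical recording of my voice:

```python
# Minimal sketch: run the keyword spotter over one wav file.
# Assumes: pip install sherpa-onnx soundfile numpy
import numpy as np
import sherpa_onnx
import soundfile as sf

d = "sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01"
kws = sherpa_onnx.KeywordSpotter(
    tokens=f"{d}/tokens.txt",
    encoder=f"{d}/encoder-epoch-12-avg-2-chunk-16-left-64.onnx",
    decoder=f"{d}/decoder-epoch-12-avg-2-chunk-16-left-64.onnx",
    joiner=f"{d}/joiner-epoch-12-avg-2-chunk-16-left-64.onnx",
    keywords_file=f"{d}/test_wavs/test_keywords.txt",
)

samples, sample_rate = sf.read("bo2.wav", dtype="float32")
stream = kws.create_stream()
stream.accept_waveform(sample_rate, samples)
# Tail padding flushes the last frames through the streaming model.
stream.accept_waveform(sample_rate, np.zeros(int(0.66 * sample_rate), dtype=np.float32))
stream.input_finished()

while kws.is_ready(stream):
    kws.decode_stream(stream)
    result = kws.get_result(stream)
    if result:
        print("detected:", result)
```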
Is there any way to train the sherpa-onnx KWS model on my own voice? For example, something as easy as the YonaVox project:

- record each mono-syllable (pinyin) 50 times in my phone's Chrome browser, with a 1-second mute automatically inserted between syllables (https://github.com/diyism/YonaVox/blob/master/training/recorder.ipynb)
- train or fine-tune the model on a Google Colab GPU (https://github.com/diyism/YonaVox/blob/master/training/Hebrew_AC_voice_activation_(public_version).ipynb)
ref: #920