FYI: Download links about Android APKs for piper models #257

Open
csukuangfj opened this issue Oct 29, 2023 · 52 comments

Comments

@csukuangfj

csukuangfj commented Oct 29, 2023

Now you can try piper models on your Android phones.

The following languages are supported:

  • English
  • French
  • Spanish
  • German

Community help is appreciated to convert more models from piper to sherpa-onnx.

Please see
https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

Screenshot 2023-10-29 at 17 48 27

Note:

You can try the models in your browser by visiting the following huggingface space
https://huggingface.co/spaces/k2-fsa/text-to-speech

@beqabeqa473

@csukuangfj thanks for your work.

To evaluate the models, it would be great to implement the Android TTS engine API.

I would do it myself once I finish my work-related tasks, but I am mentioning it in case someone can get to it sooner.

@anita-smith1

@csukuangfj Sorry for my noob question. When I convert a fine-tuned Coqui TTS vits-ljs model to ONNX using the code here, I only get the ONNX export. The Android code expects tokens.txt and a lexicon. How can I get those too? And will this work with Coqui TTS models?

@csukuangfj
Author

@anita-smith1

Can you find the code about how to convert a word to phonemes for vits models from coqui?

If you can provide that, I can provide scripts to generate lexicon.txt and tokens.txt and also model.onnx from coqui.

@csukuangfj
Author

@anita-smith1

I just managed to convert vits models from coqui to sherpa-onnx.

Will post a colab notebook to show you how to do that.

You can use the converted model in sherpa-onnx, e.g., build an Android App with sherpa-onnx.

@csukuangfj
Author

@anita-smith1

@nanaghartey

I just created a colab notebook to show how to export the VITS models from
https://github.com/coqui-ai/TTS
to onnx
and how to generate tokens.txt and lexicon.txt so that you can use the exported
model with sherpa-onnx.

Please see
https://colab.research.google.com/drive/1cI9VzlimS51uAw4uCR-OBeSXRPBc4KoK?usp=sharing

@csukuangfj
Author

For those of you who are interested in converting piper models to sherpa-onnx, please have a look at the following
colab notebook:

https://colab.research.google.com/drive/1PScLJV3sbUUAOiptLO7Ixlzh9XnWWoYZ?usp=sharing

@anita-smith1

anita-smith1 commented Nov 10, 2023

@csukuangfj Thanks for sharing how to export. I ran your colab notebook without using my model. Everything is generated, but when I use the output in the sherpa-onnx TTS Android demo app, it crashes. The app loads the model, lexicon and tokens all right, but when you tap "Generate" the app crashes with:

2023-11-10 17:39:10.825 18430-18430 sherpa-onnx             com.k2fsa.sherpa.onnx                W  string is: how are you doing
2023-11-10 17:39:10.825 18430-18430 sherpa-onnx             com.k2fsa.sherpa.onnx                W  Raw text: how are you doing
2023-11-10 17:39:10.826 18430-18430 libc                    com.k2fsa.sherpa.onnx                A  Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x38333501 in tid 18430 (fsa.sherpa.onnx), pid 18430 (fsa.sherpa.onnx)
2023-11-10 17:39:11.148 18526-18526 DEBUG                   pid-18526                            A  Cmdline: com.k2fsa.sherpa.onnx
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A  pid: 18430, tid: 18430, name: fsa.sherpa.onnx  >>> com.k2fsa.sherpa.onnx <<<
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #00 pc 00000000003d65f8  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libonnxruntime.so (BuildId: 5bc662f139575789b099c8c4823a8ed1565a0d4c)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #01 pc 00000000003a9404  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libonnxruntime.so (BuildId: 5bc662f139575789b099c8c4823a8ed1565a0d4c)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #02 pc 000000000009f914  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (Ort::detail::SessionImpl<OrtSession>::Run(Ort::RunOptions const&, char const* const*, Ort::Value const*, unsigned long, char const* const*, unsigned long)+204) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #03 pc 0000000000124ba4  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Impl::RunVits(Ort::Value, long, float)+1128) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #04 pc 0000000000123f9c  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Impl::Run(Ort::Value, long, float)+96) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #05 pc 0000000000123ec0  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Run(Ort::Value, long, float)+48) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #06 pc 00000000001220a4  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsImpl::Generate(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, long, float) const+1128) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #07 pc 000000000000a1a8  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-jni.so (Java_com_k2fsa_sherpa_onnx_OfflineTts_generateImpl+316) (BuildId: 7c35f6abaaa0600b2d69ed7bf0aa62b69c017daa)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #14 pc 0000000000002278  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.OfflineTts.generate+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #19 pc 00000000000012a0  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.MainActivity.onClickGenerate+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #24 pc 0000000000001544  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.MainActivity.onCreate$lambda$0+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #29 pc 0000000000001200  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.MainActivity.$r8$lambda$OIkLpaHjEAmudVQGZZp-NNJ9rrA+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #34 pc 00000000000011ac  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.MainActivity$$ExternalSyntheticLambda0.onClick+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #49 pc 00000000003074bc  [anon:dalvik-classes.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk] (com.google.android.material.button.MaterialButton.performClick+0)
---------------------------- PROCESS ENDED (18430) for package com.k2fsa.sherpa.onnx ----------------------------
2023-11-10 17:39:11.408  1190-1226  WindowManager           pid-1190                             E  win=Window{a478533 u0 com.k2fsa.sherpa.onnx/com.k2fsa.sherpa.onnx.MainActivity EXITING} destroySurfaces: appStopped=false cleanupOnResume=false win.mWindowRemovalAllowed=true win.mRemoveOnExit=true win.mViewVisibility=0 caller=com.android.server.wm.ActivityRecord.destroySurfaces:6536 com.android.server.wm.ActivityRecord.destroySurfaces:6517 com.android.server.wm.WindowState.onExitAnimationDone:5966 com.android.server.wm.ActivityRecord$$ExternalSyntheticLambda10.accept:2 java.util.ArrayList.forEach:1528 com.android.server.wm.ActivityRecord.onAnimationFinished:8605 com.android.server.wm.ActivityRecord.postApplyAnimation:6250 

These are the model.onnx, lexicon.txt and tokens.txt generated from your notebook: https://drive.google.com/file/d/1ndZ5MSyS8482Eht6IP1jqxsMwIvB9Ln6/view?usp=sharing

When running your notebook, this was the only error I encountered, but I'm not sure it matters:

%%shell

pip install -q TTS:

 Preparing metadata (setup.py) ... done
......
  Building wheel for encodec (setup.py) ... done
  Building wheel for umap-learn (setup.py) ... done
  Building wheel for bnnumerizer (setup.py) ... done
  Building wheel for bnunicodenormalizer (setup.py) ... done
  Building wheel for docopt (setup.py) ... done
  Building wheel for gruut-ipa (setup.py) ... done
  Building wheel for gruut_lang_de (setup.py) ... done
  Building wheel for gruut_lang_en (setup.py) ... done
  Building wheel for gruut_lang_es (setup.py) ... done
  Building wheel for gruut_lang_fr (setup.py) ... done
  Building wheel for pynndescent (setup.py) ... done
  Building wheel for gruut (setup.py) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
plotnine 0.12.4 requires numpy>=1.23.0, but you have numpy 1.22.0 which is incompatible.
tensorflow 2.14.0 requires numpy>=1.23.5, but you have numpy 1.22.0 which is incompatible.

This Android crash also occurs when I export and use my fine-tuned Coqui TTS model.

@csukuangfj
Author

The app loads the model, lexicon and tokens all right but when you tap "Generate" the app crashes with:

@anita-smith1

Could you show more logs, something like below:

13:48.972 19945-19945 sherpa-onnx             com.k2fsa.sherpa.onnx                I  Start to initialize TTS
2023-11-11 11:13:49.063 19945-19945 sherpa-onnx             com.k2fsa.sherpa.onnx                W  config:
                                                                                                    OfflineTtsConfig(model=OfflineTtsModelConfig(vits=OfflineTtsVitsModelConfig(model="vits-coqui-en-vctk/model.onnx", lexicon="vits-coqui-en-vctk/lexicon.txt", tokens="vits-coqui-en-vctk/tokens.txt", noise_scale=0.667, noise_scale_w=0.8, length_scale=1), num_threads=2, debug=True, provider="cpu"), rule_fsts="")
2023-11-11 11:14:07.089 19945-19945 sherpa-onnx             com.k2fsa.sherpa.onnx                W  ---vits model---
                                                                                                    punctuation=; : , . ! ? ¡ ¿ — … " « » “ ”  
                                                                                                    add_blank=1
                                                                                                    sample_rate=22050
                                                                                                    language=English
                                                                                                    n_speakers=109
                                                                                                    comment=coqui
                                                                                                    model_type=vits

I need to see the model meta data output.

@csukuangfj
Author

@csukuangfj Superr!

For converting piper models to sherpa-onnx using the provided colab notebook, i can confirm that it works on SherpaOnnxTTS

but for converting coqui tts vits model to sherpa onnx, using same android code, the android app crashes on all test devices

@nanaghartey

Could you post some logcat output for the crash?

@csukuangfj
Author

@anita-smith1

From your error log,


2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #01 pc 00000000003a9404  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libonnxruntime.so (BuildId: 5bc662f139575789b099c8c4823a8ed1565a0d4c)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #02 pc 000000000009f914  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (Ort::detail::SessionImpl<OrtSession>::Run(Ort::RunOptions const&, char const* const*, Ort::Value const*, unsigned long, char const* const*, unsigned long)+204) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #03 pc 0000000000124ba4  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Impl::RunVits(Ort::Value, long, float)+1128) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #04 pc 0000000000123f9c  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Impl::Run(Ort::Value, long, float)+96) (BuildId: 

It crashes in the function

sherpa_onnx::OfflineTtsVitsModel::Impl::RunVits

which does not look right. I think there is something wrong in the comment field of your model meta data.
It should be coqui or piper for models from coqui and piper.
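
As a rough aid for checking the metadata field mentioned above, the following Python sketch validates the `comment` value against the two values named in this thread. The helper names `check_vits_metadata` and `read_onnx_metadata` are made up for illustration and are not part of sherpa-onnx; reading the metadata from a file assumes the `onnxruntime` Python package is installed.

```python
# Values the maintainer names above for the "comment" metadata field.
EXPECTED_COMMENTS = {"coqui", "piper"}


def check_vits_metadata(meta):
    """Return a list of human-readable problems found in a metadata dict."""
    problems = []
    comment = meta.get("comment", "")
    if comment not in EXPECTED_COMMENTS:
        problems.append(
            f"comment={comment!r}, expected one of {sorted(EXPECTED_COMMENTS)}"
        )
    return problems


def read_onnx_metadata(path):
    """Read the custom metadata of an ONNX file (assumes onnxruntime is installed)."""
    import onnxruntime as ort  # imported lazily so the checker works without it

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    return dict(session.get_modelmeta().custom_metadata_map)


if __name__ == "__main__":
    # Example with a hand-written dict; a real check would call
    # read_onnx_metadata("./model.onnx") instead.
    print(check_vits_metadata({"comment": "coqui", "n_speakers": "0"}))
```

An empty list means the `comment` field looks as expected; it says nothing about other causes of a crash.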

@anita-smith1

@csukuangfj

This is my log. The comment field says coqui, but when I compare it to yours I can see that n_speakers is 0 for mine:

2023-11-11 04:17:49.611 14902-14902 sherpa-onnx             com.k2fsa.sherpa.onnx                I  Start to initialize TTS
2023-11-11 04:17:49.701 14902-14902 sherpa-onnx             com.k2fsa.sherpa.onnx                W  config:
                                                                                                    OfflineTtsConfig(model=OfflineTtsModelConfig(vits=OfflineTtsVitsModelConfig(model="original_colab_model/model.onnx", lexicon="original_colab_model/lexicon.txt", tokens="original_colab_model/tokens.txt", noise_scale=0.667, noise_scale_w=0.8, length_scale=1), num_threads=2, debug=True, provider="cpu"), rule_fsts="")
2023-11-11 04:17:51.634 14902-14902 libc                    com.k2fsa.sherpa.onnx                W  Access denied finding property "ro.mediatek.platform"
2023-11-11 04:18:00.766 14902-14902 sherpa-onnx             com.k2fsa.sherpa.onnx                W  ---vits model---
                                                                                                    punctuation=; : , . ! ? ¡ ¿ — … " « » “ ”  
                                                                                                    add_blank=1
                                                                                                    sample_rate=22050
                                                                                                    language=English
                                                                                                    n_speakers=0
                                                                                                    comment=coqui
                                                                                                    model_type=vits
2023-11-11 04:18:06.575 14902-14902 sherpa-onnx             com.k2fsa.sherpa.onnx                I  Finish initializing TTS
2023-11-11 04:18:06.657 14902-14902 MSHandlerLifeCycle      com.k2fsa.sherpa.onnx                I  check: return. pkg=com.k2fsa.sherpa.onnx parent=null callers=com.android.internal.policy.DecorView.setVisibility:4411 android.app.ActivityThread.handleResumeActivity:5476 android.app.servertransaction.ResumeActivityItem.execute:54 android.app.servertransaction.ActivityTransactionItem.execute:45 android.app.servertransaction.TransactionExecutor.executeLifecycleState:176 
2023-11-11 04:18:06.657 14902-14902 MSHandlerLifeCycle      com.k2fsa.sherpa.onnx                I  removeMultiSplitHandler: no exist. decor=DecorView@c4a4467[]
2023-11-11 04:18:06.684 14902-14945 NativeCust...ncyManager com.k2fsa.sherpa.onnx                D  [NativeCFMS] BpCustomFrequencyManager::BpCustomFrequencyManager()
2023-11-11 04:18:06.716 14902-14902 InsetsController        com.k2fsa.sherpa.onnx                D  onStateChanged: InsetsState: {mDisplayFrame=Rect(0, 0 - 1920, 1200), 

@csukuangfj
Author

@anita-smith1

Are you using the latest master of sherpa-onnx or the version >= v1.8.9?

@anita-smith1

anita-smith1 commented Nov 11, 2023

@anita-smith1

Are you using the latest master of sherpa-onnx or the version >= v1.8.9?

I am using version 1.8.7, the same version used in the APKs you originally shared.

@anita-smith1

@anita-smith1

Are you using the latest master of sherpa-onnx or the version >= v1.8.9?

I just tried v1.8.9 and it worked :)
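
To make the version requirement concrete, here is a small Python sketch that compares a sherpa-onnx version string against v1.8.9. The threshold comes from the maintainer's question above (1.8.7 crashed, 1.8.9 worked); the function names are illustrative only and are not part of sherpa-onnx.

```python
# Minimum version suggested in this thread for Coqui VITS models.
MIN_COQUI_VERSION = (1, 8, 9)


def parse_version(text):
    """Turn 'v1.8.9' or '1.8.9' into the tuple (1, 8, 9)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def supports_coqui_vits(version):
    """Return True if the given version is at least v1.8.9."""
    # Tuples compare element by element, so (1, 10, 0) > (1, 8, 9).
    return parse_version(version) >= MIN_COQUI_VERSION


if __name__ == "__main__":
    for candidate in ("1.8.7", "v1.8.9"):
        print(candidate, supports_coqui_vits(candidate))
```

Comparing integer tuples rather than raw strings avoids the pitfall where "1.10.0" sorts before "1.8.9" alphabetically.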

@csukuangfj
Author

Glad to hear that it works for you.

By the way, we have pre-built Android APKs for the VITS English models from Coqui.

Screenshot 2023-11-11 at 12 54 36

https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

@anita-smith1

Thank you very much for your amazing support. It's incredible!

Now android demo works with my fine-tuned coqui models too. By the way, in case I want to use Java instead of kotlin for android, would I need to build from source? There seems to be only kotlin support for android at the moment.

Also are there plans to release same on-device tts for iOS too?

@csukuangfj
Author

By the way, in case I want to use Java instead of kotlin for android, would I need to build from source?

Sorry, we only have a Kotlin demo for Android, but the JNI interface can also be used from Java.

You can reuse the .so files for Java.

You can build sherpa-onnx from source for Android
by following https://k2-fsa.github.io/sherpa/onnx/android/index.html
if you want to use the latest master of sherpa-onnx.


Also are there plans to release same on-device tts for iOS too?

Yes, it is in the plan. We already support speech-to-text on iOS, and adding text-to-speech to iOS is very easy
with our current code in sherpa-onnx. We will do that in the coming week.

@anita-smith1

@csukuangfj That's great to hear!

I have another noob question:

I have a fine-tuned Coqui TTS VITS model that contains non-English words. I use a custom CMU.in.IPA.txt and a custom all-english-words.txt file for the non-English words (with a few English words). When I synthesise using the cell that contains this code:

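# ...... (rest of the notebook cell omitted in the original post;
# OnnxModel and vits come from the omitted part)
import soundfile
import torch


def main():
    model = OnnxModel("./model.onnx")
    text = "xse wo atua de a fa"
    # Convert the text to token IDs with the tokenizer of the Coqui model.
    x = vits.tokenizer.text_to_ids(text, vits.tokenizer)
    x = torch.tensor(x, dtype=torch.int64)
    # Run the exported ONNX model and save the generated waveform.
    y = model(x)
    print(y.shape)
    soundfile.write("test.wav", y.numpy(), model.sample_rate)


main()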

Everything works. The pronunciation is good. However when I use:

sherpa-onnx-offline-tts \
  --vits-model=./model.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --output-filename=./test.wav \
  "xse wo atua de a fa"

I get unexpected results: the pronunciations are wrong, even for the few English words.

This is a sample of my ipa.txt file, which contains both non-English words and a few English words:

a,              ʌ
atua,           ejtujʌ
call,           kɔˈl
de,             ðʌ
din,            diˈn
edin,           ejdiˈn
fa,             fʌ
frq,            fɹɛ
line,           lajˈn
mma,            mɑˈ
mobile,         mowˈbʌl
na,             nɑˈ
naa,            nɑˈɑˈ
ndrope,         ɪŋdɹɑˌpi
ne,             nɪ
no,             nu
nreflecte,      ɪŋɹʌflɛˈktɪ
nxma,           nɔmbɚ

all-english-words.txt contains all the words. I followed the same format used in the original English list and I used a phoneme to IPA converter.

What do I need to do to make this work? Thank you

@csukuangfj
Author

I used a phoneme to IPA converter.

Please add your words to all-english-words.txt and let the code generate the pronunciations for you.

The pronunciations in CMU.in.IPA.txt are discarded and are never used. Only the first column, i.e., the words,
is used in CMU.in.IPA.txt.

Please don't use an IPA converter to generate pronunciations for your new words.

The code uses get_phones in one cell from the colab to generate pronunciations for a given word.
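
The workflow described above (supply only the words and let code produce each pronunciation) could be sketched roughly as follows. This is not the notebook's actual code: `build_lexicon_lines` is a hypothetical helper, and its `get_phones` parameter stands in for the notebook's function of that name, assumed here to return a list of phoneme tokens for one word. The output format of one word followed by space-separated tokens per line is also an assumption.

```python
def build_lexicon_lines(words, get_phones):
    """Build lexicon lines of the form 'word tok1 tok2 ...'.

    words: iterable of words, e.g. the lines of all-english-words.txt.
    get_phones: callable mapping one word to a list of phoneme tokens
                (assumed signature, standing in for the notebook's function).
    """
    seen = set()
    lines = []
    for raw in words:
        word = raw.strip().lower()  # lowercasing is an assumption of this sketch
        if not word or word in seen:
            continue  # skip blank lines and repeated words
        seen.add(word)
        phones = get_phones(word)
        lines.append(f"{word} {' '.join(phones)}")
    return lines


if __name__ == "__main__":
    # A fake phonemizer that just spells the word out, for demonstration only.
    def fake_get_phones(word):
        return list(word)

    for line in build_lexicon_lines(["atua", "de", "atua", ""], fake_get_phones):
        print(line)
```

The point of the design is that the pronunciations always come from the same phonemizer the model was trained with, rather than from a separate IPA converter.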

@anita-smith1

@csukuangfj you are right! It's better now, though there is still some difference (a slight loss in pronunciation quality compared to the original Coqui model).
Is the order of the words I add important? I added them at the bottom of the all-english-words.txt file. Also, is there anything else I can do to improve the pronunciations on sherpa-onnx?

@rmcpantoja
Contributor

Hi @csukuangfj,
Why not make an Android port of piper_phonemize and use it in next gen TTS instead of a lexicon? These voices could be used in a screen reader in the future, and there will be many words it tries to read that may not be in that lexicon.

@csukuangfj
Author

Why not make an Android port of piper_phonemize and use it in next gen TTS instead of a lexicon?

We try to support as many models as possible, including models not from piper, which may not use piper_phonemize.


there will be many words it tries to read that may not be in that lexicon

We can add OOV (out-of-vocabulary) words to lexicon.txt manually. By the way, lexicon.txt already covers lots of words.

@csukuangfj
Author

Is the order of the words I add important?

The order does not matter as long as you don't add duplicate words.

If you add duplicate words, then the first one has a higher priority and later ones are ignored.
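
The first-entry-wins rule can be illustrated with a small Python sketch. It assumes each lexicon line is a word followed by space-separated tokens; `load_lexicon` is a hypothetical helper written for this example, not the sherpa-onnx implementation.

```python
def load_lexicon(lines):
    """Parse lexicon lines into a dict, keeping the first entry per word."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and words without any tokens
        word, tokens = fields[0], fields[1:]
        # setdefault only stores the value if the word is not present yet,
        # so a later duplicate never overrides an earlier entry.
        lexicon.setdefault(word, tokens)
    return lexicon


if __name__ == "__main__":
    sample = ["de ð ʌ", "fa f ʌ", "de d e"]
    print(load_lexicon(sample)["de"])
```

With this behaviour, appending new words at the bottom of the file is safe as long as they do not already appear earlier with a different pronunciation.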


is there anything else I can do to improve the pronunciations on sherpa-onnx?

I realized that a word that appears in a sentence with other words can have a different pronunciation from when it appears standalone. I am afraid it is hard, if not impossible, to improve the pronunciations with the current approach.

@beqabeqa473

beqabeqa473 commented Nov 14, 2023 via email

@csukuangfj
Author

Maybe a better approach is to change the modeling unit from phones to other units that can be entirely derived from words.

@anita-smith1

anita-smith1 commented Nov 20, 2023

@csukuangfj true, single words seem to have poor pronunciations compared to the same words in phrases. However, the fact that this TTS solution works offline is amazing. By the way, can inference be done on the GPU? And importantly, has the iOS version been released?

@csukuangfj
Author

And importantly, has the iOS version been released

I am writing the code now. Please wait a day or two.

Screenshot 2023-11-23 at 11 04 55

@anita-smith1

@csukuangfj wow can't wait :) your work is amazing

@csukuangfj
Author

@anita-smith1

The iOS demo is ready now.

You can run text-to-speech with Next-gen Kaldi on iOS using
k2-fsa/sherpa-onnx#443

I have recorded a video showing how to use it. Please see
https://www.youtube.com/watch?v=MvePdkuMNJk


single words seem to have poor pronunciations compared to the same words in phrases

Don't worry. I will try to fix it.

@anita-smith1

@csukuangfj incredible. That was quick! The video demo is fantastic. Great job! But it seems I'd have to wait, since my project does not use SwiftUI; it only uses UIKit. I can see you have this to help us build the Android version from source. Can you add documentation for iOS too, so we can build for iOS from source as well?

@csukuangfj
Author

@anita-smith1

@sweetbbak

sweetbbak commented Nov 29, 2023

Is it possible to add my custom model? Here is the link:
/ivona_hq.tar.lzma

If you want to check it out, it can be unpacked with:

tar --lzma --extract --file ivona_hq.tar.lzma 

I've been wanting to use this voice, but the original application is 32-bit and nearly 15 years old at this point. It's being held together by threads.
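
For anyone whose `tar` build lacks `--lzma` support, a rough Python equivalent of the extraction command above is sketched below. It relies on the standard-library `tarfile` module, whose `r:xz` mode should also auto-detect the legacy `.lzma` container; `extract_lzma_tar` is an illustrative name, and the paths in the usage comment are placeholders.

```python
import os
import tarfile


def extract_lzma_tar(archive_path, dest_dir):
    """Extract an LZMA-compressed tar archive and return its member names."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r:xz" opens the archive through the lzma module for reading.
    with tarfile.open(archive_path, mode="r:xz") as archive:
        names = archive.getnames()
        # Only extract archives from sources you trust.
        archive.extractall(dest_dir)
    return names


# Usage (placeholder paths):
#   extract_lzma_tar("ivona_hq.tar.lzma", "./ivona_hq")
```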

@csukuangfj
Author

Is it a VITS model? Could you show the extracted files?

@csukuangfj
Author

@sweetbbak

I just looked at the model and found that it is piper-based, so it is definitely supported by sherpa-onnx.

I have converted the model. You can find it at
https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
Screenshot 2023-11-30 at 11 17 48


I have also added it to
https://huggingface.co/spaces/k2-fsa/text-to-speech

Screenshot 2023-11-30 at 11 18 41

For the following text:

“Today as always, men fall into two 
groups: slaves and free men. Whoever does not 
have two-thirds of his day for himself, is a slave, whatever he 
may be: a statesman, a businessman, an official, or a scholar.”

It generates the following audio:

cee2f734-4dcb-4494-bd29-27e0c7786c3a.mov

@csukuangfj
Author

By the way, you can also find the exported model at
https://huggingface.co/csukuangfj/vits-piper-en_US-sweetbbak-amy/tree/main

The above repo also contains the script for exporting.

@csukuangfj
Author

@sweetbbak

The audio sounds like British English, but the JSON config file says the voice is en-us. Is there something wrong?

@sweetbbak

Thank you so much! That's my mistake, more than likely. I believe I used en_US-amy as a base to fine-tune from because it sounded better, so I was unsure whether it should be marked as en_US or en_GB. It's definitely a British voice.

@csukuangfj
Author

I am changing it to en_GB.

@sweetbbak

I appreciate it.

@csukuangfj
Author

FYI:

lexicon.txt is no longer required. sherpa-onnx now uses piper-phonemize as well.

You can find all models from piper in sherpa-onnx at
https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models

@LuekWasHere

LuekWasHere commented Dec 31, 2023

@beqabeqa473

@csukuangfj thanks for your work.

To evaluate the models, it would be great to implement the Android TTS engine API.

I'm very new to working with Android, let alone with TTS APIs. Would it be possible to point me in the right direction for developing an Android TTS engine API? I have found some examples of offline TTS, but I'm trying to figure out how to convert them into an engine API. I was thinking some of these would be useful (TTS service wrapper, TTS-engine). I think I just need some guidance, as I am lost in a sea of knowledge. Let me know what you think.

@csukuangfj
Author

TTS-engine

Please wait for a moment. I have managed to create a tts engine service that you can use to replace the system tts engine.

I will create a PR soon.

@LuekWasHere

LuekWasHere commented Dec 31, 2023

I will create a PR soon.
@csukuangfj
Woah that is legendary. I'd love to explore what you did to do this, I'm interested in learning. I'll hold tight. Thanks
Sorry for my naivety, but what does PR stand for?

@beqabeqa473

beqabeqa473 commented Dec 31, 2023 via email

@csukuangfj
Author

@LuekWasHere

PR is short for pull request.


I just created one at k2-fsa/sherpa-onnx#508

You can find a YouTube video at
https://www.youtube.com/watch?v=33QYuVzDORA

@csukuangfj
Author

By the way, you can download pre-built text-to-speech engine APKs at
https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html

Screenshot 2024-01-01 at 12 42 52

@Web3Kev

Web3Kev commented Apr 18, 2024

@csukuangfj Awesome repo, Fangjun Kuang! Do you know if anybody is working on a Dart package for Next-gen Kaldi to be used on Android and iOS?

@csukuangfj
Author

@Web3Kev

Yes, please see
k2-fsa/sherpa-onnx#379

@Web3Kev

Web3Kev commented Apr 18, 2024

thanks !

@csukuangfj
Author

@Web3Kev Yes, we have one now. Please see
https://pub.dev/packages/sherpa_onnx

Screenshot 2024-07-10 at 15 35 50

We also have a Flutter TTS demo for Piper. Please see

https://github.com/k2-fsa/sherpa-onnx/tree/master/flutter-examples/tts

@powline99

@csukuangfj @beqabeqa473 or anyone (help, please): I am not advanced enough to get into the coding of this model, etc.

I am just trying to get it to work on Android for use with my accessibility TTS readers. I see that APKs have been created: https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html

When I install one on my Google Pixel, only one voice can be used at a time (there is no input or way to switch to a different voice's APK within the app). Each new voice installs over the same APK from the previous download.

Is there an easier way to do this, such as an F-Droid app where you can just select the voice you want from the ONNX voices?

Thanks,
DD.


8 participants