Support onnx #1

Yosshi999 · 2021-10-17T12:22:58Z

related: https://github.com/Hiroshiba/voicevox_engine/issues/69

TODO

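The np.finfo value and the positional encoding max_len are hard-coded for now. Synthesis will probably crash if the generated waveform exceeds 50 seconds.
- Find a way to fix this
Update the Cython wrapper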

Description

The models can be converted to ONNX with: python run.py --yukarin_s_model_dir "model/yukarin_s" --yukarin_sa_model_dir "model/yukarin_sa" --yukarin_sosoa_model_dir "model/yukarin_sosoa" --hifigan_model_dir "model/hifigan" --speaker_ids 5 --method=convert. The ONNX files are saved in yukarin_s, yukarin_sa, and yukarin_sosoa under the model folder.
yukarin_sosoa also receives decode.onnx, which is combined with hifi_gan.
To run with ONNX, pass --method=onnx: python run.py --yukarin_s_model_dir "model/yukarin_s" --yukarin_sa_model_dir "model/yukarin_sa" --yukarin_sosoa_model_dir "model/yukarin_sosoa" --hifigan_model_dir "model/hifigan" --speaker_ids 5 --method=onnx
In this mode torch is probably not imported.
In testing, the relative error of the waveform came out at about 1e-3. It is unclear whether it can be made smaller. For some reason the error appears in decode.
The output still needs to be checked by ear with the real weights.
Test command: python test.py --yukarin_s_model_dir "model/yukarin_s" --yukarin_sa_model_dir "model/yukarin_sa" --yukarin_sosoa_model_dir "model/yukarin_sosoa" --hifigan_model_dir "model/hifigan" --speaker_ids 6 --texts "おはようございます、こんにちは、こんばんは"
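As a rough illustration of the comparison described above, here is a minimal sketch of one way to measure the relative error between a PyTorch waveform and an ONNX waveform. How test.py actually computes its error is not shown in this thread, so the norm-based definition, the function name relative_l2_error, and the synthetic waveforms are all assumptions made for illustration.

```python
import math


def relative_l2_error(reference, candidate):
    """Return ||reference - candidate||_2 / ||reference||_2 for two sequences.

    Hypothetical helper: the PR's test.py may use a different definition.
    """
    if len(reference) != len(candidate):
        raise ValueError("waveforms must have the same length")
    diff_norm = math.sqrt(sum((r - c) ** 2 for r, c in zip(reference, candidate)))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference waveform is all zeros")
    return diff_norm / ref_norm


# Synthetic stand-ins for the two outputs (not real model output):
# a sine wave, and the same wave scaled by a factor of (1 + 1e-3).
torch_wave = [math.sin(2 * math.pi * 220 * t / 24000) for t in range(24000)]
onnx_wave = [x * (1 + 1e-3) for x in torch_wave]

print(relative_l2_error(torch_wave, onnx_wave))
```

A single norm-based figure like this hides where the difference sits in time, so checking per-stage outputs separately may help narrow down why decode is the stage that diverges.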

Hiroshiba · 2021-10-17T14:29:47Z

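Looks great!! I also really appreciate the consideration of sending the pull request to this repository.

I think the Cython behavior can only be verified once the C++ implementation exists.
So for this pull request, it might make sense to get the Python version working first and handle the C++ version in a separate PR.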

Yosshi999 · 2021-10-17T16:09:38Z

The positional encoding is now generated dynamically.
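For readers unfamiliar with the idea, the sketch below builds a sinusoidal positional encoding table from the actual input length on each call, rather than slicing a buffer precomputed up to a fixed max_len. It uses the standard Transformer formulation in plain Python. The encoding variant, the function name, and the dimensions used in this PR are not shown in the thread, so treat all of them as assumptions.

```python
import math


def sinusoidal_positional_encoding(length, d_model):
    """Build a length x d_model table of sinusoidal position features.

    Even columns hold sin(pos / 10000^(i / d_model)) and odd columns hold
    the matching cos, where i is the even column index. Hypothetical helper,
    not the code from this PR.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(length):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


# The table is sized from the input, so no fixed upper bound is baked in.
short = sinusoidal_positional_encoding(10, 8)
longer = sinusoidal_positional_encoding(6000, 8)
print(len(short), len(longer), len(longer[0]))
```

Computing the table per call costs some extra work on every inference, but it removes the length limit that a fixed max_len imposes.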

"I think the Cython behavior can only be verified once the C++ implementation exists."

👍 I'll leave the Cython part out of this PR.

Yosshi999 · 2021-10-30T18:47:30Z

https://github.com/Hiroshiba/voicevox_core/pull/34 The README in that PR does not say much about it, but the ONNX conversion procedure is better documented here, so please merge this.

Hiroshiba

LGTM!!!

Yosshi999 added 9 commits October 17, 2021 17:31

update gitignore

9a86593

fix batchsize to 1

a9f7b9e

yukarin_s: pure torch module

e876c4e

yukarin_sa: pure torch module

a24fee7

yukarin_sosoa: pure torch module

0857148

decoder: pure torch module

d47436f

onnx-convert, onnx-run

b86cac1

test script: we find decoded result is not allclose

2e29ee4

set random seed, now rel-err is 1e-3. small enough?

f006800

dynamic positional encoding

092ccd9

show gpu usage

615c1a7

Yosshi999 mentioned this pull request Oct 21, 2021

Support onnx VOICEVOX/voicevox_core#21

Merged

add explanation about onnx

5ca8f25

Hiroshiba approved these changes Dec 11, 2021

Hiroshiba merged commit 539ca8f into Hiroshiba:main Dec 11, 2021

Yosshi999 mentioned this pull request Jan 7, 2022

In synthesis with the preview ONNX core, the audio sometimes sounds wrong when the leading/trailing silence is set to about 0.3 to 0.4 seconds VOICEVOX/voicevox_core#62

Closed
