[project-s] Add API for humming #1008

y-chan · 2024-01-14T09:11:49Z

Description

As the title says.

Other

The documentation and so on was written fairly roughly, so it may need some revision.

Hiroshiba

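I haven't fully traced the processing code yet, but I've left some comments for now!
Apart from the first two, none of them affect the API, so it is perfectly fine to just add a comment and deal with them later.
(That is what the separate branch is for, after all.)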

voicevox_engine/metas/Metas.py

Hiroshiba · 2024-01-14T09:33:55Z

voicevox_engine/model.py

+    """
+
+    key: int | None = Field(title="音階")
+    length: int = Field(title="音符の長さ")


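(A minor point, but) describing this as "frame length" or similar might keep people from mistaking it for seconds.
Well, it is an int, so you can tell... Ah, but in JS both are number, so it might be slightly easier to understand... maybe?

It might be better to just name it frame_length.

For now I'm thinking of handling this by changing the description to "frame length"...!
The Phoneme model also has a length property, so if we do rename it, the property names should probably be unified...

I've been thinking about it: length may be fine on the FrameAudioQuery side, but for Phoneme it is not obvious that the value is frame-level, so I think it should either become FramePhoneme or use frame_length!
Renaming it to FramePhoneme and also using frame_length is probably best.

That's the only remaining point about the API structure!
It also keeps the changes on the editor side small, so I'd like to decide and change at least this part now.

> Renaming it to FramePhoneme and also using frame_length

That sounded good to me, so I'd like to go with it...!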

voicevox_engine/tts_pipeline/tts_engine.py

Hiroshiba · 2024-01-14T09:50:06Z

voicevox_engine/tts_pipeline/tts_engine.py

@@ -374,6 +426,143 @@ def synthesize_wave(
        wave = raw_wave_to_output_wave(query, raw_wave, sr_raw_wave)
        return wave

+    def get_sing_phoneme_and_f0_and_volume(


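get may run counter to the intent here. If anything, maybe... create?

Also, instead of spelling out "f0 and phoneme and ...", we could coin a domain term such as "way of singing".
Something like create_ followed by that term (I haven't thought of the English yet...)
Or coin a term like "frame features" and use create_sing_frame_feature or similar.

(Well, this isn't exposed in the API, so just writing a FIXME comment and postponing it is fine too.)

For now I think I'll replace get with create and leave the rest as a FIXME comment...!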

voicevox_engine/tts_pipeline/tts_engine.py

Co-authored-by: Hiroshiba <[email protected]>

This reverts commit 12b8fc6.

github-actions · 2024-01-14T11:27:28Z

Coverage Result

Open result

Name	Stmts	Miss
run.py	526	336
voicevox_engine/__init__.py	1	0
voicevox_engine/cancellable_engine.py	94	72
voicevox_engine/core_adapter.py	81	34
voicevox_engine/core_initializer.py	59	30
voicevox_engine/core_wrapper.py	257	183
voicevox_engine/dev/core/__init__.py	2	0
voicevox_engine/dev/core/mock.py	36	8
voicevox_engine/dev/tts_engine/__init__.py	2	0
voicevox_engine/dev/tts_engine/mock.py	28	0
voicevox_engine/engine_manifest/EngineManifest.py	35	0
voicevox_engine/engine_manifest/EngineManifestLoader.py	12	0
voicevox_engine/engine_manifest/__init__.py	3	0
voicevox_engine/library_manager.py	92	5
voicevox_engine/metas/Metas.py	36	0
voicevox_engine/metas/MetasStore.py	18	6
voicevox_engine/metas/__init__.py	2	0
voicevox_engine/model.py	180	9
voicevox_engine/morphing.py	71	46
voicevox_engine/part_of_speech_data.py	5	0
voicevox_engine/preset/Preset.py	13	0
voicevox_engine/preset/PresetError.py	2	0
voicevox_engine/preset/PresetManager.py	80	2
voicevox_engine/preset/__init__.py	4	0
voicevox_engine/setting/Setting.py	11	0
voicevox_engine/setting/SettingLoader.py	17	0
voicevox_engine/setting/__init__.py	3	0
voicevox_engine/tts_pipeline/acoustic_feature_extractor.py	34	0
voicevox_engine/tts_pipeline/kana_converter.py	88	1
voicevox_engine/tts_pipeline/mora_list.py	7	0
voicevox_engine/tts_pipeline/text_analyzer.py	146	6
voicevox_engine/tts_pipeline/tts_engine.py	266	93
voicevox_engine/user_dict.py	145	12
voicevox_engine/utility/__init__.py	5	0
voicevox_engine/utility/connect_base64_waves.py	37	0
voicevox_engine/utility/core_version_utility.py	8	1
voicevox_engine/utility/mutex_utility.py	13	0
voicevox_engine/utility/path_utility.py	35	9
voicevox_engine/utility/run_utility.py	10	7
TOTAL	2464	860

Hiroshiba

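Apart from the parts that affect the API, LGTM!

I've listed the points that caught my attention in a comment on the issue. Hopefully we can work through them later!

LGTM, but I have a feeling this will break when a mora without a consonant comes in!
If we write a mock of Core along these lines, we might be able to write tests (which double as checking code) surprisingly quickly.

Oh, and there were lots of comments, which made the code easy to read!
This is really nitpicky, but where a comment is split over two lines, differences in each committer's stylistic taste start to show, so I think it would be better to keep them to one line.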

run.py

Hiroshiba · 2024-01-14T11:52:06Z

run.py

@@ -704,6 +706,77 @@ def _synthesis_morphing(
            background=BackgroundTask(delete_file, f.name),
        )

+    @app.post(
+        "/sing_frame_audio_query",


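A note on the reasoning.
I went back and forth a lot on whether this should be sing or song, but since this is a "query for singing" and not a "query of a song", I think sing fits.

If it were to be a noun, it would be something like song_synthesis rather than song. That is long, so sing seems fine.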

Hiroshiba · 2024-01-14T12:10:25Z

voicevox_engine/tts_pipeline/tts_engine.py

+            vowel_length = note_durations[i]
+            phoneme_durations.append(vowel_length)
+
+    phoneme_durations_array = np.array(phoneme_durations, dtype=np.int64)


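This can wait, but the note_durations argument of this function is an NDArray while phoneme_durations is a list, and then phoneme_durations_array is an NDArray again, which looks rather confusing.

If I remember correctly, the tasks we postponed when adding the editor's dictionary feature still haven't all been cleared...
Let's go in determined to actually do the postponed tasks...!!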

Hiroshiba · 2024-01-14T12:15:22Z

voicevox_engine/tts_pipeline/tts_engine.py

+            # If the next note's consonant length is negative, set it to half of the current note
+            if next_consonant_length < 0:
+                next_consonant_length = consonant_lengths[i + 1] = note_duration // 2


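For this workaround, thinking about it plainly, setting it to 1 frame seems better than half the note length.
If it is only about guaranteeing a minimum of 1 frame, maybe it should be written on the core side?

Thinking about it plainly, 1 does seem right, but setting the consonant length to 1 sometimes makes the articulation sloppy, so I'd like to avoid that if possible.
That said, when the previous note is long, the consonant length would probably get dragged longer too, so I do think this needs a bit more thought...

That said, a negative value is first of all the model's fault, so adjusting it on the core side does seem to be the right approach...

> Thinking about it plainly, 1 does seem right, but setting the consonant length to 1 sometimes makes the articulation sloppy, so I'd like to avoid that if possible.

I understand the feeling, but with this approach a value that was actually inferred as 1 is still left as is, so it feels like the intent and the implementation don't line up.
I haven't tried it so I can't say for sure, but something like "if the consonant length is 1, make it 2" seems good to me.

Aiming to merge for now seems best, so how about turning "add handling to the core that makes the value non-negative" into a FIXME comment and postponing it for the time being?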

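Suggested change

-            # If the next note's consonant length is negative, set it to half of the current note
-            if next_consonant_length < 0:
-                next_consonant_length = consonant_lengths[i + 1] = note_duration // 2
+            # If the next note's consonant length is negative, set it to half of the current note
+            # FIXME: add handling on the core side that makes the value non-negative
+            if next_consonant_length < 0:
+                next_consonant_length = consonant_lengths[i + 1] = note_duration // 2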
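The options weighed in this thread can be compared side by side with a small sketch. The function name and the policy names are hypothetical and not part of the PR; only the "half of the current note" rule is what the diff actually does. A length of 0 is left untouched in every policy, on the assumption that it marks a vowel-only note.

```python
def fix_consonant_length(predicted: int, note_duration: int, policy: str) -> int:
    """Return a usable consonant length (in frames) for the next note.

    `policy` selects one of the options raised in the review (names are hypothetical).
    """
    if policy == "half_note":
        # Workaround in the diff: a negative prediction becomes half the current note.
        return note_duration // 2 if predicted < 0 else predicted
    if policy == "one_frame":
        # Alternative: a negative prediction becomes exactly one frame.
        return 1 if predicted < 0 else predicted
    if policy == "lift_one_to_two":
        # Alternative: a predicted 1 becomes 2 to protect articulation.
        # Assumption: negatives are also raised to 2; the thread only mentions 1 -> 2.
        return 2 if predicted < 0 or predicted == 1 else predicted
    raise ValueError(f"unknown policy: {policy}")


half = fix_consonant_length(-3, 10, "half_note")
one = fix_consonant_length(-3, 10, "one_frame")
lifted = fix_consonant_length(1, 10, "lift_one_to_two")
```

With "half_note" the corrected length grows with the current note, which is the concern raised about long preceding notes; the other two policies are independent of the note length.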

Hiroshiba · 2024-01-14T12:34:21Z

voicevox_engine/tts_pipeline/tts_engine.py

+def calc_phoneme_lengths(
+    consonant_lengths: NDArray[np.int64],
+    note_durations: NDArray[np.int64],
+) -> NDArray[np.int64]:


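Hm, so this returns the lengths of all phonemes, both consonants and vowels, from only the consonant lengths and note lengths, right?
What happens when a note is vowel-only...?

For vowel-only notes, a mask inside the model sets the consonant length to 0, so the processing simply starts the vowel at the head of the note.
It is implicit behavior, though, so it needs to be explained somewhere...
Also, this doesn't consider the case where the consonant length is mistakenly predicted as 0, so I do feel it should be improved, including that case, before merging.

I'll adjust it a little.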
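To make the implicit handling of vowel-only notes concrete, here is a minimal pure-Python sketch of the idea, not the PR's calc_phoneme_lengths. It assumes that consonant_lengths[i] is the consonant leading into note i and is taken from the tail of note i - 1, that the first note has no leading consonant, and that a consonant length of 0 produces no consonant entry. Negative or oversized predictions are deliberately not handled here.

```python
def sketch_phoneme_lengths(
    consonant_lengths: list[int], note_durations: list[int]
) -> list[int]:
    """Split notes into vowel and consonant lengths, all counted in frames."""
    if len(consonant_lengths) != len(note_durations):
        raise ValueError("one consonant length is required per note")
    phoneme_lengths: list[int] = []
    last_index = len(note_durations) - 1
    for i, note_duration in enumerate(note_durations):
        # The last note has no following consonant to make room for.
        next_consonant = 0 if i == last_index else consonant_lengths[i + 1]
        # The vowel fills whatever the next note's consonant does not use.
        phoneme_lengths.append(note_duration - next_consonant)
        # A vowel-only next note (consonant length 0) adds no consonant entry.
        if next_consonant > 0:
            phoneme_lengths.append(next_consonant)
    return phoneme_lengths


# Three notes; only the second one starts with a consonant (3 frames long).
lengths = sketch_phoneme_lengths([0, 3, 0], [10, 20, 15])
```

Under these assumptions the total number of frames is preserved, because each consonant is carved out of the preceding note rather than added on top of it.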

Hiroshiba · 2024-01-14T12:36:05Z

voicevox_engine/tts_pipeline/tts_engine.py

+        # Compute all phoneme lengths from the predicted consonant lengths
+        phoneme_lengths = calc_phoneme_lengths(consonant_lengths, note_lengths_array)
+
+        # Change the time scale (phoneme → frame)
+        frame_phonemes = np.repeat(phonemes_array, phoneme_lengths)


phoneme_lengths is always the number of notes × 2, but phonemes_array has one entry per phoneme, so I think the counts don't match...?
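The count mismatch raised here can be guarded against before expanding to frames. The sketch below is a pure-Python stand-in for the np.repeat call in the diff, with a hypothetical function name and made-up phoneme IDs; it only illustrates the check and is not the PR's code.

```python
def expand_to_frames(phonemes: list[int], phoneme_lengths: list[int]) -> list[int]:
    """Repeat each phoneme ID for its length in frames, checking the counts first."""
    if len(phonemes) != len(phoneme_lengths):
        raise ValueError(
            f"{len(phonemes)} phonemes but {len(phoneme_lengths)} lengths"
        )
    frames: list[int] = []
    for phoneme, length in zip(phonemes, phoneme_lengths):
        # One copy of the phoneme ID per frame it occupies.
        frames.extend([phoneme] * length)
    return frames


# Made-up phoneme IDs with lengths of 2, 1, and 3 frames.
frame_phonemes = expand_to_frames([0, 23, 7], [2, 1, 3])
```

Failing early with both counts in the message makes it easier to see whether the lengths were computed per note or per phoneme.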

Co-authored-by: Hiroshiba <[email protected]>

Hiroshiba · 2024-01-22T14:25:39Z

There are still some buggy spots, but for now I want the API available on the branch so that work can build on it, so I'm going to merge!!

y-chan added 8 commits January 14, 2024 04:01

update metas (add style type)

f1a4c7f

update engine manifest (add frame rate)

babc923

add sing api to core wrapper

1da6ec8

add sing api to core adapter

844ef22

add models for sing api

9132a84

add sing process to tts engine

062e91b

add sing api

09af2c2

fix miss

202bfbe

y-chan requested a review from a team as a code owner January 14, 2024 09:11

y-chan requested review from Hiroshiba and removed request for a team January 14, 2024 09:11

Hiroshiba reviewed Jan 14, 2024


y-chan and others added 10 commits January 14, 2024 19:46

add fixme comment

9002cc6

Co-authored-by: Hiroshiba <[email protected]>

remove sing type

afd5c12

fix typo

475c88e

remove optional

12b8fc6

translate error detail

d1cea98

get -> create

352cf6b

fix docs

b82a9d6

Revert "remove optional"

ff0fad3

This reverts commit 12b8fc6.

fix pytest

efdb7eb

add comment

5f5b820

Hiroshiba reviewed Jan 14, 2024


y-chan and others added 2 commits January 14, 2024 22:01

add fixme comment

92dcd91

Co-authored-by: Hiroshiba <[email protected]>

improve models

4d3d3df

Hiroshiba merged commit 7bc1b21 into VOICEVOX:project-s Jan 22, 2024
1 of 3 checks passed

PickledChair mentioned this pull request Feb 24, 2024

Make a different edition of the same version usable #303

Closed
