Add: create parameters for adjusting silence duration in query #1308

Merged Jun 7, 2024 (35 commits)
Changes from 23 commits
Commits:
3643fe5
Add: flag for absolute value vs. scale
X-20A May 23, 2024
28cafba
Add: flag for absolute value vs. scale [is_pauseLengthUseScale]
X-20A May 23, 2024
f3c1e84
Add: flag for absolute value vs. scale [is_pauseLengthUseScale]
X-20A May 23, 2024
ed2d5e4
Add: flag for absolute value vs. scale [is_pauseLengthUseScale]
X-20A May 23, 2024
6e9843d
Add: flag for absolute value vs. scale [is_pauseLengthUseScale]
X-20A May 23, 2024
0c9654d
Add: flag for absolute value vs. scale [is_pauseLengthUseScale]
X-20A May 23, 2024
0d14296
Add: flag for absolute value vs. scale [is_pauseLengthUseScale]
X-20A May 23, 2024
9a4fba3
Add: flag for absolute value vs. scale [is_pauseLengthUseScale]
X-20A May 23, 2024
7a4d629
Add: flag for absolute value vs. scale [is_pauseLengthUseScale]
X-20A May 23, 2024
2a42719
Update tts_engine.py
X-20A May 23, 2024
de25a16
Update tts_engine.py
X-20A May 24, 2024
3670286
Update tts_engine.py
X-20A May 24, 2024
83e7b87
Update tts_engine.py
X-20A May 24, 2024
432fea2
Update tts_engine.py
X-20A May 24, 2024
a5c329e
Update tts_engine.py
X-20A May 24, 2024
3edd5e4
Update tts_engine.py
X-20A May 24, 2024
3454dea
Update tts_engine.py
X-20A May 24, 2024
6f0ff91
Remove: isPauseLengthUseScale, isPauseLengthFixed
X-20A May 25, 2024
b825e36
Remove: isPauseLengthUseScale, isPauseLengthFixed
X-20A May 25, 2024
c1f4751
Remove: isPauseLengthUseScale, isPauseLengthFixed
X-20A May 25, 2024
5ba4beb
Remove: isPauseLengthUseScale, isPauseLengthFixed
X-20A May 25, 2024
d4140c2
Allow None for pauseLength
X-20A May 28, 2024
9eb99fb
Allow None for pauseLength
X-20A May 28, 2024
dca9d42
Cleanup
X-20A Jun 6, 2024
b7f91db
Cleanup
X-20A Jun 6, 2024
3f58be8
Cleanup
X-20A Jun 6, 2024
806954e
Cleanup
X-20A Jun 6, 2024
484a34f
Cleanup
X-20A Jun 6, 2024
9b613e0
Cleanup
X-20A Jun 6, 2024
2410d3e
Apply suggestions from code review
Hiroshiba Jun 7, 2024
50ff096
Tweak the tests slightly
Hiroshiba Jun 7, 2024
d7192fe
Add minimum
Hiroshiba Jun 7, 2024
21f7f9d
Merge remote-tracking branch 'upstream/master' into pr/X-20A/1308-1
Hiroshiba Jun 7, 2024
777674b
Forgot to update
Hiroshiba Jun 7, 2024
29f3ff5
"Silence duration for punctuation, etc."
Hiroshiba Jun 7, 2024
2 changes: 1 addition & 1 deletion build_util/check_release_build.py
Expand Up @@ -20,7 +20,7 @@ def test_release_build(dist_dir: Path, skip_run_process: bool) -> None:
run_file = dist_dir / "run"
if not run_file.exists():
run_file = dist_dir / "run.exe"

print(f"run_file : {run_file}")
Hiroshiba marked this conversation as resolved.
# Launch
process = None
if not skip_run_process:
Expand Down
2 changes: 2 additions & 0 deletions presets.yaml
Expand Up @@ -8,3 +8,5 @@
volumeScale: 1
prePhonemeLength: 0.1
postPhonemeLength: 0.1
pauseLength: null
pauseLengthScale: 1


2 changes: 2 additions & 0 deletions test/e2e/single_api/morphing/test_synthesis_morphing.py
Expand Up @@ -27,6 +27,8 @@ def test_post_synthesis_morphing_200(client: TestClient) -> None:
"volumeScale": 1.0,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
"outputSamplingRate": 24000,
"outputStereo": False,
"kana": "テ'_スト",
Expand Down


2 changes: 2 additions & 0 deletions test/e2e/single_api/preset/test_add_preset.py
Expand Up @@ -20,6 +20,8 @@ def test_post_add_preset_200(
"volumeScale": 1,
"prePhonemeLength": 10,
"postPhonemeLength": 10,
"pauseLength": None,
"pauseLengthScale": 1,
}
response = client.post("/add_preset", params={}, json=preset)
assert response.status_code == 200
Expand Down
2 changes: 2 additions & 0 deletions test/e2e/single_api/preset/test_presets.py
Expand Up @@ -8,5 +8,7 @@

def test_get_presets_200(client: TestClient, snapshot_json: SnapshotAssertion) -> None:
response = client.get("/presets")
print("snapshot", snapshot_json)
print("response", response.json())
Hiroshiba marked this conversation as resolved.
assert response.status_code == 200
assert snapshot_json == response.json()
4 changes: 4 additions & 0 deletions test/e2e/single_api/preset/test_update_preset.py
Expand Up @@ -20,6 +20,8 @@ def test_post_update_preset_200(
"volumeScale": 1,
"prePhonemeLength": 10,
"postPhonemeLength": 10,
"pauseLength": None,
"pauseLengthScale": 1,
}
response = client.post("/update_preset", params={}, json=preset)
assert response.status_code == 200
Expand All @@ -40,6 +42,8 @@ def test_post_update_preset_422(
"volumeScale": 404,
"prePhonemeLength": 404,
"postPhonemeLength": 404,
"pauseLength": 404,
"pauseLengthScale": 404,
}
response = client.post("/update_preset", params={}, json=preset)
assert response.status_code == 422
Expand Down


4 changes: 4 additions & 0 deletions test/e2e/single_api/tts_pipeline/test_multi_synthesis.py
Expand Up @@ -28,6 +28,8 @@ def test_post_multi_synthesis_200(client: TestClient) -> None:
"volumeScale": 1.0,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
"outputSamplingRate": 24000,
"outputStereo": False,
"kana": "テ'_スト",
Expand All @@ -52,6 +54,8 @@ def test_post_multi_synthesis_200(client: TestClient) -> None:
"volumeScale": 1.0,
"prePhonemeLength": 0.2,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
"outputSamplingRate": 24000,
"outputStereo": False,
"kana": "テ'_ストト",
Expand Down
2 changes: 2 additions & 0 deletions test/e2e/single_api/tts_pipeline/test_synthesis.py
Expand Up @@ -29,6 +29,8 @@ def test_post_synthesis_200(client: TestClient, snapshot: SnapshotAssertion) ->
"volumeScale": 1.0,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
"outputSamplingRate": 24000,
"outputStereo": False,
"kana": "テ'_スト",
Expand Down
4 changes: 4 additions & 0 deletions test/preset/presets-test-1.yaml
Expand Up @@ -8,6 +8,8 @@
volumeScale: 1
prePhonemeLength: 0.1
postPhonemeLength: 0.1
pauseLength: null
pauseLengthScale: 1.0

- id: 2
name: test2
Expand All @@ -19,3 +21,5 @@
volumeScale: 0.7
prePhonemeLength: 0.5
postPhonemeLength: 0.5
pauseLength: null
pauseLengthScale: 1.0
4 changes: 4 additions & 0 deletions test/preset/presets-test-2.yaml
Expand Up @@ -8,6 +8,8 @@
volumeScale: 1
prePhonemeLength: 0.1
postPhonemeLength: 0.1
pauseLength: null
pauseLengthScale: 1.0

- id: 2
name: test2
Expand All @@ -19,3 +21,5 @@
volumeScale: 0.7
prePhonemeLength: 0.5
postPhonemeLength: 0.5
pauseLength: null
pauseLengthScale: 1.0
4 changes: 4 additions & 0 deletions test/preset/presets-test-3.yaml
Expand Up @@ -8,6 +8,8 @@
volumeScale: 1
prePhonemeLength: 0.1
postPhonemeLength: 0.1
pauseLength: null
pauseLengthScale: 1.0

- id: 1
name: test2
Expand All @@ -19,3 +21,5 @@
volumeScale: 0.7
prePhonemeLength: 0.5
postPhonemeLength: 0.5
pauseLength: null
pauseLengthScale: 1.0
18 changes: 18 additions & 0 deletions test/preset/test_preset.py
Expand Up @@ -78,6 +78,8 @@ def test_add_preset(self) -> None:
"volumeScale": 1,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
}
)
id = preset_manager.add_preset(preset)
Expand Down Expand Up @@ -106,6 +108,8 @@ def test_add_preset_load_failure(self) -> None:
"volumeScale": 0,
"prePhonemeLength": 0,
"postPhonemeLength": 0,
"pauseLength": 0,
"pauseLengthScale": 0,
}
)
)
Expand All @@ -126,6 +130,8 @@ def test_add_preset_conflict_id(self) -> None:
"volumeScale": 1,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
}
)
id = preset_manager.add_preset(preset)
Expand All @@ -152,6 +158,8 @@ def test_add_preset_conflict_id2(self) -> None:
"volumeScale": 1,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
}
)
id = preset_manager.add_preset(preset)
Expand All @@ -178,6 +186,8 @@ def test_add_preset_write_failure(self) -> None:
"volumeScale": 1,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
}
)
preset_manager.load_presets()
Expand Down Expand Up @@ -206,6 +216,8 @@ def test_update_preset(self) -> None:
"volumeScale": 1,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
}
)
id = preset_manager.update_preset(preset)
Expand Down Expand Up @@ -234,6 +246,8 @@ def test_update_preset_load_failure(self) -> None:
"volumeScale": 0,
"prePhonemeLength": 0,
"postPhonemeLength": 0,
"pauseLength": 0,
"pauseLengthScale": 0,
}
)
)
Expand All @@ -254,6 +268,8 @@ def test_update_preset_not_found(self) -> None:
"volumeScale": 1,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
}
)
with self.assertRaises(
Expand All @@ -279,6 +295,8 @@ def test_update_preset_write_failure(self) -> None:
"volumeScale": 1,
"prePhonemeLength": 0.1,
"postPhonemeLength": 0.1,
"pauseLength": None,
"pauseLengthScale": 1.0,
}
)
preset_manager.load_presets()
Expand Down
2 changes: 2 additions & 0 deletions test/test_mock_tts_engine.py
Expand Up @@ -66,6 +66,8 @@ def test_synthesize_wave(self) -> None:
volumeScale=1,
prePhonemeLength=0.1,
postPhonemeLength=0.1,
pauseLength=None,
pauseLengthScale=1.0,
outputSamplingRate=24000,
outputStereo=False,
kana=create_kana(self.accent_phrases_hello_hiho),
Expand Down
3 changes: 3 additions & 0 deletions test/tts_pipeline/test_tts_engine.py
Expand Up @@ -184,6 +184,8 @@ def _gen_hello_hiho_query() -> AudioQuery:
volumeScale=1.3,
prePhonemeLength=0.1,
postPhonemeLength=0.2,
pauseLength=None,
pauseLengthScale=1.0,
Hiroshiba marked this conversation as resolved.
outputSamplingRate=12000,
outputStereo=True,
kana=_gen_hello_hiho_kana(),
Expand Down Expand Up @@ -376,6 +378,7 @@ def test_mocked_synthesize_wave_output(snapshot_json: SnapshotAssertion) -> None
# Inputs
tts_engine = TTSEngine(MockCoreWrapper())
hello_hiho = _gen_hello_hiho_query()
print(hello_hiho)
Hiroshiba marked this conversation as resolved.
# Outputs
result = tts_engine.synthesize_wave(hello_hiho, StyleId(1))
# Tests
Expand Down
7 changes: 6 additions & 1 deletion test/tts_pipeline/test_wave_synthesizer.py
Expand Up @@ -26,6 +26,8 @@ def _gen_query(
intonationScale: float = 1.0,
prePhonemeLength: float = 0.0,
postPhonemeLength: float = 0.0,
pauseLength: float | None = -1,
Hiroshiba marked this conversation as resolved.
pauseLengthScale: float = 1.0,
volumeScale: float = 1.0,
outputSamplingRate: int = 24000,
outputStereo: bool = False,
Expand All @@ -39,6 +41,8 @@ def _gen_query(
intonationScale=intonationScale,
prePhonemeLength=prePhonemeLength,
postPhonemeLength=postPhonemeLength,
pauseLength=pauseLength,
pauseLengthScale=pauseLengthScale,
volumeScale=volumeScale,
outputSamplingRate=outputSamplingRate,
outputStereo=outputStereo,
Expand Down Expand Up @@ -269,6 +273,8 @@ def test_query_to_decoder_feature() -> None:
intonationScale=0.5,
prePhonemeLength=2 * 0.01067,
postPhonemeLength=6 * 0.01067,
pauseLength=None,
pauseLengthScale=1.0,
Hiroshiba marked this conversation as resolved.
)

# Expects
Expand All @@ -295,7 +301,6 @@ def test_query_to_decoder_feature() -> None:

# Outputs
phoneme, f0 = query_to_decoder_feature(query)

assert np.array_equal(phoneme, true_phoneme)
assert np.array_equal(f0, true_f0)

Expand Down
5 changes: 4 additions & 1 deletion voicevox_engine/app/routers/tts_pipeline.py
Expand Up @@ -64,6 +64,8 @@ def audio_query(
volumeScale=1,
prePhonemeLength=0.1,
postPhonemeLength=0.1,
pauseLength=None,
pauseLengthScale=1,
outputSamplingRate=core.default_sampling_rate,
outputStereo=False,
kana=create_kana(accent_phrases),
Expand Down Expand Up @@ -108,6 +110,8 @@ def audio_query_from_preset(
volumeScale=selected_preset.volumeScale,
prePhonemeLength=selected_preset.prePhonemeLength,
postPhonemeLength=selected_preset.postPhonemeLength,
pauseLength=selected_preset.pauseLength,
pauseLengthScale=selected_preset.pauseLengthScale,
outputSamplingRate=core.default_sampling_rate,
outputStereo=False,
kana=create_kana(accent_phrases),
Expand Down Expand Up @@ -217,7 +221,6 @@ def synthesis(
wave = engine.synthesize_wave(
query, style_id, enable_interrogative_upspeak=enable_interrogative_upspeak
)

Hiroshiba marked this conversation as resolved.
with NamedTemporaryFile(delete=False) as f:
soundfile.write(
file=f, data=wave, samplerate=query.outputSamplingRate, format="WAV"
Expand Down
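A rough sketch of what the two `tts_pipeline.py` hunks above do with the new fields: `audio_query` hard-codes `pauseLength=None` and `pauseLengthScale=1`, while `audio_query_from_preset` copies both values from the selected preset. `PresetSketch` and `pause_fields_for_query` are hypothetical names used only for illustration and are not part of the engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresetSketch:
    """Hypothetical stand-in holding only the pause-related preset fields."""

    pauseLength: Optional[float]
    pauseLengthScale: float


def pause_fields_for_query(preset: Optional[PresetSketch]) -> dict:
    """Return the pause-related keyword arguments for a new query.

    Without a preset, use the values hard-coded in the audio_query hunk.
    With a preset, copy its values, as the audio_query_from_preset hunk does.
    """
    if preset is None:
        return {"pauseLength": None, "pauseLengthScale": 1.0}
    return {
        "pauseLength": preset.pauseLength,
        "pauseLengthScale": preset.pauseLengthScale,
    }
```

Keeping the no-preset values identical to the `presets.yaml` defaults (`null` and `1`) means a default preset and a plain query should behave the same way.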
2 changes: 2 additions & 0 deletions voicevox_engine/model.py
Expand Up @@ -62,6 +62,8 @@ class AudioQuery(BaseModel):
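volumeScale: float = Field(title="全体の音量")  # "Overall volume"
prePhonemeLength: float = Field(title="音声の前の無音時間")  # "Silence duration before the audio"
postPhonemeLength: float = Field(title="音声の後の無音時間")  # "Silence duration after the audio"
pauseLength: float | None = Field(title="テキスト内の無音時間(絶対値)")  # "Silence duration within the text (absolute value)"
pauseLengthScale: float = Field(title="テキスト内の無音時間(倍率)")  # "Silence duration within the text (scale)"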
Hiroshiba marked this conversation as resolved.
outputSamplingRate: int = Field(title="音声データの出力サンプリングレート")  # "Output sampling rate of the audio data"
outputStereo: bool = Field(title="音声データをステレオ出力するか否か")  # "Whether to output the audio data in stereo"
kana: str | None = Field(
Expand Down
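The `tts_engine.py` changes are not shown in this part of the diff, so the following is only a sketch of how the two new `AudioQuery` fields could combine, going by their titles ("absolute value" and "scale"). It assumes that a non-`None` `pauseLength` overrides the duration of every pause mora and that `pauseLengthScale` then multiplies the result. `Mora` and `apply_pause_settings` are simplified, hypothetical names, and marking pauses with `vowel == "pau"` is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mora:
    """Simplified, hypothetical stand-in for the engine's mora type."""

    vowel: str
    vowel_length: float


def apply_pause_settings(
    moras: List[Mora],
    pause_length: Optional[float],
    pause_length_scale: float,
) -> List[Mora]:
    """Return new moras with pause durations adjusted.

    Assumed semantics: a non-None pause_length replaces the duration of
    each pause mora, then pause_length_scale multiplies it. Non-pause
    moras are copied unchanged, and the input list is not mutated.
    """
    adjusted = []
    for mora in moras:
        length = mora.vowel_length
        if mora.vowel == "pau":  # assumption: pauses are marked "pau"
            if pause_length is not None:
                length = pause_length
            length *= pause_length_scale
        adjusted.append(Mora(mora.vowel, length))
    return adjusted
```

With `pause_length=None` and `pause_length_scale=1.0` (the defaults used throughout the tests above), this leaves every duration untouched, which matches the defaults being behavior-preserving.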
2 changes: 2 additions & 0 deletions voicevox_engine/preset/Preset.py
Expand Up @@ -18,3 +18,5 @@ class Preset(BaseModel):
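volumeScale: float = Field(title="全体の音量")  # "Overall volume"
prePhonemeLength: float = Field(title="音声の前の無音時間")  # "Silence duration before the audio"
postPhonemeLength: float = Field(title="音声の後の無音時間")  # "Silence duration after the audio"
pauseLength: float | None = Field(title="テキスト内の無音時間")  # "Silence duration within the text"
pauseLengthScale: float = Field(title="テキスト内の無音時間(倍率)")  # "Silence duration within the text (scale)"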
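Both new `Preset` fields are declared without defaults in this hunk, so a preset entry written before this change would presumably lack them. One possible way to handle that, sketched here as a purely hypothetical helper (`with_pause_defaults` does not exist in the engine), is to fill in the values this PR adds to `presets.yaml` (`null` and `1`) before validation.

```python
from typing import Any, Dict

# Default values taken from the presets.yaml hunk in this PR.
PAUSE_DEFAULTS: Dict[str, Any] = {"pauseLength": None, "pauseLengthScale": 1.0}


def with_pause_defaults(preset: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a raw preset dict with missing pause fields filled in.

    Values already present in the preset take precedence over the defaults,
    and the input dict is left unmodified.
    """
    return {**PAUSE_DEFAULTS, **preset}
```

This is only one design option; whether the engine should accept older preset files at all is a decision the diff does not show.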