Dev #74

shivammehta25 · 2024-05-27T11:52:58Z

What does this PR do?

If the dataset is structured as

data/
└── LJSpeech-1.1
    ├── metadata.csv
    ├── README
    ├── test.txt
    ├── train.txt
    ├── val.txt
    └── wavs

then you can extract the phoneme-level alignments from a trained Matcha-TTS model using:

python matcha/utils/get_durations_from_trained_model.py -i <dataset_yaml> -c <checkpoint>

Example:

python matcha/utils/get_durations_from_trained_model.py -i ljspeech.yaml -c matcha_ljspeech.ckpt

Or simply:

matcha-tts-get-durations -i ljspeech.yaml -c matcha_ljspeech.ckpt
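Before running either command, it may help to confirm that the dataset folder matches the layout shown above. The following is a minimal sketch, not part of this PR: `missing_entries` and `EXPECTED` are hypothetical names, and which of these entries the extraction script actually requires is an assumption (README is left out on the guess that it is not needed).

```python
from pathlib import Path

# Entries taken from the directory tree in this description.
# Assumption: README is not needed by the script, so it is not checked.
EXPECTED = ["metadata.csv", "train.txt", "val.txt", "test.txt", "wavs"]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    # Path from the example layout; adjust to your own dataset location.
    missing = missing_entries("data/LJSpeech-1.1")
    print("missing:", missing if missing else "nothing")
```

An empty result only means the names are present; it does not validate the contents of the filelists.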

In the dataset config, turn on load_durations.
Example (ljspeech.yaml):

load_durations: True

Or see the example in configs/experiment/ljspeech_from_durations.yaml
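To illustrate what a phoneme-level alignment expressed as durations means, here is a small sketch that expands per-phoneme frame counts into a hard monotonic 0/1 alignment matrix. It is illustrative only: the on-disk format written by the extraction script is not described in this PR, `durations_to_alignment` is a hypothetical helper, and the sketch assumes durations are non-negative integer frame counts, one per phoneme.

```python
def durations_to_alignment(durations):
    """Expand per-phoneme frame counts into a 0/1 alignment matrix.

    Row i has ones on the frames assigned to phoneme i, so the matrix
    has len(durations) rows and sum(durations) columns.
    """
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    total_frames = sum(durations)
    alignment = []
    start = 0
    for d in durations:
        row = [0] * total_frames
        for t in range(start, start + d):
            row[t] = 1
        alignment.append(row)
        start += d  # the next phoneme begins where this one ends
    return alignment


# Three phonemes lasting 2, 1, and 3 frames.
for row in durations_to_alignment([2, 1, 3]):
    print(row)
# [1, 1, 0, 0, 0, 0]
# [0, 0, 1, 0, 0, 0]
# [0, 0, 0, 1, 1, 1]
```

Each frame belongs to exactly one phoneme and the phonemes appear in order, which is why a list of durations is enough to describe the whole alignment.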

shivammehta25 added 6 commits May 24, 2024 11:34

Adding the possibility of get durations out of pretrained model

4b39f6c

Updating the notebook to adjust to the change

d816c40

Pinning gradio

e658aee

Adding the possibility to train with durations

aa496aa

Fixing batched synthesis for multispeaker model

de91038

Adding configuration for training from durations

ac0b258

shivammehta25 linked an issue May 27, 2024 that may be closed by this pull request

Matcha-TTS has very low GPU utilization. #73

Closed

shivammehta25 merged commit bd37d03 into main May 27, 2024
1 check failed