[Feature request] document multi-speaker models #4026

Open
surak opened this issue Oct 15, 2024 · 3 comments
Labels
feature request: feature requests for making TTS better.
wontfix: This will not be worked on but feel free to help.

Comments


surak commented Oct 15, 2024

🚀 Feature Description

This is a request to improve the documentation. The README includes the following step:

  • List the available speakers and choose a <speaker_id> among them:
    $ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs

But it doesn't name a single model that supports this, so the user has to download all of the nearly one hundred models to find out which ones are multi-speaker.

Solution

One example would be enough. It may be obvious to people in the field, but it isn't to others.

Alternative Solutions

A partial download that fetches only each model's metadata, which would be enough to query every model without downloading the full set of weights.

@surak surak added the feature request feature requests for making TTS better. label Oct 15, 2024

Kreevoz commented Oct 18, 2024

There aren't even that many models for a given language. The ones that have multispeaker capabilities would be using a multi-speaker dataset like vctk, which is listed when you query the available models. The ones based on ljspeech are all single-speaker models, since that is a single female speaker dataset.
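The dataset-based rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not part of the TTS package: the helper names `dataset_of` and `multi_speaker_candidates` are hypothetical, the two model names are sample inputs, and the dataset sets contain only the two datasets named in this comment. It assumes the dataset is always the second-to-last component of the model name, as in `<language>/<dataset>/<model_name>`.

```python
# Datasets mentioned in the comment above; extend these sets by hand as needed.
MULTI_SPEAKER_DATASETS = {"vctk"}
SINGLE_SPEAKER_DATASETS = {"ljspeech"}


def dataset_of(model_name: str) -> str:
    """Return the dataset part of a '.../<language>/<dataset>/<model_name>' string."""
    parts = model_name.strip().split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected model name: {model_name!r}")
    # Assumption: the dataset is always the second-to-last component.
    return parts[-2]


def multi_speaker_candidates(model_names):
    """Keep only the names whose dataset is known to be multi-speaker."""
    return [n for n in model_names if dataset_of(n) in MULTI_SPEAKER_DATASETS]


# Sample input, e.g. copied from the output of the model listing.
names = [
    "tts_models/en/ljspeech/tacotron2-DDC",
    "tts_models/en/vctk/vits",
]
print(multi_speaker_candidates(names))
```

A name that passes this filter is only a candidate; running `--list_speaker_idxs` on it is still what confirms the available speakers.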

eginhard (Contributor) commented

In theory such information could be added to the .models.json file, so that it can be accessed without downloading a model. I would consider a PR adding that information and exposing it in the API.

But I agree. Most languages have only a few models and even fewer datasets, so for now it doesn't take much effort to find out manually.
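As a rough sketch of what that proposal could look like, the snippet below walks a nested model index and collects entries flagged as multi-speaker. Everything here is an assumption for illustration: the nesting (model type, language, dataset, model name), the `multi_speaker` key, and the function name `find_multi_speaker` are hypothetical and are not confirmed by the current `.models.json` or the TTS API. The sample data is made up inline rather than read from the real file.

```python
# Hypothetical excerpt of a model index with an added "multi_speaker" flag.
# Assumption: entries are nested as type -> language -> dataset -> model name.
SAMPLE_INDEX = {
    "tts_models": {
        "en": {
            "ljspeech": {
                "tacotron2-DDC": {"multi_speaker": False},
            },
            "vctk": {
                "vits": {"multi_speaker": True},
            },
        },
    },
}


def find_multi_speaker(index):
    """Return full model names whose metadata sets the hypothetical flag."""
    found = []
    for model_type, languages in index.items():
        for language, datasets in languages.items():
            for dataset, models in datasets.items():
                for name, meta in models.items():
                    # A missing flag is treated as "not multi-speaker".
                    if meta.get("multi_speaker", False):
                        found.append("/".join([model_type, language, dataset, name]))
    return found


print(find_multi_speaker(SAMPLE_INDEX))
```

With metadata like this in the index, a query would need only the small JSON file and no model weights, which is the point of the alternative solution in the original request.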

stale bot commented Dec 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Dec 8, 2024

3 participants