Different voice for different emotion of character / narrator #314

Closed
shivshankar11 opened this issue Aug 15, 2024 · 5 comments

Comments

@shivshankar11

Is your feature request related to a problem? Please describe.
For SillyTavern: a different voice for each emotion of a character/narrator. The Extras API supports emotion detection.

Describe the solution you'd like
A different voice for each emotion.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@erew123
Owner

erew123 commented Aug 15, 2024

Hi @shivshankar11

I think you are asking to be able to specify an emotion [happy, sad, joy, anger, etc.]. These features are TTS-engine specific, and the XTTS engine does not support them. It was on their roadmap as "Implement emotion and style adaptation." coqui-ai/TTS#378

None of the other engines currently implemented support this type of feature either. It will only be possible as/when I can implement a TTS engine that supports it (as mentioned on the front page of the GitHub repository, I do not make the TTS engines: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#%EF%B8%8F-about-this-project--me).

Finally, every TTS engine I know of that supports a feature like this requires the emotion style to be sent as part of the TTS generation request, e.g.

[angry] Don't tell me that. [happy] Lets go out to the park today

As such, the AI model/LLM used would have to be capable of including such information in its text so that it can be forwarded on for TTS generation, and SillyTavern would probably have to code such a feature into their interface before I could pass the text through to an emotion-capable model.

I am looking at other models (please see the feature requests in #74); however, the work within ST or with LLMs will still need to be done by the people who deal with those things.
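To illustrate the tagged format above, here is a minimal sketch of how text such as `[angry] ... [happy] ...` could be split into per-emotion segments before being forwarded for generation. The tag syntax (a lowercase word in square brackets) and the function name `split_by_emotion` are assumptions made for this example; they are not part of AllTalk, SillyTavern, or any particular TTS engine.

```python
import re
from typing import List, Optional, Tuple

# Assumed tag format: one lowercase word in square brackets, e.g. "[angry]".
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def split_by_emotion(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text into (emotion, sentence chunk) pairs.

    Text that appears before the first tag gets an emotion of None.
    """
    segments: List[Tuple[Optional[str], str]] = []
    emotion: Optional[str] = None
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        # Text between the previous tag and this one belongs to the previous emotion.
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((emotion, chunk))
        emotion = match.group(1)
        pos = match.end()
    # Whatever follows the last tag belongs to the last emotion seen.
    tail = text[pos:].strip()
    if tail:
        segments.append((emotion, tail))
    return segments


if __name__ == "__main__":
    sample = "[angry] Don't tell me that. [happy] Lets go out to the park today"
    for emotion, chunk in split_by_emotion(sample):
        print(emotion, "->", chunk)
```

Each segment could then be sent to an emotion-capable engine with its own style setting, provided both the LLM and the front end emit the tags in the first place.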

Thanks

@erew123 closed this as completed Aug 15, 2024
@shivshankar11
Author

I want to use happy-, sad-, or energetic-sounding voice MP3/WAV files for cloning; this is not TTS-engine specific.

@erew123
Owner

erew123 commented Aug 15, 2024

Hi @shivshankar11, I provide extra voices via the links here: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-other-installation-notes

Beyond that, you can create your own wav files from any audio you can find. This is detailed in the help section of the TTS engine.

I only provide a limited set of samples because of copyright issues; I cannot include audio samples that are subject to any form of copyright claim.

Thanks

@shivshankar11
Author

Can you look into this? https://github.com/SillyTavern/SillyTavern-Extras
--classification-model | Load a custom sentiment classification model. Expects a HuggingFace model ID. Default (6 emotions):
We can select the audio sample file based on the emotion status provided by the sentiment classification model.
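
A minimal sketch of that selection step, assuming the classifier returns a single emotion label as a string. The label names, the file names, and the function `pick_voice_sample` are all hypothetical and only illustrate the idea; they are not taken from SillyTavern-Extras or AllTalk.

```python
from typing import Dict

# Hypothetical mapping from an emotion label to a voice sample used for cloning.
VOICE_SAMPLES: Dict[str, str] = {
    "joy": "narrator_happy.wav",
    "sadness": "narrator_sad.wav",
    "anger": "narrator_angry.wav",
}

# Hypothetical fallback sample for labels that have no dedicated recording.
DEFAULT_SAMPLE = "narrator_neutral.wav"


def pick_voice_sample(
    emotion: str,
    samples: Dict[str, str] = VOICE_SAMPLES,
    default: str = DEFAULT_SAMPLE,
) -> str:
    """Return the sample file for an emotion label, or the default if unmapped."""
    return samples.get(emotion.strip().lower(), default)


if __name__ == "__main__":
    for label in ("joy", "Anger", "surprise"):
        print(label, "->", pick_voice_sample(label))
```

The chosen file name would then be passed as the voice for the TTS request, so the same engine is used throughout and only the reference sample changes.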

@erew123
Owner

erew123 commented Aug 15, 2024

Hi @shivshankar11, I've had a look at that. It's a bit too far outside the core of what I am trying to do with AllTalk, and I already have a huge block of code specific to the core of AllTalk to work on. I just don't have time at the moment to work on code at that level, which is more than basic integration into another application, e.g. ST.

I also note it's a dead project; however, I can see that someone has taken on maintaining and updating it at https://github.com/Abdulhanan535/SillyTavern-ExtrasFix and has been making changes as recently as last week.

I'm assuming they will have a reasonable grasp of that code base, so it may be better to approach them about building integration to other TTS engines via their API calls.
