Audio clips off during the final word #487

Open
phayke opened this issue Jan 10, 2025 · 4 comments
Comments

@phayke

phayke commented Jan 10, 2025

🔴 If you have installed AllTalk in a custom Python environment, I will only be able to provide limited assistance/support. AllTalk draws on a variety of scripts and libraries that are not written or managed by myself, and they may fail, error or give strange results in custom built python environments.

🔴 Please generate a diagnostics report and upload the "diagnostics.log" as this helps me understand your configuration.

https://github.com/erew123/alltalk_tts/tree/main?#-how-to-make-a-diagnostics-report-file

Describe the bug
The generated audio consistently cuts off halfway through the final word.
To Reproduce
Steps to reproduce the behaviour:
Generate speech

@EPCarter

EPCarter commented Jan 10, 2025

Describe the bug:
In XTTS Models Finetuning Dataset .wav files, the last word is clipped most of the time.

Steps to reproduce the behaviour:
Generate Dataset using XTTS Models Finetuning

Screenshots:
Settings
Start-up

Text/logs:
finetune.log
diagnostics.log

Desktop (please complete the following information):
AllTalk was updated: Jan 5, 2025
Custom Python environment: No
Text-generation-webUI was updated: Jan 5, 2025

Additional context:
#477 (comment)
Audio Samples.zip
Source Audio Sample.zip

@erew123
Owner

erew123 commented Jan 13, 2025

Hi @EPCarter. So this is dataset generation for finetuning? @Yohrog has been looking at this and submitting a PR, but I've not heard back from them for a while now. #419

But listening to your audio, you are saying it is clipping early at the end of a word, correct?

@EPCarter

That is correct. The Source Sample is what I start with and input to the fine-tuning, and the Audio Samples are what come out when fine-tuning chops things into .wav files for the dataset generation. Nearly all the .wav files are clipped on the last word, regardless of the length.
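
For anyone who wants to count how many of the generated .wav files are affected, here is a minimal sketch that flags a file when its last few milliseconds are still loud, which suggests the audio was cut mid-word rather than trailing off into silence. This is not part of AllTalk. The function names `tail_rms` and `looks_clipped`, the 50 ms window and the 0.02 threshold are all assumptions chosen for illustration, and the sketch only handles 16-bit PCM WAV files.

```python
import math
import struct
import wave


def tail_rms(path, tail_ms=50):
    """Return the RMS level (0.0 to 1.0) of the last `tail_ms` of a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wav.getnframes()
        tail_frames = min(n_frames, int(wav.getframerate() * tail_ms / 1000))
        # Jump to the start of the tail and read only those frames.
        wav.setpos(n_frames - tail_frames)
        raw = wav.readframes(tail_frames)
    # Little-endian signed 16-bit samples (channels interleaved).
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0


def looks_clipped(path, tail_ms=50, threshold=0.02):
    """Heuristic (assumed threshold): True if the file ends while still loud."""
    return tail_rms(path, tail_ms) > threshold
```

Running `looks_clipped` over the dataset folder and comparing the count against the total gives a rough clip rate. The threshold likely needs tuning per recording, since noisy source audio never reaches true silence.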

@unifirer

@erew123 I can confirm the same thing is happening to me. Not consistently, but about 20% of the time, the first or final word in a sentence is only partially spoken. It happens on both xttsv2_2.0.2 and xttsv2_2.0.3, using zero-shot with 10s cleaned audio clips in the TTS generator.

This has been a persistent problem since AllTalk v1. I thought it was a problem with XTTS that you can't fix (different makers, right?), but I'll post this anyway.

Thank you very much for your hard work, I'm enjoying AllTalk very much in our chaotic world.

Don't overwork yourself, and I hope your family gets better. You're helping the strangers posting here so much and your documentation is great. I'm impressed by your dedication to a free project.

You're a saint. I don't believe in god, but I believe in good people like you. Better than saving lives, you're improving mine so much. I would never use a command line for this haha, the world has moved on to GUIs.
