Voice-to-text not working again #253

Open
StudioDweller opened this issue Sep 12, 2024 · 12 comments

Comments

@StudioDweller

It's not putting the text in the message box to send. I received an email from OpenAI yesterday saying that the default model is changing to "gpt-4o-2024-08-06". I wouldn't think that would break it, but I guess they changed something on the page.

@StudioDweller
Author

StudioDweller commented Sep 17, 2024

@hoshizorista Are you experiencing the same issue? If so, do you have any suggestions for a fix? Our weekly podcast relies on this heavily and we are missing having it. Thank you!

@hoshizorista

hoshizorista commented Sep 25, 2024

@StudioDweller Yep, working on it! Sorry for not noticing; GitHub didn't notify me. I'm actually finishing a rework of the extension with autostart and some cool new functions (such as using the base GPT voices). I'll have it ready late tomorrow night, hang in there!

@StudioDweller
Author

StudioDweller commented Sep 26, 2024 via email

@StudioDweller
Author

@hoshizorista We would love to have you as a guest on our podcast. Let me know if that’s something that you would be interested in. We sincerely appreciate your efforts with maintaining this extension and would love to talk to you about it.

@nowallslive

Hi @hoshizorista, I also really appreciate you working on this. I've been using the extension as a way to experiment with ChatGPT in performance, so getting it back online would be amazing. There isn't really a comparable tool. Also, @StudioDweller, I would love to know more about where to find your podcast. It sounds interesting.

@StudioDweller
Author

StudioDweller commented Sep 27, 2024

@nowallslive I do a weekly podcast on all things AI called Up Against Reality, along with my co-host Chris, and we use this extension for realtime interaction with our AI co-host, a custom GPT we call RAINA. The podcast is available on most of the major podcast platforms. This episode is a good showcase of realtime interactions using this extension. Thanks for your interest!

https://upagainstreality.com/2024/03/12/rainas-20-questions/

@hoshizorista

@StudioDweller It would be an honor :), @nowallslive My pleasure!

I just released the update on my fork and added some new functions. I'm praying it's not that buggy, haha. Please check it out and let me know if it works for you guys: https://github.com/hoshizorista/talkgpt/tree/main

Just download the extension from the latest release, decompress it, make sure to uninstall all previous versions, then install and enjoy! Let me know if you have any issues.

You guys can refer to my fork. It looks like C-Nedelu has already moved on from this, so I'll work on my fork to keep his work alive so it works for all of us!

@enrix507

@hoshizorista 👍 Hello friend. First of all, thank you so much for your incredible work on the extension! I’ve really been enjoying its features, and I’m excited about the new updates you mentioned, like autostart and the base GPT voices.

I wanted to recommend a service that might interest you: Fish.audio. It’s quite similar to ElevenLabs, but much more affordable and accessible. They offer 50 free uses per day, and their API pricing for cloned voices is significantly lower than ElevenLabs'. That lets you use a high-quality voice through the API at a very low cost for much longer periods. ElevenLabs gets expensive quickly, which makes it impractical for ongoing conversations with the chatbot and better suited to occasional use.

That’s why I’d like to recommend Fish Audio, as I use it for my projects, and it works really well for me. Additionally, they offer an API that could be adapted to your extension.

I think it would be awesome to have the option to integrate Fish Audio into your ChatGPT extension for Text-to-Speech (TTS). The quality is excellent, the latency is super fast, and the pricing is very competitive. Here’s the link to their site: https://fish.audio/, and here is their documentation: https://docs.fish.audio/.

I really believe this would be a great addition to the extension, and I’m sure many users would appreciate it. If you're interested in exploring the integration, I’d love to support the project, and I’d be happy to make a donation to your PayPal as a token of appreciation for adding this feature. I’m confident many users would be excited about the idea as well!

Thanks for your time, and keep up the great work. Cheers!

@hoshizorista

@enrix507 Hey! Sounds like a good idea! I haven't heard of it, but if it supports streaming it's very likely it can be added. I'm going to look into it!

@StudioDweller
Author

StudioDweller commented Nov 14, 2024 via email

@hoshizorista

@StudioDweller Hey Larry! Thanks for noticing, I'm going to check it out. You mentioned it's happening between "some words": is it happening with any specific words, or with words from another language? By default the model is selected based on the voice (e.g. Roger and Aria are under Multi-language auto detect V.2). This is done for simplicity, since it's similar to how ElevenLabs TTS works on the page. Maybe we can run some tests changing the model to see if we get any improvement. Please let me know the name of the voice you're using so we can check it out.

@StudioDweller
Author

@hoshizorista The delays seem to happen consistently after the first 1 to 3 words and then randomly during the rest of the response. The voice I’m using is a custom voice that was generated using their “instant voice clone”, and I don’t see a way to set a default model for it in my ElevenLabs account. FWIW, I noticed a mention of “Eleven Turbo 2” in the code, and the current low-latency model is Turbo 2.5.

Thanks so much for your help and efforts with this!
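
For anyone following along, here is a minimal sketch of the per-voice model selection discussed above, with an override so a cloned voice can be tested against a different model. This is not the extension's actual code: `pickModel`, `buildStreamRequest`, the voice-to-model table, and the choice of fallback are all hypothetical. It assumes "Multi-language auto detect V.2" corresponds to the ElevenLabs model ID `eleven_multilingual_v2`, and that Turbo 2 and Turbo 2.5 are `eleven_turbo_v2` and `eleven_turbo_v2_5`.

```typescript
// Hypothetical sketch, not the extension's real implementation.
type ModelId =
  | "eleven_multilingual_v2"
  | "eleven_turbo_v2"
  | "eleven_turbo_v2_5";

// Assumed defaults keyed by voice name (the thread only mentions Roger and Aria).
const DEFAULT_MODEL_BY_VOICE = new Map<string, ModelId>([
  ["Roger", "eleven_multilingual_v2"],
  ["Aria", "eleven_multilingual_v2"],
]);

// Assumed fallback for voices with no entry, such as an instant voice clone.
const FALLBACK_MODEL: ModelId = "eleven_turbo_v2_5";

// An explicit override wins, then the per-voice default, then the fallback.
function pickModel(voiceName: string, override?: ModelId): ModelId {
  return override ?? DEFAULT_MODEL_BY_VOICE.get(voiceName) ?? FALLBACK_MODEL;
}

// Builds the URL and JSON body for a streaming text-to-speech request.
// Nothing is sent here; the caller would POST `body` with its own API key.
function buildStreamRequest(
  voiceId: string,
  voiceName: string,
  text: string,
  override?: ModelId,
) {
  const id = encodeURIComponent(voiceId);
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${id}/stream`,
    body: { text, model_id: pickModel(voiceName, override) },
  };
}
```

With something like this, comparing latency between models for the same cloned voice would only require passing a different `override` value, without touching the per-voice defaults.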

4 participants