Talk = GPT-2 + Whisper + WASM #167

ggerganov · 2022-11-17T16:34:37Z

ggerganov
Nov 17, 2022
Maintainer

I just had an awesome idea:

Make a web-page that:

Listens when someone speaks
Transcribes the words using WASM Whisper
Generates a new sentence using WASM GPT-2
Uses Web Speech API to synthesise the speech and play it on the speakers.

All of this running locally in the browser - no server required

I have all the ingredients and I think the performance is just enough. I just have to put it together.
The total data that the page will have to load on startup (probably using Fetch API) is:

74 MB for the Whisper tiny.en model
240 MB for the GPT-2 small model
Web Speech API is built-in in modern browsers

I think it will be very fun because you could talk to the web-page or even add extra devices that talk to each other only through the mic and the speakers. For example, you simply open the page on your phone and tablet and put them next to each other - listen to them talk about something 😄

Any ideas to make this even more fun?

Update:

This is now fully functional at: https://whisper.ggerganov.com/talk/

Source code is here: https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk.wasm

Looking for beta testers, feedback and ideas for improvement!

talk-2.mp4

eschmidbauer · 2022-11-17T16:48:11Z

eschmidbauer
Nov 17, 2022

this sounds really fun!

0 replies

ggerganov · 2022-11-20T16:40:01Z

ggerganov
Nov 20, 2022
Maintainer Author

So.. this is turning out to be even better than I expected 😆

talk-0.mp4

0 replies

Vuizur · 2022-11-20T22:49:25Z

Vuizur
Nov 20, 2022

These results are extremely impressive! I recently tried to implement something similar in Python, only not locally, but instead using different online APIs, but it felt worse than your demo video because Whisper is much better than the free Google Speech Recognition API (and your optimized version runs significantly better on CPU than the standard Whisper Python lib I tried) :).

0 replies

ggerganov · 2022-11-24T18:11:32Z

ggerganov
Nov 24, 2022
Maintainer Author

And here is a less cringe video to demonstrate the capabilities of this implementation:

talk-tech-demo-0-lq.mp4

These are 2 Chrome tabs talking and being nice to each other using the microphone and the speakers of a Macbook.

0 replies

beartell · 2022-11-25T13:06:52Z

beartell
Nov 25, 2022

There is an error on browser Firefox and Chrome..

1 reply

ggerganov Nov 27, 2022
Maintainer Author

Likely, you haven't enabled cross-origin isolation on your HTTP server.
For more information, see my #88 (comment)

beartell · 2022-11-27T14:01:20Z

beartell
Nov 27, 2022

Amazing solution. Works like a charm ;) Pazar, Kasım 27, 2022 13:17 +03 tarihinde, Georgi Gerganov ***@***.***> şunu yazdı: Likely, you haven't enabled cross-origin isolation on your HTTP server. For more information, see my #88 (comment) — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: ***@***.***>

0 replies

aklos · 2022-12-07T14:26:55Z

aklos
Dec 7, 2022

This is great, been trying to make the same thing except through terminal. Why GPT-2 instead of the GPT-3 text-davinci-003 model?

7 replies

ghost Dec 7, 2022

@ggerganov I meant the recently released one, I think it's free, I've seen some scripts and trying those. I mean, the newer chatbot is pretty advanced compared to the dull GPT-2. Very stark difference.

aklos Dec 7, 2022

@kingofdelphi I'm pretty sure OpenAI might ban people who use workarounds to use the ChatGPT API. They haven't released a public API yet, so anything you do for that right now would be probably violating ToS.

@ggerganov To be fair though, the API requests are cheap enough that I still prefer to use GPT-3 for personal apps. It might be a good idea to allow people to switch to GPT-3 (using their own API key) if they want to.

Anx2k Dec 7, 2022

I wonder if running something like GPT-JT would be possible:
https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai
Really the GPT-2 part is the only kind of disappointing aspect, as Whisper is great and even the speech synth is more than acceptable, but GPT-2 is giving me Eliza flashbacks. ;)

ggerganov Dec 7, 2022
Maintainer Author

You won't be able to load GPT-JT in a web-page - it's too big. You can easily run talk in the command-line using GPT-J or GPT-JT, but then you won't have speech synthesiser. But I'm pretty sure somebody will hack together something very soon.

ggerganov Dec 9, 2022
Maintainer Author

So I made a terminal version running the GPT-2 1558M XL model. Unfortunately, the conversations are still not very coherent. I think the prompt strategy has to be improved somehow, but I cannot figure out how.

santa-kid-2.mp4

Source code: https://github.com/ggerganov/whisper.cpp/blob/master/examples/talk

waynenilsen · 2022-12-07T17:24:32Z

waynenilsen
Dec 7, 2022

should use hot mic method hold down space to talk so you can take longer pauses while you think about what to say release spacebar to translate speech to text

0 replies

jinsu35 · 2023-03-24T05:43:06Z

jinsu35
Mar 24, 2023

I think it would be great to have inference of a text-to-speech model like this.
How would one go about doing this?

0 replies

lili21 · 2023-04-08T10:15:33Z

lili21
Apr 8, 2023

how about run the wasm on a server?

0 replies

wagler · 2023-12-08T02:40:07Z

wagler
Dec 8, 2023

Can't you accelerate the model inference with the GPU via WebGPU for C++?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Talk = GPT-2 + Whisper + WASM #167

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 11 comments 8 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Talk = GPT-2 + Whisper + WASM #167

ggerganov Nov 17, 2022 Maintainer

Replies: 11 comments · 8 replies

ggerganov Nov 20, 2022 Maintainer Author

ggerganov Nov 24, 2022 Maintainer Author

ggerganov Nov 27, 2022 Maintainer Author

ggerganov Dec 7, 2022 Maintainer Author

ggerganov Dec 9, 2022 Maintainer Author

ggerganov
Nov 17, 2022
Maintainer

Replies: 11 comments 8 replies

ggerganov
Nov 20, 2022
Maintainer Author

ggerganov
Nov 24, 2022
Maintainer Author

ggerganov Nov 27, 2022
Maintainer Author

ggerganov Dec 7, 2022
Maintainer Author

ggerganov Dec 9, 2022
Maintainer Author