Generative AI is producing a bunch of fun new models for us devs to poke at. Did you know you can use these over the phone?
Twilio gives you a superpower called Media Streams, which provides a WebSocket connection to both sides of a phone call. You can get audio streamed to you, process it, and send audio back.
This repo is a WIP demo that explores two models: Deepgram for Speech to Text and the incredibly fun ElevenLabs for Text to Speech.
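To make the receive, process, and send-back loop concrete, here is a minimal sketch of handling Media Streams messages. Twilio sends JSON messages with an event field, and audio arrives as base64-encoded mu-law in media.payload. The handleMessage helper and the state object are hypothetical names for illustration, not code from this repo, and the sketch simply echoes audio back instead of calling Deepgram or ElevenLabs.

```javascript
// Hypothetical helper: takes one raw WebSocket message from Twilio and
// returns the JSON string to send back, or null if there is nothing to send.
function handleMessage(raw, state) {
  const msg = JSON.parse(raw);

  if (msg.event === "start") {
    // The stream's ID arrives once, in the "start" message.
    state.streamSid = msg.start.streamSid;
    return null;
  }

  if (msg.event === "media") {
    // Inbound audio is base64-encoded mu-law sampled at 8000 Hz.
    const audio = Buffer.from(msg.media.payload, "base64");
    state.bytesReceived += audio.length;

    // Echo the same audio back on the same stream.
    return JSON.stringify({
      event: "media",
      streamSid: state.streamSid,
      media: { payload: audio.toString("base64") },
    });
  }

  // Ignore other events such as "connected" and "stop" in this sketch.
  return null;
}
```

In a real server you would call something like this from your WebSocket library's message handler and write the returned string back to the socket. Sending audio back only works when the stream is bidirectional.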
Sign up for Deepgram and ElevenLabs
Use a tunneling tool like ngrok to expose port 3000
ngrok http 3000
Copy .env.example to .env and update the keys
Set SERVER to your tunneled URL
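If you want the server to fail fast when something is missing from .env, a small check along these lines can help. This is only a sketch: SERVER comes from the step above, but DEEPGRAM_API_KEY and ELEVENLABS_API_KEY are assumed names, so check .env.example for the real ones. The missingEnv helper is hypothetical too.

```javascript
// Assumed variable names; only SERVER is confirmed by this README.
const REQUIRED = ["SERVER", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY"];

// Returns the names of variables that are unset or blank in `env`.
function missingEnv(env, names = REQUIRED) {
  return names.filter((name) => !env[name] || env[name].trim() === "");
}
```

You could call missingEnv(process.env) at startup and log or exit if the returned list is not empty.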
Install the necessary packages
npm install
Start the web server
node server.js
Wire up your Twilio number using the console or CLI
twilio phone-numbers:update +18889876 --voice-url=https://your-server.ngrok.io/incoming
The Stream TwiML element connects the call's audio stream to your WebSocket server.
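As a rough illustration, the /incoming webhook could respond with TwiML like the string built below. This is a sketch under assumptions: the incomingTwiml helper and the /connection WebSocket path are hypothetical, and it is not known whether SERVER includes a protocol prefix, so the helper strips one if present.

```javascript
// Hypothetical helper: builds TwiML that connects the call to a
// bidirectional media stream on your WebSocket server.
function incomingTwiml(server, path = "/connection") {
  // Accept "https://host" or a bare "host", and drop any trailing slash.
  const host = server.replace(/^https?:\/\//, "").replace(/\/+$/, "");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    "<Response><Connect>" +
    `<Stream url="wss://${host}${path}" />` +
    "</Connect></Response>"
  );
}
```

Your web server would return this string from the /incoming route with a Content-Type of text/xml. Nesting Stream inside Connect is what allows audio to flow in both directions.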