
How to stream Text to Speech? #487

Closed
user080975 opened this issue Nov 11, 2023 · 17 comments

@user080975 commented Nov 11, 2023

According to the documentation here for Text to Speech:
https://platform.openai.com/docs/guides/text-to-speech?lang=node

There is the possibility of streaming audio without waiting for the full file to buffer, but the example is in Python. Is there any way to stream the incoming audio using Node.js?

@abhishekgoyal1 commented Nov 13, 2023

This needs to be added. I've tried several different methods, but I don't think it's natively supported in the Node SDK at the moment.

This does return a streamable object, but no chunks are found while iterating through it:

const stream = await openai.audio.speech.create(
  {
    model: 'tts-1',
    voice: 'alloy',
    input: textData,
    response_format: 'opus',
  },
  { stream: true },
);

@rattrayalex (Collaborator)

Yes, this works today – I'm sorry that the example code doesn't reflect that.

You can simply access response.body, which is a readable stream (in web, a true ReadableStream; in Node, a Readable), like so:

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });

  const stream = response.body;
}
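
For example, here's a minimal sketch of consuming that stream chunk by chunk (assuming you're on Node, where the body is async-iterable):

import 'openai/shims/node'; // ensures response.body is a Node Readable
import OpenAI from 'openai';

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });

  // A Node Readable is async-iterable, so chunks can be handled as they arrive.
  for await (const chunk of response.body) {
    console.log(`received ${chunk.length} bytes`);
  }
}

main();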

I'll try to update the example soon, and won't close this issue until I do. Feel free to share use-cases you'd like to see in the example here, with sample code.

rattrayalex self-assigned this Nov 14, 2023

@c121914yu

export async function text2Speech({
  res,
  onSuccess,
  onError,
  model = defaultAudioSpeechModels[0].model,
  voice = Text2SpeechVoiceEnum.alloy,
  input,
  speed = 1
}: {
  res: NextApiResponse;
  onSuccess: (e: { model: string; buffer: Buffer }) => void;
  onError: (e: any) => void;
  model?: string;
  voice?: `${Text2SpeechVoiceEnum}`;
  input: string;
  speed?: number;
}) {
  const ai = getAIApi();
  const response = await ai.audio.speech.create({
    model,
    voice,
    input,
    response_format: 'mp3',
    speed
  });

  const readableStream = response.body as unknown as NodeJS.ReadableStream;
  readableStream.pipe(res);

  let bufferStore = Buffer.from([]);

  readableStream.on('data', (chunk) => {
    bufferStore = Buffer.concat([bufferStore, chunk]);
  });
  readableStream.on('end', () => {
    onSuccess({ model, buffer: bufferStore });
  });
  readableStream.on('error', (e) => {
    onError(e);
  });
}

This is my example; it uses the Next.js framework. I hope it will be helpful.

@juhana commented Nov 16, 2023

Note that the TypeScript types aren't correct when reading the response as a stream in Node. You have to do const stream = response.body as unknown as Readable; to avoid type errors.

@rattrayalex (Collaborator)

To fix those type errors, add import 'openai/shims/node' to the top of your file (details here) if you're on Node, or import 'openai/shims/web' if you're on anything else.

We're working to improve this.
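
A minimal sketch of what that looks like with the Node shim:

import 'openai/shims/node'; // must be imported before 'openai'
import OpenAI from 'openai';
import fs from 'fs';

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'hello world',
  });

  // With the Node shim loaded, response.body is typed as a Node stream,
  // so the `as unknown as Readable` double cast is no longer needed.
  response.body.pipe(fs.createWriteStream('speech.mp3'));
}

main();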

@PetersonFonseca

Hello everyone, can you please help me implement this on Node? I can't make it work...

import path from "path";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_SECRET_KEY,
});

const response = openai.audio.speech.create({
  model: "tts-1",
  voice: "onyx",
  input: "Text to speech test.",
});

response.stream_to_file(path.resolve("./speech.mp3"));

@rattrayalex (Collaborator)

We don't provide a stream_to_file method; instead, use response.body.pipe(fs.createWriteStream(myPath)). Here's a complete example:

import OpenAI from 'openai';
import fs from 'fs';
import path from 'path';

// gets API Key from environment variable OPENAI_API_KEY
const openai = new OpenAI();

const speechFile = path.resolve(__dirname, './speech.mp3');

async function streamToFile(stream: NodeJS.ReadableStream, path: fs.PathLike) {
  return new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(path).on('error', reject).on('finish', resolve);

    stream.pipe(writeStream).on('error', (error) => {
      writeStream.close();
      reject(error);
    });
  });
}

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown chicken jumped over the lazy dogs',
  });

  await streamToFile(mp3.body, speechFile);
}
main();

@yaozijun

async function streamToFile(stream, path) {
    return new Promise((resolve, reject) => {
        const writeStream = fs.createWriteStream(path)
            .on('error', reject)
            .on('finish', resolve);

        stream.pipe(writeStream)
            .on('error', (error) => {
                writeStream.close();
                reject(error);
            });
    });
}
const ret = await openai.audio.speech.create({
  model: "tts-1",
  voice: "onyx",
  input: "test",
});
const stream = ret.body;
const speechFile = path.resolve(`/xxx/test.mp3`);
await streamToFile(stream, speechFile);

@karar-shah

@c121914yu Could you kindly assist me in playing the audio stream on the client side?
Thank you.

@karar-shah

(quoting @rattrayalex's streamToFile example above)

@rattrayalex, I would like to ask how to stream the audio response in a client component in Next.js. Despite searching for the past day, I have been unable to find a solution.
Thank you so much for your help.

@c121914yu

(replying to @karar-shah's question above)

https://github.com/labring/FastGPT/blob/main/projects/app/src/web/common/utils/voice.ts

I haven't brought a computer with me recently, so I can't easily copy the code.

You can refer to my code for client-side streaming via fetch and the MediaSource API.

However, I have found that this API has some compatibility issues on Apple products.
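
For reference, here is a rough sketch of that fetch + MediaSource approach (the names are illustrative, not copied from FastGPT; Safari's limited MediaSource support is the Apple issue mentioned above):

// Rough sketch: stream an MP3 TTS endpoint into an <audio> element.
async function playStreamedSpeech(url: string) {
  const mediaSource = new MediaSource();
  const audio = new Audio(URL.createObjectURL(mediaSource));

  mediaSource.addEventListener('sourceopen', async () => {
    const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
    const response = await fetch(url);
    const reader = response.body!.getReader();

    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      sourceBuffer.appendBuffer(value);
      // Wait until this chunk is ingested before appending the next one.
      await new Promise((resolve) =>
        sourceBuffer.addEventListener('updateend', resolve, { once: true }),
      );
    }
    mediaSource.endOfStream();
  });

  await audio.play();
}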

@athrael-soju

(quoting @c121914yu's reply above)

You're gonna need a polyfill for that.

@aleksa-codes

(quoting @c121914yu's text2Speech example above)

Can someone please help me? How do I use this, or something similar, to build an API route handler (endpoint) and call it from a frontend component in Next.js? I am basically trying to rebuild the TTS functionality that is in ChatGPT.

@daveycodez commented Jun 10, 2024

Hey Aleksa, I stumbled upon this because I'm building it myself. If you still need help, I rewrote the above example as a simple API route (Pages Router, /api/voice.js):

import OpenAI from "openai";
const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
})

export default async function handler(req, res) {
    const { input } = req.query
    res.setHeader('Content-Type', 'audio/mpeg')

    const response = await openai.audio.speech.create({
        model: "tts-1",
        voice: "alloy",
        input: input,
        response_format: 'mp3',
        speed: 1
    })

    const readableStream = response.body
    readableStream.pipe(res)

    let bufferStore = Buffer.from([])

    readableStream.on('data', (chunk) => {
        bufferStore = Buffer.concat([bufferStore, chunk])
    })

    readableStream.on('end', () => {
        // Store the mp3 somewhere if you want to reuse it
        // onSuccess({ model, buffer: bufferStore });
    })

    readableStream.on('error', (e) => {
        console.error(e)
    })
}

To play it from your client side, simply:

const input = "Today is a wonderful day to build something people love!"
// encode the input so spaces and punctuation survive the query string
new Audio(`/api/voice?input=${encodeURIComponent(input)}`).play()

@kifjj commented Oct 4, 2024

Hi LeakedDave and Aleksa,

I implemented the simple API route example (Node.js/Express) and hosted it in several environments (Google Firebase, Google App Engine). While the streaming works, I observed a strange thing: the audio only starts playing after 6 to 8 seconds.

I tried many things on the servers (increasing memory, moving to a closer region) but no luck.

Any idea?

@daveycodez

(quoting @kifjj's question above)

Honestly, I'm not sure. If possible, I'd suggest hosting a Next.js API for this; I haven't tested it with vanilla Express at all. It sounds like your host doesn't support streaming, since a 6-7 second wait would be the full audio, I think.
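
One way to check whether a host is actually streaming (a rough sketch; the endpoint URL is the hypothetical one from the examples above) is to compare time-to-first-byte against total response time:

// If the first chunk arrives at roughly the same time as the last one,
// the host is buffering the whole response instead of streaming it.
async function checkStreaming(url: string) {
  const start = Date.now();
  const response = await fetch(url);
  const reader = response.body!.getReader();

  await reader.read(); // first chunk
  console.log(`first chunk after ${Date.now() - start} ms`);

  while (!(await reader.read()).done) {
    // drain the rest of the stream
  }
  console.log(`last chunk after ${Date.now() - start} ms`);
}

checkStreaming('/api/voice?input=hello');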

@kifjj commented Oct 5, 2024

@LeakedDave oh, you are right: Firebase Functions and Google App Engine don't support streaming. Thanks for putting me on the right path.
Looking around, I see AWS Lambda introduced streaming support a year ago, but with some limitations (API Gateway and ALB are not supported).

I tried it and it works: the stream starts in about 3s, or 5s on a cold start. A much better user experience.

Here is the code if someone needs it:

  • I started from the AWS Lambda streaming example and tweaked it with my code just to make it work
  • Be sure to select Node.js
  • Be sure to increase the timeout; the default is just 3s
  • For the function URL: 1) use auth type NONE if you want public access, and 2) IMPORTANT: select invoke mode RESPONSE_STREAM, otherwise it won't stream :-)

/* global fetch */
import util from 'util';
import stream from 'stream';

const pipeline = util.promisify(stream.pipeline);

// OpenAI API key, read from the Lambda environment.
const okey = process.env.OPENAI_API_KEY;

/* global awslambda */
export const handler = awslambda.streamifyResponse(async (event, responseStream, _context) => {
  const textToTTS = event.queryStringParameters?.text;
  console.log('Query param text: ' + textToTTS);

  if (!textToTTS) {
    console.log('no text to translate sent [' + textToTTS + ']');
    return;
  }

  const rs = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer ' + okey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: textToTTS,
      model: 'tts-1',
      response_format: 'mp3',
      voice: 'echo',
    }),
  });

  // Pipe the OpenAI audio stream straight into the Lambda response stream.
  await pipeline(rs.body, responseStream);
});
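
And to play it from the browser (the function URL below is a hypothetical placeholder; use the one Lambda gives you):

// Hypothetical Lambda function URL; replace with your own.
const text = encodeURIComponent('Today is a wonderful day!');
new Audio(`https://abc123.lambda-url.us-east-1.on.aws/?text=${text}`).play();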
