How to synthesize a large file into audio files

Limits of a single synthesis request

The Microsoft Cognitive Services text-to-speech service limits what a single synthesis request can contain, for example:

The SSML in a single request cannot produce more than 10 minutes of audio.
A single request can contain at most 50 voice elements.

In some scenarios we want to synthesize a long text into a single audio file, which can exceed these limits. We can use the batch synthesis API or the Speech SDK to solve this problem.

Here is how:

Option 1: Use batch synthesis API

The batch synthesis API is the recommended way to generate large audio files. For details, see Batch synthesis API for text to speech.
You can find sample code here.

Option 2: Use real time synthesis API

First, create an AudioConfig with AudioConfig.FromWavFileOutput and use it to create a synthesizer. Then call the speak method repeatedly with shorter pieces of text; the audio generated by all of these calls is saved in a single audio file.

The example below works as follows:

Split the text file into paragraphs on \n or \r, because the real-time endpoint is limited to 10 minutes of audio per request. The caller does this and passes the result in as the paragraphs array.

Call the SDK to synthesize the paragraphs one by one into the same MP3 file. Each paragraph is attempted up to three times if synthesis fails.
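
The example that follows receives the paragraphs as an already-split string[] and does not show the splitting step itself. A minimal sketch of that step might look like the code below. The class name ParagraphSplitter and the method SplitIntoParagraphs are hypothetical names introduced here, not part of the Speech SDK. Note the assumption that each paragraph is short enough on its own; a very long paragraph could still exceed the 10-minute limit and would need further splitting.

```csharp
using System;
using System.Linq;

public static class ParagraphSplitter
{
    // Hypothetical helper (not part of the Speech SDK): splits raw text into
    // paragraphs on \r and \n, trims each one, and drops empty or blank lines.
    public static string[] SplitIntoParagraphs(string text)
    {
        return text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToArray();
    }
}
```

The result can be passed as the paragraphs argument of the method below, for example after reading the input with File.ReadAllText.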

 public static void SynthesisSsmlToMp3File(string voiceName, string style, string[] paragraphs, string file)
 {
     var config = SpeechConfig.FromSubscription("Your key", "Your region");

     // Sets the synthesis output format.
     // The full list of supported format can be found here:
     // https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech#audio-outputs
     // config.SetSpeechSynthesisOutputFormat((SpeechSynthesisOutputFormat)Enum.Parse(typeof(SpeechSynthesisOutputFormat), codec));
     config.SpeechSynthesisVoiceName = voiceName;
     config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio24Khz96KBitRateMonoMp3);

     // Stopwatch used to report the elapsed time in the log messages below.
     System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
     sw.Start();

     // Creates a speech synthesizer that uses `file` as audio output.
     // All speak calls on this synthesizer write to the same file.
     using (var fileOutput = AudioConfig.FromWavFileOutput(file))
     using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
     {
         foreach (string paragraph in paragraphs)
         {
             var ssml = GenerateSsml(voiceName, paragraph, style);

             int retry = 3;
             while (retry > 0)
             {
                 using (var result = synthesizer.SpeakSsmlAsync(ssml).Result)
                 {
                     if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                     {
                         Console.WriteLine($"success on {voiceName}{ssml} {result.ResultId} in {sw.ElapsedMilliseconds} msec");
                         break;
                     }
                     else if (result.Reason == ResultReason.Canceled)
                     {
                         Console.WriteLine($"failed on {voiceName}{ssml} {result.ResultId}");
                         var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                         Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                         if (cancellation.Reason == CancellationReason.Error)
                         {
                             Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                             Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                             Console.WriteLine($"CANCELED: Did you update the subscription info?");
                         }
                     }

                     retry--;
                     if (retry > 0)
                     {
                         // Only announce a retry when another attempt will actually happen.
                         Console.WriteLine("retrying again...");
                     }
                 }
             }
         }
     }
 }
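
The example above calls GenerateSsml, which the page does not define. A minimal sketch of such a helper could look like the following. It is an assumption about what the helper does, not the author's implementation: it wraps one paragraph in a single voice element, adds an mstts:express-as element only when a style is given, and hard-codes xml:lang to en-US, which you would change for other locales. The class name SsmlBuilder is hypothetical and exists only to make the block self-contained; in practice the method would sit next to SynthesisSsmlToMp3File.

```csharp
using System.Security;

public static class SsmlBuilder
{
    // Hypothetical helper: builds an SSML document for one paragraph.
    // Assumes the en-US locale; adjust xml:lang to match the chosen voice.
    public static string GenerateSsml(string voiceName, string text, string style)
    {
        // Escape XML special characters so text such as "Tom & Jerry" stays valid SSML.
        string escapedText = SecurityElement.Escape(text);

        // Wrap the text in a speaking style only when one was requested.
        string body = string.IsNullOrEmpty(style)
            ? escapedText
            : $"<mstts:express-as style=\"{SecurityElement.Escape(style)}\">{escapedText}</mstts:express-as>";

        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
               "xmlns:mstts=\"https://www.w3.org/2001/mstts\" xml:lang=\"en-US\">" +
               $"<voice name=\"{SecurityElement.Escape(voiceName)}\">{body}</voice>" +
               "</speak>";
    }
}
```

Because each request then carries exactly one voice element, this shape stays well under the limit of 50 voice elements per request.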

