How to synthesize a large file into audio files

ForrestGumb edited this page Aug 22, 2024 · 4 revisions

Limit of one synthesis request

The Microsoft Cognitive Services text-to-speech service limits each synthesis request, for example:

  • The SSML in a single request cannot produce more than 10 minutes of audio
  • A single request can contain at most 50 voice elements
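
The voice-element limit can be checked locally before a request is sent. The following is a minimal sketch, not part of the Speech SDK: the class name `SsmlLimits` and its members are hypothetical, and it only counts `<voice>` elements with `System.Xml.Linq`. The 10-minute audio limit cannot be checked this way, because the audio length is only known after synthesis.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not an SDK type): pre-checks SSML against the
// voice-element limit quoted above.
public static class SsmlLimits
{
    // Limit quoted above: at most 50 <voice> elements per request.
    public const int MaxVoiceElements = 50;

    // Counts <voice> elements, ignoring the XML namespace they are declared in.
    // Throws System.Xml.XmlException if the SSML is not well-formed XML.
    public static int CountVoiceElements(string ssml)
    {
        var doc = XDocument.Parse(ssml);
        return doc.Descendants().Count(e => e.Name.LocalName == "voice");
    }

    // True when the SSML stays within the voice-element limit.
    public static bool IsWithinVoiceLimit(string ssml) =>
        CountVoiceElements(ssml) <= MaxVoiceElements;
}
```

If `IsWithinVoiceLimit` returns false, the SSML would need to be split into several requests before synthesis.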

In some scenarios we want to synthesize a long text into a single audio file, which can exceed these limits. We can use the Speech SDK to solve this problem.

Here is how:

Option 1: Use batch synthesis API

The batch synthesis API is the recommended way to generate large audio files. For details, see Batch synthesis API for text to speech.
You can find sample code here.

Option 2: Use real time synthesis API

First, create an AudioConfig with AudioConfig.FromWavFileOutput and use it to create a synthesizer. Then call the speak method repeatedly with shorter pieces of text; the audio generated by all of the calls is saved in a single audio file.

The example below works this way:

  1. Split the text file into paragraphs on \n or \r, because the real-time endpoint limits each request to 10 minutes of audio.

  2. Call the SDK to synthesize the paragraphs one by one into the same MP3 file, making up to three attempts per paragraph when synthesis fails.

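     using System;
     using Microsoft.CognitiveServices.Speech;
     using Microsoft.CognitiveServices.Speech.Audio;
    
     public static void SynthesisSsmlToMp3File(string voiceName, string style, string[] paragraphs, string file)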
     {
         // Replace the placeholders with your own subscription key and service region.
         var config = SpeechConfig.FromSubscription("Your key", "your region");
    
         // Sets the synthesis output format.
         // The full list of supported formats can be found here:
         // https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech#audio-outputs
         // config.SetSpeechSynthesisOutputFormat((SpeechSynthesisOutputFormat)Enum.Parse(typeof(SpeechSynthesisOutputFormat), codec));
         config.SpeechSynthesisVoiceName = voiceName;
         config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio24Khz96KBitRateMonoMp3);
    
         // Tracks elapsed time for the progress messages below.
         var sw = System.Diagnostics.Stopwatch.StartNew();
    
         // Creates a speech synthesizer that uses `file` as its audio output.
         // The audio from every speak call on this synthesizer is saved in that one file.
         using (var fileOutput = AudioConfig.FromWavFileOutput(file))
         using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
         {
             foreach (string paragraph in paragraphs)
             {
                 var ssml = GenerateSsml(voiceName, paragraph, style);
    
                 int retry = 3;
                 while (retry > 0)
                 {
                     using (var result = synthesizer.SpeakSsmlAsync(ssml).Result)
                     {
                         if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                         {
                             Console.WriteLine($"success on {voiceName}{ssml} {result.ResultId} in {sw.ElapsedMilliseconds} msec");
                             break;
                         }
                         else if (result.Reason == ResultReason.Canceled)
                         {
                             Console.WriteLine($"failed on {voiceName}{ssml} {result.ResultId}");
                             var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                             Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    
                             if (cancellation.Reason == CancellationReason.Error)
                             {
                                 Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                                 Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                                 Console.WriteLine($"CANCELED: Did you update the subscription info?");
                             }
                         }
    
                         retry--;
                         if (retry > 0)
                         {
                             // Only announce a retry when another attempt will actually run.
                             Console.WriteLine("retrying again...");
                         }
                     }
                 }
             }
         }
     }
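
The example above calls `GenerateSsml` and expects the text to be split into paragraphs already, but this page shows neither step. The following is a minimal sketch of both, under stated assumptions: the class name `SynthesisHelpers`, the hard-coded `en-US` language, and the use of `mstts:express-as` for the style are illustrative choices, not the author's original helper.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helpers for the example above (not part of the Speech SDK).
public static class SynthesisHelpers
{
    // Step 1: split on \r and \n, trim each piece, and drop blank lines,
    // so every entry is one non-empty paragraph.
    public static string[] SplitIntoParagraphs(string text)
    {
        return text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(p => p.Trim())
                   .Where(p => p.Length > 0)
                   .ToArray();
    }

    // Builds one SSML document per paragraph. Building it with XElement
    // escapes characters such as & and < in the text automatically.
    // Assumption: the language is fixed to en-US; adjust for other voices.
    public static string GenerateSsml(string voiceName, string text, string style)
    {
        XNamespace ssml = "http://www.w3.org/2001/10/synthesis";
        XNamespace mstts = "https://www.w3.org/2001/mstts";

        var voice = new XElement(ssml + "voice", new XAttribute("name", voiceName));
        if (string.IsNullOrEmpty(style))
        {
            // No style requested: put the text directly inside <voice>.
            voice.Add(text);
        }
        else
        {
            // Wrap the text in <mstts:express-as style="..."> to apply the style.
            voice.Add(new XElement(mstts + "express-as",
                new XAttribute("style", style), text));
        }

        var speak = new XElement(ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XAttribute(XNamespace.Xmlns + "mstts", mstts.NamespaceName),
            voice);

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

With these helpers, a call might look like `SynthesisSsmlToMp3File(voiceName, style, SynthesisHelpers.SplitIntoParagraphs(File.ReadAllText("input.txt")), "output.mp3")`, where both file names are placeholders. For the unqualified `GenerateSsml` call in the example to compile, the method would need to live in the same class or be imported with `using static SynthesisHelpers;`.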
    