How to turn the audio received through the AudioReceiveHandler
into mono and little-endian?
#1857
Unanswered
moeux asked this question in Questions and Help
Replies: 1 comment 4 replies
-
You should post your VOSK code too. I'd also suggest looking at Sphinx as a possible alternative. You need to track down the exact format expected by VOSK:
Honestly, the lack of documentation for this library kinda sucks.
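As a starting point on the format question: Vosk's recognizer generally expects 16-bit signed little-endian mono PCM, at whatever sample rate is passed to its Recognizer constructor. A minimal sketch (plain JDK, no Vosk code; the class and method names here are my own) of converting JDA's 48 kHz, 16-bit, stereo, big-endian output into that layout:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AudioConvert {
    /**
     * Converts 16-bit signed big-endian stereo PCM (JDA's receive format)
     * into 16-bit signed little-endian mono PCM by averaging both channels.
     */
    public static byte[] toMonoLittleEndian(byte[] stereoBigEndian) {
        ByteBuffer in = ByteBuffer.wrap(stereoBigEndian).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer out = ByteBuffer.allocate(stereoBigEndian.length / 2)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        while (in.remaining() >= 4) {   // one stereo frame = left + right sample
            short left = in.getShort();
            short right = in.getShort();
            // average in int arithmetic to avoid short overflow, then narrow
            out.putShort((short) ((left + right) / 2));
        }
        return out.array();
    }
}
```

The returned array could then be fed to the recognizer, e.g. something like recognizer.acceptWaveForm(mono, mono.length) with a Recognizer built for 48000 Hz; if the binding's signature differs, the byte layout above is still the part that matters.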
-
Question
Hi, I've been struggling with this for a few days, with no answers from the Discord server nor from StackOverflow:
How can I turn the audio received through the AudioReceiveHandler into little-endian and mono? I'm using the Vosk API for speech recognition, and its method for feeding in data only accepts byte, float and short arrays. I've printed the audio format in the AudioReceiveHandler, and it reports: PCM_SIGNED 48000.0 Hz, 16 bit, stereo, 2 bytes/frame, big-endian. I was told that Vosk only accepts mono and little-endian in order to work.
Example Code
I've tried a few methods.
Converting it using AudioSystem threw an exception.
I've also tried messing with the ByteBuffer and tried to turn the bytes into shorts after I had seen how OpusPacket#getAudioData() works, attempting to reverse that step, so to speak. After digging around even more, I tried using encoded audio and OpusPacket#decode() directly. Neither of those attempts threw an exception, but Vosk was still returning empty results, a sign that the data is not in the right format.
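On the AudioSystem attempt specifically: the stock JDK converters can usually flip PCM endianness, but a stereo-to-mono channel conversion is often not available from the default providers, which could explain the exception. A sketch of the endianness-only conversion (class name and constants are mine; mixing down to mono would still have to be done by hand afterwards):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class EndianConvert {
    // JDA's documented receive format: 48 kHz, 16-bit, stereo, signed, big-endian
    static final AudioFormat SOURCE = new AudioFormat(48000f, 16, 2, true, true);
    // Same format, little-endian. Note the channel count stays at 2:
    // requesting a stereo-to-mono conversion here is what typically fails
    // with the default providers.
    static final AudioFormat TARGET = new AudioFormat(48000f, 16, 2, true, false);

    public static byte[] swapEndianness(byte[] bigEndianPcm) throws IOException {
        AudioInputStream source = new AudioInputStream(
                new ByteArrayInputStream(bigEndianPcm), SOURCE,
                bigEndianPcm.length / SOURCE.getFrameSize());
        try (AudioInputStream converted =
                     AudioSystem.getAudioInputStream(TARGET, source)) {
            return converted.readAllBytes();
        }
    }
}
```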
I really hope you guys can help me out. Thank you!