WhatsApp Voice Waveform Investigations #432

deuill · 2023-07-21T14:28:39Z

deuill
Jul 21, 2023

Hey all, opening a thread here to track investigations on the appropriate way to generate a waveform for voice messages, as accepted by the Waveform field of proto.AudioMessage.

From prior investigations, and from parsing incoming messages, it seems that the value expected here is a []byte slice of 64 elements, each of which is a number between 0 and 100 (so, ostensibly, a percentage). We can get something approximating this with ffprobe, first by getting peak levels for the audio file in decibels:
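As a quick sanity check on that shape, here is a minimal sketch of a validator. The names validWaveform and waveformLen are hypothetical, and the 64-element, 0 to 100 constraint is only what was observed in incoming messages, not a documented contract.

```go
package main

import "fmt"

// waveformLen is the length observed in incoming messages (an assumption,
// not a documented constant).
const waveformLen = 64

// validWaveform reports whether w has 64 elements, each between 0 and 100.
func validWaveform(w []byte) bool {
	if len(w) != waveformLen {
		return false
	}
	for _, v := range w {
		if v > 100 {
			return false
		}
	}
	return true
}

func main() {
	// Build a ramp from 0 to 100 across the 64 slots.
	w := make([]byte, waveformLen)
	for i := range w {
		w[i] = byte(i * 100 / (waveformLen - 1))
	}
	fmt.Println(validWaveform(w)) // prints "true"
}
```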

ffprobe -v error -f lavfi -i "amovie=test.mp3,asetnsamples=44100,astats=metadata=1:reset=1" -show_entries frame=pkt_pts_time:frame_tags=lavfi.astats.Overall.Peak_level -of "csv=nokey=1:print_section=0"

Assuming the input file, test.mp3, has a sample-rate of 44100Hz, this will return a number of dB readings, each on its own line, one per second of audio (formatting issues are secondary to this investigation, but worth noting here).
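For consuming that output from Go, a rough sketch of a parser follows. It assumes each non-empty line ends with the dB reading, possibly preceded by a comma-separated timestamp; that layout is a guess based on the csv options in the command above, and parsePeakLevels is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parsePeakLevels extracts one dB reading per non-empty line.
// Assumption: the reading is the last comma-separated field on the line.
func parsePeakLevels(output string) ([]float64, error) {
	var levels []float64
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		fields := strings.Split(line, ",")
		last := strings.TrimSpace(fields[len(fields)-1])
		// ParseFloat also accepts "-inf", which can appear for silent frames.
		db, err := strconv.ParseFloat(last, 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", line, err)
		}
		levels = append(levels, db)
	}
	return levels, sc.Err()
}

func main() {
	levels, err := parsePeakLevels("0.000000,-12.5\n1.000000,-3.25\n")
	fmt.Println(levels, err)
}
```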

If we want ffprobe to return exactly 64 readings, we need to set the asetnsamples option to the number of seconds in the audio stream, times the sample-rate, divided by 64. So, for an audio file of 10.34 seconds and a sample-rate of 48000, we'd get (10.34 * 48000) / 64 = 7755, which we could then plug in above to get 64 rather than 10 values.
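The calculation can be sketched in Go as below. The helper name frameSize is hypothetical, and rounding the per-frame count up is my own choice, made so that a stream whose length is not a multiple of 64 does not spill into a 65th frame; it is not something ffprobe requires.

```go
package main

import (
	"fmt"
	"math"
)

// frameSize returns the asetnsamples value that splits the stream into at
// most 64 frames. The total sample count is rounded first to avoid
// floating-point error in duration * rate.
func frameSize(durationSec float64, sampleRate int) int {
	total := int(math.Round(durationSec * float64(sampleRate)))
	return (total + 63) / 64 // integer ceiling of total / 64
}

func main() {
	n := frameSize(10.34, 48000)
	// Plug the value into the filter graph used in the ffprobe command.
	fmt.Printf("amovie=test.mp3,asetnsamples=%d,astats=metadata=1:reset=1\n", n)
}
```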

However, we then need to scale the dB values to a percentage. It seems that the correct formula for this is $k = 10^{x/10}$, but this does not seem to be what WhatsApp itself is using, as the resulting values are biased towards the low end (e.g. -12 dB comes out as 6% with this formula, but WhatsApp seems to assign something closer to 70%).

Raising the denominator in the exponent above 10 doesn't seem to be a perfect solution either: a value of, say, 50 produces values closer to what WhatsApp does, but not entirely the same, and is still biased in some ways. It's likely that the formula used in converting between dB and linear (or is it not linear?) is different.
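To make the comparison concrete, here is a small sketch that evaluates 10^(x/d) for a few denominators. A denominator of 10 is the standard dB-to-power-ratio conversion and 20 is the standard dB-to-amplitude-ratio conversion; 50 is just the experimental value mentioned above. None of these is confirmed to be what WhatsApp uses, and dbToPercent is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// dbToPercent maps a dB reading to a percentage using 10^(db/divisor),
// clamped to the range 0 to 100.
func dbToPercent(db, divisor float64) float64 {
	ratio := math.Pow(10, db/divisor)
	return math.Min(math.Max(ratio, 0), 1) * 100
}

func main() {
	// Compare the candidate denominators for a -12 dB peak.
	for _, d := range []float64{10, 20, 50} {
		fmt.Printf("divisor %2.0f: %.1f%%\n", d, dbToPercent(-12, d))
	}
}
```

For -12 dB this gives roughly 6%, 25% and 58% respectively, so none of the three reaches the roughly 70% observed.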

Anyone have a better idea of what the underlying behaviour might be here?

ashuvax · 2024-07-03T14:23:03Z

ashuvax
Jul 3, 2024

Any updates on this?

brnostone · 2024-10-04T18:56:37Z

brnostone
Oct 4, 2024

I used the gopxl/beep library to generate the waveform, and it worked:

import (
	"math"
	"os"

	"github.com/gopxl/beep/v2/mp3"
)


func generateWaveform(filePath string) ([]byte, error) {
	f, err := os.Open(filePath)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	streamer, _, err := mp3.Decode(f)
	if err != nil {
		return nil, err
	}
	defer streamer.Close()

	const numSamples = 64
	samples := make([]float64, 0)
	buf := make([][2]float64, 1024)

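	// Decode the whole stream, mixing stereo down to mono
	for {
		n, ok := streamer.Stream(buf)
		if !ok {
			break
		}
		for i := 0; i < n; i++ {
			sample := (buf[i][0] + buf[i][1]) / 2
			samples = append(samples, sample)
		}
	}
	// Stream signals failure only through ok == false, so check for a decode error
	if err := streamer.Err(); err != nil {
		return nil, err
	}

	// Split samples into blocks to generate 64 values
	blockSize := len(samples) / numSamples
	if blockSize == 0 {
		// Fewer than 64 samples: avoid dividing by zero when averaging below
		blockSize = 1
	}
	filteredData := make([]float64, numSamples)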

	var maxAmplitude float64 = 0
	for i := 0; i < numSamples; i++ {
		start := i * blockSize
		end := start + blockSize
		if end > len(samples) {
			end = len(samples)
		}

		// Calculate the average amplitude in the block
		var sum float64
		for j := start; j < end; j++ {
			sum += math.Abs(samples[j])
		}
		avg := sum / float64(blockSize)

		filteredData[i] = avg
		if avg > maxAmplitude {
			maxAmplitude = avg
		}
	}

	// Normalize data based on maximum value
	normalizedData := make([]byte, numSamples)
	for i := 0; i < numSamples; i++ {
		if maxAmplitude != 0 {
			normalizedData[i] = byte((filteredData[i] / maxAmplitude) * 100)
		} else {
			normalizedData[i] = 0
		}
	}

	return normalizedData, nil
}
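The block-averaging and normalization steps can also be pulled out into a pure function, which makes them testable without an audio file. This is only a sketch of the same approach: waveformFromSamples is a hypothetical name, n is assumed to be positive, and returning all zeros for very short or silent input is my own choice.

```go
package main

import (
	"fmt"
	"math"
)

// waveformFromSamples reduces mono samples to n bytes in the range 0-100 by
// averaging the absolute amplitude of each block and scaling every block
// against the loudest one. n must be positive.
func waveformFromSamples(samples []float64, n int) []byte {
	out := make([]byte, n)
	blockSize := len(samples) / n
	if blockSize == 0 {
		return out // too little audio to fill n blocks: all zeros
	}
	avgs := make([]float64, n)
	var peak float64
	for i := range avgs {
		var sum float64
		for _, s := range samples[i*blockSize : (i+1)*blockSize] {
			sum += math.Abs(s)
		}
		avgs[i] = sum / float64(blockSize)
		peak = math.Max(peak, avgs[i])
	}
	if peak == 0 {
		return out // silence: all zeros
	}
	for i, a := range avgs {
		out[i] = byte(a / peak * 100)
	}
	return out
}

func main() {
	// First half at half amplitude, second half at full (negative) amplitude.
	samples := make([]float64, 128)
	for i := range samples {
		if i < 64 {
			samples[i] = 0.5
		} else {
			samples[i] = -1
		}
	}
	fmt.Println(waveformFromSamples(samples, 2)) // prints "[50 100]"
}
```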
