WhatsApp Voice Waveform Investigations #432
Unanswered
deuill asked this question in WhatsApp protocol Q&A
Replies: 2 comments
-
something new?
-
I used the gopxl/beep lib to create a Waveform and it worked
-
Hey all, opening a thread here to track investigations into the appropriate way to generate a waveform for voice messages, as accepted in the proto.AudioMessage.Waveform field.

From prior investigations, and from parsing incoming messages, it seems that the value expected here is a []byte slice of 64 elements, each of which is a number between 0 and 100 (so, ostensibly, a percentage).

We can get something approximating this with ffprobe, first by getting peak levels for the audio file in decibels:

Assuming the input file, test.mp3, has a sample rate of 44100 Hz, this will return a number of dB readings, each on its own line, one per second (formatting issues are secondary to this investigation, but worth noting here).

If we want ffprobe to return exactly 64 readings, we need to set the setnsamples option to the number of seconds in the audio stream, times the sample rate, divided by 64. So, for an audio file of 10.34 seconds and a sample rate of 48000 Hz, we'd get (10.34 * 48000) / 64 = 7755, which we could then plug in above to get 64 rather than 10 values.

However, we then need to scale the dB values to a percentage. It seems that the correct formula for this is $k = 10^{x/10}$ (with $x$ the reading in dB), but this does not seem to be what WhatsApp themselves are using, as the values are biased towards the low end (e.g. -12 dB comes out as 6% with this formula, but WhatsApp seems to assign something closer to 70%).
Raising the denominator in the exponent from 10 doesn't seem to be a perfect solution either: a value of, say, 50 produces values closer to what WhatsApp does, but not exactly the same, and they are still biased in some ways. It's likely that the formula WhatsApp uses to convert between dB and a linear scale (or is it not linear?) is different.
Anyone have a better idea of what the underlying behaviour might be here?