Query Regarding MP3 Format Support and Accuracy in Silero-VAD #395

Jellun · 2023-11-15T13:07:18Z

I've noticed that all your example code for silero-vad uses only WAV format. However, I tested the model using an MP3 file:

audio = read_audio("test.mp3", sampling_rate=SAMPLING_RATE)

The model operates correctly with MP3 files, but there appears to be a reduction in accuracy compared to WAV format. It's important to mention that the MP3 files I tested were high quality (i.e., not excessively compressed).

Could you clarify whether silero-vad inherently supports MP3 format? Also, is there a noticeable difference in quality or accuracy between MP3 and WAV formats?

Thanks in advance.
Jun

snakers4 (Maintainer) · 2023-11-15T13:26:31Z

> Could you clarify whether silero-vad inherently supports MP3 format? Also, is there a noticeable difference in quality or accuracy between MP3 and WAV formats?

To be honest, we did not check this.

We did not explicitly add MP3 round-trip (encode to MP3 and decode back) augmentations to the training process.
Most of the source data is typically in WAV, OGG Vorbis, or OGG Opus.
We store data in OGG Opus.
High-quality MP3 files give fairly good results, but we did not explicitly test how much quality degrades at each bitrate.
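Since the effect of bitrate was not measured, one way to quantify it locally is to run the VAD on the WAV file and on its MP3 version, then compare the two sets of detected speech segments. The sketch below is only an illustration: it assumes each result is a list of dicts with integer `start` and `end` sample indices (the shape returned by `get_speech_timestamps`) and that segments within one list do not overlap. The function names are hypothetical and not part of silero-vad.

```python
def total_length(segments):
    """Sum the length (in samples) of all segments in one list."""
    return sum(seg["end"] - seg["start"] for seg in segments)


def overlap_length(segments_a, segments_b):
    """Number of samples marked as speech in both lists."""
    total = 0
    for a in segments_a:
        for b in segments_b:
            lo = max(a["start"], b["start"])
            hi = min(a["end"], b["end"])
            if hi > lo:
                total += hi - lo
    return total


def speech_iou(segments_a, segments_b):
    """Intersection over union of two speech masks (1.0 means identical)."""
    inter = overlap_length(segments_a, segments_b)
    union = total_length(segments_a) + total_length(segments_b) - inter
    # Two empty results agree perfectly, so report 1.0 instead of dividing by zero.
    return inter / union if union else 1.0
```

Repeating the comparison for the same recording encoded at several MP3 bitrates would show where the agreement with the WAV result starts to drop. Note that this measures agreement with the WAV output, not accuracy against ground-truth labels.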

2 replies

Jellun Nov 15, 2023
Author

Does the read_audio("audio_file_path", sampling_rate=SAMPLING_RATE) method use FFmpeg or some Python module to load the file? Does it automatically detect the audio file format and convert it to WAV before the model processes it?

snakers4 Nov 15, 2023
Maintainer

> Does the read_audio("audio_file_path", sampling_rate=SAMPLING_RATE) method use FFmpeg or some Python module to load the file? Does it automatically detect the audio file format and convert it to WAV before the model processes it?

I believe it uses torchaudio, which now defaults to the sox_io backend, which in turn uses sox. I am not sure about MP3 support in sox; it has always been flaky with MP3.
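If MP3 decoding through the default backend is a concern, one workaround is to decode the file to WAV explicitly before handing it to `read_audio`. The following is a minimal sketch, assuming the `ffmpeg` command-line tool is installed and on `PATH`; the helper names are hypothetical, and this is not what `read_audio` does internally.

```python
import subprocess


def build_ffmpeg_command(src_path, dst_path, sampling_rate=16000):
    """Build an ffmpeg command that decodes src_path to mono WAV."""
    return [
        "ffmpeg",
        "-y",                        # overwrite dst_path if it already exists
        "-i", src_path,              # input file; ffmpeg detects the format
        "-ac", "1",                  # downmix to a single channel
        "-ar", str(sampling_rate),   # resample to the target rate
        dst_path,
    ]


def convert_to_wav(src_path, dst_path, sampling_rate=16000):
    """Run ffmpeg and return dst_path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src_path, dst_path, sampling_rate), check=True)
    return dst_path
```

With this in place, `read_audio(convert_to_wav("test.mp3", "test.wav"), sampling_rate=SAMPLING_RATE)` would load the decoded WAV, which takes the backend's MP3 handling out of the picture.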

Answer selected by Jellun
