Hi @snakers4, I've noticed that all of your example code for silero-vad uses only WAV files. However, I tested the model on an MP3 file:

`audio = read_audio("test.mp3", sampling_rate=SAMPLING_RATE)`

The model works with MP3 files, but accuracy appears to be lower than with WAV. Note that the MP3 files I tested were high quality (i.e., not heavily compressed).

Could you clarify whether silero-vad supports MP3 input natively? Also, is there a noticeable difference in quality or accuracy between MP3 and WAV input?

Thanks in advance.
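One way to make "a reduction in accuracy" concrete is to run the VAD on both versions of the same recording and measure how often their per-frame speech decisions agree. The sketch below assumes each result is a list of `{"start": ..., "end": ...}` dicts in samples (the shape `get_speech_timestamps` returns by default), with `end` treated as exclusive. The helper names `to_frame_mask` and `frame_agreement`, and the 512-sample frame size, are hypothetical choices for illustration, not part of silero-vad.

```python
def to_frame_mask(segments, total_samples, frame=512):
    """Turn [{'start': s, 'end': e}, ...] (sample indices, end exclusive)
    into one boolean speech flag per fixed-size frame."""
    n_frames = (total_samples + frame - 1) // frame  # ceil division
    mask = [False] * n_frames
    for seg in segments:
        first = seg["start"] // frame
        last = (seg["end"] - 1) // frame  # frame holding the last speech sample
        for i in range(first, min(last, n_frames - 1) + 1):
            mask[i] = True
    return mask


def frame_agreement(ref, hyp, total_samples, frame=512):
    """Fraction of frames where both segmentations make the same decision."""
    a = to_frame_mask(ref, total_samples, frame)
    b = to_frame_mask(hyp, total_samples, frame)
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy example: the hypothesis misses the first 512 samples of speech.
wav_ts = [{"start": 0, "end": 1024}]
mp3_ts = [{"start": 512, "end": 1024}]
print(frame_agreement(wav_ts, mp3_ts, total_samples=2048))  # 0.75
```

In practice `wav_ts` and `mp3_ts` would come from running the model on the WAV and MP3 versions, with `total_samples` taken from the WAV tensor length. MP3 encoding can add a short run of padding samples at the start, so small offsets between the two results may appear even when the decisions themselves match.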
To be honest, we did not check this. We did not explicitly add MP3 round-trip augmentations (encoding to MP3 and decoding back) to the training process.
I believe `read_audio` uses torchaudio, which now uses the `sox_io` backend by default, which in turn relies on sox. I'm not sure about MP3 support in sox; it has always been flaky with MP3.
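If MP3 decoding through the sox backend is the suspect, one workaround (not something suggested in this thread) is to decode the MP3 once with ffmpeg into a mono 16-bit PCM WAV at the model's sampling rate, then pass that WAV to `read_audio` as in the examples. The helper names below are hypothetical, 16000 Hz is an assumed target rate, and the sketch assumes an `ffmpeg` binary on `PATH`; the flags used (`-y`, `-i`, `-ac`, `-ar`, `-c:a pcm_s16le`) are standard ffmpeg options.

```python
import subprocess


def ffmpeg_to_wav_cmd(src, dst, sampling_rate=16000):
    """Build an ffmpeg command that decodes src into a mono PCM WAV at sampling_rate."""
    return [
        "ffmpeg",
        "-y",                       # overwrite dst if it already exists
        "-i", src,                  # input file, e.g. an MP3
        "-ac", "1",                 # downmix to a single channel
        "-ar", str(sampling_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",        # 16-bit little-endian PCM, the usual WAV payload
        dst,
    ]


def mp3_to_wav(src, dst, sampling_rate=16000):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst, sampling_rate), check=True)
```

After `mp3_to_wav("test.mp3", "test.wav")`, the file can be loaded with `read_audio("test.wav", sampling_rate=SAMPLING_RATE)`. Doing the decode explicitly makes it repeatable, so any remaining accuracy gap is more likely due to MP3 compression itself than to the loader.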