pyannote/speaker-diarization-3.0 slower than pyannote/speaker-diarization? #1481
There was indeed an issue related to this, but I benchmarked both myself and got similar speeds between 3.0 and 2.1 (3.0 was even slightly faster). For instance, on DIHARD 3, v3.0 runs 43x faster than realtime while v2.1 is only 40x faster than realtime.
(Duration = total duration of audio in the benchmark.)
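For scale, at 43x realtime a 25-minute recording would be expected to finish diarization in roughly 25 × 60 / 43 ≈ 35 seconds of compute time.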
I am noticing the same with v3.0.1. Diarization inference takes about 10 minutes for a 25-minute audio file, running on an A40 (hosted at Replicate). These are the logs of my pipeline:
So it takes about 1.5 minutes for the transcription (with faster_whisper) and almost 10 minutes for running pyannote. This is how I am setting it up:

```python
import torch
from cog import BasePredictor  # Replicate's cog framework
from faster_whisper import WhisperModel
from pyannote.audio import Pipeline


class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        model_name = "large-v2"
        self.model = WhisperModel(
            model_name,
            device="cuda" if torch.cuda.is_available() else "cpu",
            compute_type="float16")
        self.diarization_model = Pipeline.from_pretrained(
            "pyannote/speaker-diarization-3.0",
            use_auth_token="TOKEN").to(torch.device("cuda"))
```

And calling the pipeline:

```python
    def predict(self) -> Output:  # part of the same Predictor class
        # other code
        diarization = self.diarization_model(
            audio_file_wav, num_speakers=num_speakers)
        # other code
```

The file I am using is https://thomasmol.com/recordings/aiopensource.mp3. Maybe it's still not loaded onto the GPU correctly?
Related, maybe? SYSTRAN/faster-whisper#493
@thomasmol when you start the script, try doing this:

If it returns CPU, it is probably because you are using faster_whisper, which has onnxruntime in its requirements. When you install onnxruntime (from faster_whisper) and onnxruntime-gpu (from pyannote), they conflict and onnxruntime always defaults to CPU. You should change the requirements of faster_whisper to use onnxruntime-gpu; it will not affect faster_whisper's behaviour, since it only uses ONNX for the Silero VAD, and I've tested it and it works fine. You can also do […]. In addition, to avoid problems with .mp3 files, you must use torchaudio to load the file into memory.
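The exact snippets from that comment are not preserved in this thread. A minimal sketch of both checks, assuming onnxruntime is importable and reusing the variables from the earlier snippet (here `diarization_model` stands for the pipeline loaded above, and the file path is a placeholder), might look like this:

```python
import onnxruntime as ort
import torchaudio

# Which device onnxruntime will execute on: "GPU" or "CPU".
print(ort.get_device())

# Load the .mp3 into memory and pass pyannote a waveform dict instead of a file path.
waveform, sample_rate = torchaudio.load("aiopensource.mp3")  # placeholder path
diarization = diarization_model(
    {"waveform": waveform, "sample_rate": sample_rate},
    num_speakers=num_speakers,  # same variable as in the earlier snippet
)
```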
Thanks @guilhermehge, that might be the problem, but how do I force faster-whisper to use onnxruntime-gpu? I am building with a Dockerfile, which will just install faster-whisper and all of its required dependencies (and therefore install onnxruntime). See SYSTRAN/faster-whisper#493 (comment)
Installing it and then uninstalling it afterwards is not recommended, so I believe you should clone faster_whisper's repository to your machine, change the requirements, copy the directory into the container when building the application, build the package there, and then install it. I haven't tested this yet, but it might work.

OR, you can just build the package on your local machine and install it inside the container; you will just need to alter the order of the commands above.

Edit: I just thought of a better way of doing this. Clone faster_whisper's repo to your machine, change the requirements, build the package with setup.py, and install the resulting .whl inside the container. What I did to test whether it works was (even though it's not recommended) to uninstall onnxruntime and force-reinstall onnxruntime-gpu. You can also try that as an initial step, if you like.
I can also +1 this.
Okay, the issue seems to be on faster-whisper's end, and it is indeed regarding onnxruntime. I think this issue can be closed, since pyannote 3.0.1 runs as fast as 2.1 (maybe faster?); just make sure you only have onnxruntime-gpu installed, not onnxruntime.
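Not from the thread, but a small sanity check for confirming which onnxruntime packages actually ended up in the environment (package names as published on PyPI; this only matters for pyannote 3.0.x, since later releases drop the ONNX dependency, as noted further down):

```python
from importlib.metadata import version, PackageNotFoundError

import onnxruntime as ort

# Ideally only onnxruntime-gpu is present; having both installed tends to
# silently fall back to CPU execution.
for pkg in ("onnxruntime", "onnxruntime-gpu"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")

# With the GPU build, "CUDAExecutionProvider" should appear in this list.
print(ort.get_available_providers())
```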
So, the fix I mentioned here worked properly. Just clone the repo, change the requirements, run setup.py, and install the .whl file; everything should run as normal.
I guess this is kind of a separate issue, but assuming the pipeline depends on onnxruntime-gpu, wouldn't this cause compatibility issues for CUDA 12+, since onnxruntime-gpu needs CUDA 11?
Sorry for the late update from my side. After upgrading to 3.0.1, inference is as fast as reported by @hbredin. Thank you for the fix. You can close the issue.
Awesome. Thanks for the feedback!

FYI: #1537
Also running into this issue. Things I was using/trying:

Approach 1: Replicate A40s. Used CUDA 11.8 & […]

Approach 2: Custom Docker image with RunPod. Used CUDA 12 & […]

I stripped away all other packages to try to isolate the issue, but unfortunately I cannot reproduce the performance claimed above. Maybe someone has some hints?
Latest version no longer relies on ONNX runtime. |
I found in my case that doing what @guilhermehge suggested with regard to a forced reinstall worked for me.

It gave me an ugly warning, but my setup seems to work:

Went from 310 seconds (i5-13600K CPU) for a test file back down to 31 seconds on GPU (RTX 3090).
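That works out to roughly a 10x speed-up (310 s / 31 s = 10).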
I notice that `pyannote/speaker-diarization-3.0` is quite a bit slower than `pyannote/speaker-diarization`, even with the GPU fix. Does anyone observe the same phenomenon? I will get some sample benchmark code when I have time.

Originally posted by @gau-nernst in #1475 (comment)