
Run with onnxruntime-gpu not working for faster_whisper #493

Closed
guilhermehge opened this issue Sep 27, 2023 · 23 comments

Comments

@guilhermehge

I am trying to use faster_whisper with pyannote for speech overlap detection and speaker diarization, but pyannote's new 3.0.0 release requires onnxruntime-gpu to run the diarization pipeline with the new embedding model.

Installing both onnxruntime (a faster_whisper dependency) and onnxruntime-gpu (a pyannote dependency) causes a conflict, and ONNX Runtime falls back to CPU only.

I tried uninstalling onnxruntime and force-reinstalling onnxruntime-gpu, but then faster_whisper stopped working.

Is it possible to use onnxruntime-gpu for faster_whisper?

@phineas-pta

you should not have both onnxruntime and onnxruntime-gpu installed; it always defaults to CPU

installing onnxruntime-gpu alone should be enough; faster_whisper uses it for Silero VAD, but always on CPU: https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py#L260
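A quick way to check whether both wheels ended up in the same environment is to ask the installed package metadata. The sketch below uses only the standard library (`importlib.metadata`) and the two PyPI distribution names from this thread. The helper names are made up for illustration, and it only reports what is installed; it does not fix anything.

```python
from importlib import metadata

# PyPI distribution names discussed in this thread.
ORT_DISTS = ("onnxruntime", "onnxruntime-gpu")


def installed_ort_dists():
    """Return {distribution name: version} for each ONNX Runtime wheel installed."""
    found = {}
    for name in ORT_DISTS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this wheel is not installed
    return found


def has_ort_conflict(dists):
    """True when the CPU and GPU wheels are installed side by side."""
    return all(name in dists for name in ORT_DISTS)


if __name__ == "__main__":
    dists = installed_ort_dists()
    print(dists)
    if has_ort_conflict(dists):
        print("Both onnxruntime and onnxruntime-gpu are installed; keep only onnxruntime-gpu.")
```

Because both wheels provide the same `onnxruntime` import package, uninstalling one of them can leave the other broken, which may explain why faster_whisper stopped working after the uninstall.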

the caveat with onnxruntime-gpu is that you must properly install CUDA + cuDNN at the system level
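To see whether the GPU build is the one Python actually imports, `onnxruntime.get_available_providers()` lists the execution providers compiled into the installed package. The sketch below wraps it so it also runs where onnxruntime is absent; the function names are hypothetical. As far as I understand it, this list reflects the build, not whether CUDA and cuDNN load correctly: missing system libraries tend to show up only when a session is created, so checking `session.get_providers()` on a real `InferenceSession` is the more reliable confirmation.

```python
def available_providers():
    """List ONNX Runtime execution providers, or [] if onnxruntime is not importable."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


def cuda_ready(providers):
    """True if the installed build exposes the CUDA execution provider."""
    return "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    providers = available_providers()
    print(providers, "-> CUDA build" if cuda_ready(providers) else "-> CPU-only build")
```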

@guilhermehge
Author

guilhermehge commented Sep 27, 2023

But can Silero VAD run with onnxruntime-gpu? To do that, I believe I'd need to change faster_whisper's requirements so it does not install onnxruntime, right?

I'm running the application in Docker with the following image: nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04, so CUDA + cuDNN are properly installed.

@phineas-pta

it's possible to run Silero VAD with onnxruntime-gpu; see my comment #364 (comment)

I don't know which version of onnxruntime you're using, but for the latest version it's better to use CUDA 11.8: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements

@guilhermehge
Author

Thanks for that, phineas!

Let me ask you something else. faster_whisper's transcribe is already using 99% of my GPU. If I run VAD on the GPU as well, would that be a problem, or would it slow things down? I read through transcribe.py and see that SileroVAD is only used within the transcribe function, and the segments are a generator, so it should not overload the GPU. Am I correct?
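The faster_whisper README notes that `segments` is a generator, so transcription only starts when you iterate over it. The toy sketch below shows that general Python behaviour with a hypothetical stand-in generator (no model involved): creating the generator returns immediately, and the work runs only when it is consumed. Anything `transcribe()` does *before* returning the generator is not deferred this way, so it is worth checking in transcribe.py which side of that line the VAD pass falls on.

```python
import time


def lazy_segments(chunks):
    """Hypothetical stand-in for a transcription generator: the body only
    runs when the caller asks for the next segment."""
    for i, chunk in enumerate(chunks):
        time.sleep(0.01)  # pretend per-chunk model work happens here
        yield i, chunk.upper()


start = time.perf_counter()
segments = lazy_segments(["hello", "world"])  # returns at once; nothing processed yet
created = time.perf_counter() - start

segments = list(segments)  # consuming the generator is what runs the work
total = time.perf_counter() - start
print(f"created after {created:.4f}s, consumed after {total:.4f}s")
```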

@guilhermehge
Author

guilhermehge commented Sep 27, 2023

I implemented your code from #364 (comment),

and it actually increased the transcribe time from 2 to 7 seconds for an audio file I'm testing. Do you know why that happened?

Looking into it further, I believe this happens because a session is created every time the transcribe function is called, and since it is using the GPU, session creation takes longer.
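If a new session really is built on every call, the usual remedy is to build it once and reuse it. Whether faster_whisper already reuses its VAD session depends on the version and on how the #364 tweak is applied, so treat this as an illustration of the idea rather than a patch. `SessionCache` is a hypothetical helper; the factory passed in would, with ONNX Runtime, wrap `onnxruntime.InferenceSession`, and the model filename in the comment is an assumption.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Key = Tuple[str, Tuple[str, ...]]


class SessionCache:
    """Build each (model path, providers) session once, then reuse it."""

    def __init__(self, create: Callable[[str, Tuple[str, ...]], Any]):
        # `create` does the expensive work, e.g. constructing an InferenceSession.
        self._create = create
        self._sessions: Dict[Key, Any] = {}

    def get(self, path: str, providers: Sequence[str]) -> Any:
        key = (path, tuple(providers))
        if key not in self._sessions:
            self._sessions[key] = self._create(path, key[1])
        return self._sessions[key]


# With ONNX Runtime installed, the factory could be (model path assumed):
#   import onnxruntime as ort
#   cache = SessionCache(lambda p, prov: ort.InferenceSession(p, providers=list(prov)))
#   session = cache.get("silero_vad.onnx", ["CUDAExecutionProvider", "CPUExecutionProvider"])
```

This only removes repeated creation inside one process; the first GPU session still pays its full setup cost.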

@phineas-pta

hmm, seems like I misread your previous comment. Silero VAD should work with onnxruntime-gpu and defaults to CPU; my code is just a tweak to make it run on GPU, not an absolute necessity

it always creates a new ONNX session regardless of GPU or CPU, but I guess loading onto the GPU takes more time (loading time > processing time); you may need a longer audio file to see an actual speed-up
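To tell one-off loading cost apart from actual processing speed, it helps to time the first call separately from the later ones. The helper below is a generic, hypothetical sketch using `time.perf_counter`; `fn` would be whatever runs the VAD (or a full transcribe) on a fixed audio file, ideally a longer one as suggested above, once with the CPU setup and once with the GPU setup.

```python
import time
from typing import Callable, List, Tuple


def time_first_vs_rest(fn: Callable[[], object], runs: int = 5) -> Tuple[float, List[float]]:
    """Return (first-call time, list of later-call times) in seconds.

    On GPU the first call can include one-off costs such as session
    creation; later calls show steady-state processing speed.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings[0], timings[1:]
```

If the later calls are faster on GPU but the first call dominates, GPU only pays off when the session is reused across many or long audio files.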

@guilhermehge
Author

Yes, at first I wanted to use the onnxruntime-gpu library while running Silero VAD on CPU. Since you posted the code, I tried running it on the GPU, but session creation adds too much time for short audio files, so it's not worth it in most cases; it's better to use the CPU with more threads.
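For the "CPU with more threads" route, ONNX Runtime exposes thread counts on `SessionOptions`. The sketch below builds such options; the thread heuristic (all cores but one) and both helper names are assumptions for illustration, not faster_whisper settings, and applying them to faster_whisper's VAD would still require a tweak like the one from #364.

```python
import os


def vad_cpu_threads(reserve: int = 1) -> int:
    """Hypothetical heuristic: use all cores except `reserve`, and at least one."""
    return max(1, (os.cpu_count() or 1) - reserve)


def cpu_session_options(num_threads: int):
    """Build ONNX Runtime session options for a multi-threaded CPU session."""
    import onnxruntime as ort  # provided by either onnxruntime or onnxruntime-gpu

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = num_threads  # threads used inside a single operator
    opts.inter_op_num_threads = 1  # only used when execution_mode is ORT_PARALLEL
    return opts


# Usage (model path assumed):
#   import onnxruntime as ort
#   session = ort.InferenceSession("silero_vad.onnx",
#                                  sess_options=cpu_session_options(vad_cpu_threads()),
#                                  providers=["CPUExecutionProvider"])
```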

I'm trying to run this code along with pyannote's 3.0 diarization pipeline, which requires onnxruntime-gpu, so faster_whisper's requirements were causing a conflict.

I'm using a Docker container in a GPU pod orchestrated by Kubernetes, with an image based on nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04. I created this issue because I was testing onnxruntime-gpu in this environment inside a Jupyter notebook, and the kernel kept dying when running inference with Whisper; I couldn't figure out why. Later, I ran the complete code as a .py script outside Jupyter and it worked fine. I still don't know why the Jupyter notebook kernel keeps dying with this library.

@phineas-pta

should have shared the config info from the beginning to avoid talking in circles 😅

so the actual problem is the Jupyter kernel crash; do you have logs?

@guilhermehge
Author

guilhermehge commented Sep 28, 2023

I'll be running some tests and will come back here with the results. For now, I don't have any logs; I killed the pod before accessing them.

Edit: I'll only be able to get back to this issue next week. When I get the results, I will post them here.

@thomasmol

Hi, I am having the same issue: I need to run onnxruntime-gpu, but I can't easily uninstall the CPU version since I am using Cog and pushing to Replicate. This means I can't change the code as per #364 (comment), or at least I don't know how. Any ideas on how to force my build to use onnxruntime-gpu and remove onnxruntime?

@thomasmol

I created a pull request that fixes this issue: #499. You can try it by installing git+https://github.com/thomasmol/faster-whisper.git@master.

@phineas-pta

your PR is very likely to be rejected: it only works with NVIDIA GPUs, while faster-whisper is cross-platform. That's why I left my code as a snippet instead of sending a PR

@thomasmol

Thanks for the heads up

@guilhermehge
Author

guilhermehge commented Sep 30, 2023

I don't recommend running Silero VAD on GPU either, since instantiating a GPU session takes longer than a CPU one. For shorter audio files, this increases the overall time significantly: I've seen 2 s on CPU versus 7 s on GPU for certain files.

Perhaps we could add an option, via the parameters class, that lets the user select GPU or CPU for Silero VAD.
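One way such an option could stay cross-platform (the concern raised about #499) is to default to CPU and fall back to CPU whenever CUDA is requested but not available. The sketch below uses a separate, hypothetical `VadDeviceOptions` dataclass rather than faster_whisper's real parameters class, and turns the choice into the `providers` list that `onnxruntime.InferenceSession` accepts.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VadDeviceOptions:
    """Hypothetical option object; not part of faster_whisper."""

    device: str = "cpu"  # "cpu" (default, cheapest to start) or "cuda"


def resolve_vad_providers(opts: VadDeviceOptions, available: Sequence[str]) -> List[str]:
    """Map the requested device to an ONNX Runtime provider list.

    Falls back to CPU when CUDA is requested but the installed build does not
    expose it, so the same code still runs on non-NVIDIA machines.
    """
    if opts.device not in ("cpu", "cuda"):
        raise ValueError(f"unsupported VAD device: {opts.device!r}")
    if opts.device == "cuda" and "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


# Usage (model path assumed):
#   import onnxruntime as ort
#   providers = resolve_vad_providers(VadDeviceOptions("cuda"), ort.get_available_providers())
#   session = ort.InferenceSession("silero_vad.onnx", providers=providers)
```

Keeping CPU as the default matches the timings above, where GPU session setup outweighed the gain on short audio.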

@guilhermehge
Author

So, for this issue, @phineas-pta, I fixed it by installing only onnxruntime-gpu; the Jupyter notebook is working properly and everything is running as it should.

To do this, I cloned the faster_whisper repo, built it with only onnxruntime-gpu as a dependency, and installed that build; now everything is running normally. Thanks for the help.

@thomasmol

thomasmol commented Oct 3, 2023

@guilhermehge yes, I did the same and it works! Maybe we could create a faster-whisper-gpu fork and have a GPU-only version?

@remic33

remic33 commented Oct 10, 2023

It seems that the current pyannote version (3.0.1) is not working with the current faster_whisper version. Any ideas for a solution?

@guilhermehge
Author

It does work; I am using it at the moment. How are you running it? Docker? Colab? Locally without Docker?

@remic33

remic33 commented Oct 10, 2023

Locally; I guess that's the problem. I wanted to update whisperX accordingly.

@guilhermehge
Author

Did you create a virtual environment to do that?

Can you further explain your problem so we can debug it?

@remic33

remic33 commented Oct 11, 2023

It is a local conda environment on Apple M2 silicon.
The environment previously had whisperX installed; trying to build it with the new pyannote version gives me this error:

ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu>=1.16.0 (from pyannote-audio) (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu>=1.16.0

@phineas-pta

@remic33 pyannote doesn't officially support Mac; there are already many issues on the pyannote repo about that
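The "from versions: none" error means pip found no onnxruntime-gpu wheel for that platform. Assuming onnxruntime-gpu wheels are only published for Linux and Windows, a package that wanted to avoid hard-requiring the GPU wheel everywhere could pick the requirement per platform, as in the hypothetical helper below (the version bound is copied from the error message). In a requirements file the same idea is usually written with a PEP 508 environment marker such as `onnxruntime-gpu>=1.16.0; sys_platform != "darwin"`. This would have to be done by the package declaring the dependency; it does not change what pyannote itself requires.

```python
import sys


def onnxruntime_requirement(platform: str = sys.platform) -> str:
    """Pick which ONNX Runtime wheel to require on a given platform.

    Assumption: onnxruntime-gpu wheels exist only for Linux and Windows,
    consistent with the "from versions: none" error on macOS.
    """
    if platform.startswith("linux") or platform == "win32":
        return "onnxruntime-gpu>=1.16.0"
    return "onnxruntime>=1.16.0"  # CPU wheel, also built for macOS


if __name__ == "__main__":
    print(onnxruntime_requirement())
```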

@remic33

remic33 commented Oct 12, 2023

It worked previously; I know because I was using it and took part in those discussions. You just needed to add some packages. But maybe with onnxruntime-gpu it doesn't anymore.
Thanks for your help !


4 participants