pydub.AudioSegment clashes with onnxruntime and causes extensive "slow" warning logs (~1s) #486
Comments
Looks like a nice workaround)
Can you post the full log printed by pydub / onnxruntime? Also it would be interesting to benchmark the model loading times for TorchScript vs ONNX, since this edge case arises; there is almost no speed difference for v5, only the engine is larger for PyTorch. Another workaround may be to use some lighter library with streaming audio support.
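(A rough way to compare the loading times, for anyone curious: the snippet below is a sketch, not part of the original discussion. It assumes the models are fetched via torch.hub from snakers4/silero-vad, and the numbers will depend on machine and cache state.)

```python
# Rough loading-time comparison sketch (not from the original thread).
# Assumes torch.hub can reach snakers4/silero-vad; results vary by machine.
import time

import torch

for use_onnx in (False, True):
    start = time.perf_counter()
    model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad", onnx=use_onnx)
    elapsed = time.perf_counter() - start
    print(f"onnx={use_onnx}: loaded in {elapsed:.2f} s")
```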
Is it really a problem if you load the VAD once (now you load the VAD on each API call)? With FastAPI there are definitely options to pre-load some long-living object after starting the API. Only beware that you should start a separate VAD instance per FastAPI process, otherwise it may produce errors and the VAD state may be tainted (since the object would be shared across processes / threads, which is bad). Ideally there should be one VAD instance per process / thread in FastAPI. I am not sure how to implement it properly via FastAPI. The 100% correct way is to use the VAD via some message queue / executor, but it is not very minimal anymore.
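(A minimal sketch of the per-process idea, not from the original thread: it assumes the torch.hub ONNX loader and a FastAPI lifespan hook. Each uvicorn worker process runs its own lifespan, so each gets its own model; per-thread isolation would still need e.g. a lock if endpoints run in a threadpool.)

```python
# Sketch only: one VAD instance per FastAPI worker process, loaded at startup.
# Assumes the torch.hub silero-vad loader; this is not the issue author's demo code.
from contextlib import asynccontextmanager

import torch
from fastapi import FastAPI, Request


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs once in every worker process, so each process owns its model/state.
    model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad", onnx=True)
    app.state.vad_model = model
    yield


app = FastAPI(lifespan=lifespan)


@app.get("/vad-status")
def vad_status(request: Request):
    # Handlers reach the pre-loaded model via app.state instead of reloading it per call.
    return {"vad_loaded": request.app.state.vad_model is not None}
```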
Can you please post the log?
I am simulating a realtime VAD for phone calls; it's a demo. Since this is a heavy procedure and the Silero model is stateful, I think loading the VAD on each API call is the easiest and an acceptable way. And yes, I should optimize this.
Interesting. Which onnxruntime version are you running?
Actually I am not familiar with JavaScript, and I am fairly new to Python. I tried a few other libraries to simulate real-time speech audio streaming and failed. The code I posted is the final version that made it work, and for me it works well. Maybe if I change the way the webpage collects audio so that only WAV is passed to the backend, I could process it with a lighter Python library. But the HTML code you see is the only version that worked out of all the versions ChatGPT gave me.
onnx 1.14.1
Yes, actually this is not a problem anymore. But just out of curiosity, why can onnxruntime affect pydub? Does it have anything to do with the Silero model?
The only assumption is that it sets some logging flags, but why it affects onnxruntime eludes me.
Spent some time trying to export a clean ONNX model without unused graph nodes:

```python
import onnxruntime

sess_options = onnxruntime.SessionOptions()
# Set graph optimization level
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
# To enable model serialization after graph optimization, set this
sess_options.optimized_model_filepath = "VADr_v5_opt.onnx"

session = onnxruntime.InferenceSession("VADr_v5.onnx", sess_options)
```

This method actually deletes unused graph nodes, but increases the model size by 1 MB. We don't want to increase the size of the model, but you can use the first method locally; hope it will help to resolve this bug.
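(As a hypothetical follow-up, not part of the original comment: the exported file could then be loaded directly with optimizations disabled, so the graph is not re-optimized at every startup.)

```python
# Hypothetical usage of the exported file from the sketch above; the file name
# "VADr_v5_opt.onnx" simply mirrors that example.
import onnxruntime

load_options = onnxruntime.SessionOptions()
# The offline-optimized graph is already clean, so skip runtime optimization.
load_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
session = onnxruntime.InferenceSession("VADr_v5_opt.onnx", load_options)
print([inp.name for inp in session.get_inputs()])
```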
Will close for now, since it is not a problem anymore, an easy workaround was found, and proper fixes are a bit hard to do.
🐛 Bug
#485 is the problem I ran into, and I found the cause: if I invoke the `AudioSegment.from_file` API while onnxruntime is printing its warning log, it raises the error mentioned in #485.

To Reproduce
Steps to reproduce the behavior:
I wrote a simple script to reproduce the problem.
I tested the code on Windows 10; Linux should be fine too, I think.
The script is below:
1. Run this code with the v5 model on a computer with a microphone, then open the webpage at http://127.0.0.1:8844/v1/demo/microphone
2. Refresh the webpage and click the Start Recording button as soon as possible
3. Wait about 1 second and the problem will appear (if not, you probably did not click the Start Recording button fast enough after refreshing the webpage)
4. Add `onnxruntime.set_default_logger_severity(3)` at the top of the script, and then no matter how fast I try I cannot reproduce the problem (see the sketch after this list)
5. I also cannot reproduce the problem when running this code with the v4 model
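(A minimal sketch of the workaround described in step 4, assuming placeholder file names; it is not the original demo script. The point is simply that the severity is raised before any session is created, so the long v5 warning dump never runs alongside pydub's decoding.)

```python
# Sketch of the workaround, not the original demo; file names are placeholders.
import onnxruntime
from pydub import AudioSegment

# Raise the default logger severity to ERROR (3) *before* creating any session,
# which suppresses the lengthy v5 warning dump entirely.
onnxruntime.set_default_logger_severity(3)

session = onnxruntime.InferenceSession("silero_vad_v5.onnx")

# With the warnings silenced, decoding audio via pydub right after session
# creation no longer overlaps with onnxruntime's logging.
audio = AudioSegment.from_file("recording.webm")
print(f"decoded {len(audio)} ms of audio")
```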
Expected behavior
No error is raised if I start pre-processing my audio data before onnxruntime has finished printing its warning log.
Environment
Running `pip install fastapi jinja2 torch numpy librosa onnxruntime soundfile pydub` in a Python 3.11 env will be fine.
How you installed PyTorch (conda, pip, source): the problem does not involve torch, so I think you can install torch any way you like.

Additional context
Conclusion: it turns out the problem only happens while onnxruntime is printing its warning log, and the v5 model takes seconds to print that log. Invoking `onnxruntime.set_default_logger_severity(3)` to disable the warning logging, or using only the v4 model (which prints far fewer warnings), avoids the problem. It seems like some kind of incompatibility between `pydub` and `onnxruntime`. But why? Why does the v5 model print more onnxruntime warnings than v4, and what is the problem between `pydub` and `onnxruntime`? I have no idea; maybe someone can help.