Post processing instance segmentation taking a lot of time #14

Open
lwillems191 opened this issue May 27, 2024 · 8 comments

@lwillems191

Hello,

When using instance segmentation, the post-processing takes quite a lot of time, and I was wondering if there might be a way to optimize it. I found the line that takes the most time, but have not found a way to optimize it.
Maybe somebody else has a good idea.

var value = Enumerable.Range(0, output.Channels).Sum(i => tensor1[0, i, y, x] * maskWeights[i]);

@aloksharma1

aloksharma1 commented May 31, 2024

Can you try this? (Untested, but changing to a for loop should improve it.)

float value = 0;
for (int i = 0; i < output.Channels; ++i)
{
    value += tensor1[0, i, y, x] * maskWeights[i];
}
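
For anyone who wants to compare the two variants in isolation, here is a minimal, self-contained sketch. It is not YoloDotNet's actual code: the flat `prototypes` array (standing in for `tensor1`), its channel-major layout, and the `MaskValueSketch` class and method names are assumptions made for illustration. The LINQ version invokes a delegate for every channel, while the loop version is a plain indexed accumulation.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of YoloDotNet.
public static class MaskValueSketch
{
    // LINQ version, shaped like the line from the issue.
    // Assumes prototypes is laid out as [channel, y, x] in one flat array.
    public static float WithLinq(float[] prototypes, float[] maskWeights,
        int channels, int height, int width, int y, int x)
    {
        return Enumerable.Range(0, channels)
            .Sum(i => prototypes[(i * height + y) * width + x] * maskWeights[i]);
    }

    // Loop version: same dot product, with no enumerator or per-element delegate call.
    public static float WithLoop(float[] prototypes, float[] maskWeights,
        int channels, int height, int width, int y, int x)
    {
        int planeSize = height * width; // elements per channel
        int offset = y * width + x;     // position of the pixel inside one channel
        float value = 0f;
        for (int i = 0; i < channels; ++i)
        {
            value += prototypes[i * planeSize + offset] * maskWeights[i];
        }
        return value;
    }
}
```

Both methods compute the same dot product for one pixel, so they can be timed against each other on real tensor sizes. Note that the `float` overload of `Sum` accumulates in `double` internally, so the last digits can differ slightly from the loop on non-trivial data.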

@NickSwardh
Owner

Currently working on improving the overall performance in this branch https://github.com/NickSwardh/YoloDotNet/tree/performance where I've replaced the use of a legacy ONNX class with the OrtValue API for improved performance, along with a few other tweaks here and there.

@aloksharma1

> Currently working on improving the overall performance in this branch https://github.com/NickSwardh/YoloDotNet/tree/performance where I've replaced the use of a legacy ONNX class with the OrtValue API for improved performance, along with a few other tweaks here and there.

Is this branch production-ready (on NuGet)?

@lwillems191
Author

> Currently working on improving the overall performance in this branch https://github.com/NickSwardh/YoloDotNet/tree/performance where I've replaced the use of a legacy ONNX class with the OrtValue API for improved performance, along with a few other tweaks here and there.

Yeah, this branch already gives a great improvement. Thanks for the work you put into it.

@NickSwardh
Owner

Awesome! Thank you for letting me know :)

@NickSwardh
Owner

> > Currently working on improving the overall performance in this branch https://github.com/NickSwardh/YoloDotNet/tree/performance where I've replaced the use of a legacy ONNX class with the OrtValue API for improved performance, along with a few other tweaks here and there.
>
> Is this branch production-ready (on NuGet)?

No, not yet. It's a work in progress. I'm still turning the nuts and bolts to see if I can squeeze some more speed out of this thing ;)

@louislewis2
Contributor

Hi @NickSwardh

First off, thanks for the great library you have created here!

Inspired by this issue and facing some performance issues myself, I forked your branch and first added benchmarks so that performance-related code changes can be validated. Once the benchmarks were in place, I was able to spot some quick wins that, at least in my testing, have dramatically improved the overall performance. I also added a few other useful benchmarks to start understanding where time is spent and memory is allocated. The reduced GC pressure has increased the overall throughput of my application, since there are now fewer GC-induced pauses.

The benchmarks that require it also run both GPU and CPU variations, so that one can spot improvements or regressions on both at the same time.

I have created a PR if you are interested. I apologize up front for its size.
Some refactoring seemed fitting to allow sharing of resources such as the assets.

#16

@lwillems191
Author

I did some more testing. The new improvements already make the code a lot faster, but from my testing it seems it might be better not to use Parallel.For loops. Their timings seem a lot less consistent than those of a normal for loop, and the speed improvement does not seem that large. Hopefully you will take this into consideration.
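
To make the comparison concrete, here is a minimal sketch of the two loop styles. It is an illustration under assumptions, not the code in the performance branch: the flat channel-major `prototypes` array and the `MaskLoopSketch` names are hypothetical. The parallel variant splits the work per row, so each task writes a disjoint slice of the output and needs no locking. Whether it beats the sequential loop depends on mask size and on how busy the thread pool is, which could explain less consistent timings.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of YoloDotNet.
public static class MaskLoopSketch
{
    // Sequential version: one thread walks every row.
    public static float[] Sequential(float[] prototypes, float[] maskWeights,
        int channels, int height, int width)
    {
        var mask = new float[height * width];
        for (int y = 0; y < height; y++)
        {
            FillRow(prototypes, maskWeights, channels, height, width, y, mask);
        }
        return mask;
    }

    // Parallel version: rows are distributed over thread-pool workers.
    public static float[] ParallelRows(float[] prototypes, float[] maskWeights,
        int channels, int height, int width)
    {
        var mask = new float[height * width];
        Parallel.For(0, height, y =>
            FillRow(prototypes, maskWeights, channels, height, width, y, mask));
        return mask;
    }

    // Computes the weighted channel sum for every pixel in row y.
    // Assumes prototypes is laid out as [channel, y, x] in one flat array.
    private static void FillRow(float[] prototypes, float[] maskWeights,
        int channels, int height, int width, int y, float[] mask)
    {
        int planeSize = height * width;
        for (int x = 0; x < width; x++)
        {
            int offset = y * width + x;
            float value = 0f;
            for (int i = 0; i < channels; i++)
            {
                value += prototypes[i * planeSize + offset] * maskWeights[i];
            }
            mask[offset] = value;
        }
    }
}
```

Both methods return identical masks, so either can be dropped into a benchmark and timed over many runs to compare average time and variance on real input sizes.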
