
Future optimizations #112

Open
nathanielrindlaub opened this issue Apr 7, 2023 · 3 comments

Comments

@nathanielrindlaub
Member

nathanielrindlaub commented Apr 7, 2023

@rbavery had a few ideas for future optimizations of the MegaDetector v5 endpoint that I wanted to document:

  1. test compiling the model with NeuralMagic for increased inference speed
  2. explore using test-time augmentations (during inference, perform a few different random transforms/pre-processing steps on the fly, request inference on all versions of the image, and then average results across them) to boost model accuracy. This would come at the cost of potentially tripling (or more) our inference time, depending on how many augmentations we try and under what conditions, so we'd want to think it through a bit more and be sure the benefits outweigh the costs.
  3. use ONNX-compiled models across all endpoints for the sake of standardization (and perhaps some speed gains); a rough export sketch follows this list
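
A minimal sketch of what the ONNX export in item 3 might look like, assuming the MDv5a weights load through the standard YOLOv5 hub interface (the weights path, input size, and opset below are placeholder assumptions, not our actual config):

```python
import torch

# Placeholder path to the MDv5a checkpoint; autoshape=False gives us the raw
# nn.Module, which is what torch.onnx.export expects.
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="md_v5a.0.0.pt", autoshape=False)
model.eval()

# MDv5 is typically run at 1280x1280.
dummy = torch.zeros(1, 3, 1280, 1280)

torch.onnx.export(
    model,
    dummy,
    "md_v5a.onnx",
    input_names=["images"],
    output_names=["output"],
    opset_version=17,
    dynamic_axes={"images": {0: "batch"}, "output": {0: "batch"}},
)
```

In practice we'd probably just call YOLOv5's `export.py` with `--include onnx`, which handles the Detect head and dynamic axes for us; the snippet above is only meant to show the shape of the workflow.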

@rbavery - anything else to add here??

@nathanielrindlaub
Member Author

From Dan:

Sometimes, if we're still missing animals, but one or both models look close, try again using YOLOv5's test-time augmentation tools via this alternative (but compatible) MD inference script.
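
For context, YOLOv5's TTA can also be exercised directly through the torch hub API; a minimal sketch (the weights path, image, size, and confidence threshold are placeholders, and `augment=True` is YOLOv5's built-in test-time-augmentation flag):

```python
import torch

# Load MDv5a through the YOLOv5 hub interface (weights path is a placeholder).
model = torch.hub.load("ultralytics/yolov5", "custom", path="md_v5a.0.0.pt")
model.conf = 0.1  # keep low-confidence boxes; results get filtered downstream

# augment=True turns on YOLOv5's test-time augmentation: inference is run on
# scaled/flipped copies of the image and the detections are merged before NMS.
results = model("camera_trap_image.jpg", size=1280, augment=True)
results.print()
```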

@nathanielrindlaub
Member Author

Also according to Dan, "inference takes ~1.7x longer with TTA turned on". That's not as bad a hit as I was imagining, so it's very much worth evaluating.

@rbavery
Contributor

rbavery commented Nov 10, 2023

Just a heads up that I got to try running MDv5a compiled with TensorRT and it was blazing fast. Example here: https://github.com/rbavery/animal_detector/blob/master/mdv5app/torchscript_to_tensorrt.py

It sped up inference something like ~10x on my GPU compared to running the TorchScript model without TensorRT.

This might be the most cost-effective option for bulk inference without requiring a change in architecture. It still uses TorchServe and virtually the same handler code paths.
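
For reference, a minimal sketch of that compilation step, loosely following the linked torchscript_to_tensorrt.py (the weights path, input resolution, and fp16 precision are assumptions; it needs the torch_tensorrt package and a CUDA GPU):

```python
import torch
import torch_tensorrt

# Load the TorchScript-exported MDv5a model (path is a placeholder).
ts_model = torch.jit.load("md_v5a.0.0.torchscript.pt").eval().cuda()

# Compile with Torch-TensorRT; fp16 is where most of the speedup comes from.
trt_model = torch_tensorrt.compile(
    ts_model,
    inputs=[torch_tensorrt.Input((1, 3, 1280, 1280), dtype=torch.half)],
    enabled_precisions={torch.half},
)

# Save the compiled module so the existing TorchServe handler can load it
# like any other TorchScript model.
torch.jit.save(trt_model, "md_v5a_trt.torchscript.pt")
```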
