Run serverless endpoint batch test and record cost and time results #99
Additional meeting notes:
- Testing the impact of automation rules and the addition of Mira models during the test would show how multiple round trips and writes to the DB affect cost.
- Green light from MongoDB to get a dedicated instance for MongoDB Atlas, which should improve DB performance.
- The preference is to reduce cost over inference time.
- Kinesis Firehose or DynamoDB would be more time-performant datastores.
- We agreed to run without Atlas next week and assess whether we need to rerun the test with Atlas later.
With letterboxing and the fully reproduced YOLOv5, we get average inference times of 9 seconds per image on SageMaker Serverless, which only supports CPU.
When we ran the above test last year, we were testing with fixed resizing to 640x640 and a TorchScript model compiled for the CPU, and inference time was closer to 2.5 seconds per image: https://docs.google.com/spreadsheets/d/17t-zgKwWdVSArf7mgu4QJXOvtGVIlcUYTnwYEpNZQsU/edit#gid=0 We'll be exploring how to reduce inference time while preserving reproduced accuracy: #106
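For context on the preprocessing difference: letterboxing scales the image to fit the target size while preserving aspect ratio and pads the remainder, whereas last year's test used a plain 640x640 resize. A minimal sketch of letterbox preprocessing, assuming a 1280 target and a neutral pad value (not the exact code in our pipeline, just illustrative):

```python
import numpy as np
from PIL import Image

def letterbox(img: Image.Image, target: int = 1280, pad_value: int = 114) -> np.ndarray:
    """Resize to fit within target x target preserving aspect ratio, then pad the rest."""
    img = img.convert("RGB")
    w, h = img.size
    scale = target / max(w, h)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = img.resize((new_w, new_h), Image.BILINEAR)

    # Pad the short side so the output is exactly target x target.
    canvas = np.full((target, target, 3), pad_value, dtype=np.uint8)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = np.asarray(resized)
    return canvas
```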
@rbavery deployed the ONNX MDv5 (PR here) to a SageMaker Serverless endpoint, and per-image inference looks to be around 3.5-4 seconds. The entire processing time for a test batch of 10,168 images was 11 hrs, 8 mins (3.9 seconds per image). So 1,000 images take roughly an hour to process, and 100k would take about 4.5 days. Not bad for now! Down the road we may explore speeding this up with concurrent processing (two separate Serverless endpoints for Megadetector, one for real-time inference and one for batch, and ditching the FIFO queues for standard SQS queues). There are also endpoint- and model-level optimizations we could explore (#112).
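For reference, the rough shape of per-image timing and the extrapolation above; this is a hedged sketch, and the endpoint name and content type are placeholders rather than the deployed endpoint's actual contract:

```python
import time
import boto3

ENDPOINT_NAME = "megadetector-v5-serverless"  # placeholder, not the real endpoint name
runtime = boto3.client("sagemaker-runtime")

def invoke(image_bytes: bytes) -> tuple[bytes, float]:
    """Send one image to the serverless endpoint and return (response body, latency in seconds)."""
    start = time.perf_counter()
    resp = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/x-image",  # assumed; depends on the serving container
        Body=image_bytes,
    )
    return resp["Body"].read(), time.perf_counter() - start

# Extrapolating the observed batch result (10,168 images in 11 h 8 min):
per_image_s = (11 * 3600 + 8 * 60) / 10_168   # ~3.9 s/image
print(f"{per_image_s:.2f} s/image, ~{1000 * per_image_s / 3600:.1f} h per 1,000 images")
```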
User story
We need to understand the cost of running inference with the current architecture on large archives (25 GB) of imagery, both in terms of time (does it take a week with retries? two days?) and in terms of cost for the serverless MDv5 endpoint that auto-scales with requests. For this first run, we won't include the Mira endpoints in this test.
We will run this test on duplicated images that match the production ratio of animals to no animals: ~60% are empty. All are JPEGs.
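A back-of-the-envelope sizing for the run, assuming the ~3.9 s/image figure from the earlier batch and a guessed average JPEG size (both are assumptions, not measurements):

```python
# Rough sizing for the 25 GB test batch. Average JPEG size is an assumption;
# the per-image latency comes from the earlier 10,168-image batch (~3.9 s/image).
ARCHIVE_GB = 25
AVG_JPEG_MB = 3.0          # assumed average file size; adjust once the sample is picked
SECONDS_PER_IMAGE = 3.9    # observed on the ONNX MDv5 serverless endpoint

n_images = int(ARCHIVE_GB * 1024 / AVG_JPEG_MB)
hours = n_images * SECONDS_PER_IMAGE / 3600
print(f"~{n_images:,} images, ~{hours:.0f} h ({hours / 24:.1f} days) of sequential inference")
```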
Secondarily, we'd like to understand:
Things we need to run the test:
Resolution Criteria
For 25 GB of random imagery, where sample images will be close to 1280x1280 (Natty will pick a representative range), how long does auto-scaled inference take for MDv5?
What was the cost per image? Did this vary throughout the job due to retries? (See the cost sketch after this list.)
Were there any failures not resolved by retries?
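To answer the cost-per-image question above, one approach is to convert billed serverless compute into a per-image figure. This is a sketch only; the per-GB-second and per-request rates are placeholders to be filled in from the current SageMaker Serverless Inference pricing page, not quoted values, and it ignores storage, SQS, and DB costs:

```python
def cost_per_image(
    total_compute_seconds: float,   # billed duration summed over all invocations
    memory_gb: float,               # memory configured on the serverless endpoint
    n_images: int,
    price_per_gb_second: float,     # placeholder: take from the SageMaker pricing page
    price_per_request: float = 0.0, # placeholder: any per-request/data-processing charge
) -> float:
    """Approximate serverless inference cost per image (compute + per-request charges only)."""
    compute_cost = total_compute_seconds * memory_gb * price_per_gb_second
    request_cost = n_images * price_per_request
    return (compute_cost + request_cost) / n_images
```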