Enable batch mode benchmarking on iree-android-benchmark #5106
Conversation
This depends on #5082
LGTM. Just one question.
# The batch numbers are roughly computed to let it benchmark more than 3
# seconds.
# Do not set batch size on Pixel 4 for GPU targets, because it will get killed
We'll see fluctuation for Pixel 4 then. Is there a way to disable the Mako upload failure for the Pixel 4 GPU?
Do you mean disabling the regression analyzer but still uploading the numbers? Or disabling the whole benchmark for the Pixel 4 GPU?
I can make the former work by tweaking the uploader program; that should not be hard. The latter is super easy: removing the target is enough.
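For the record, a minimal sketch of the former option (all of the names below are hypothetical stand-ins, not the actual uploader or Mako API):

```python
# Hypothetical sketch only; none of these names come from the real uploader.
SKIP_ANALYSIS = {("Pixel4", "vulkan")}  # noisy targets: upload, but don't analyze


def upload_samples(benchmark_key, samples):
  # Stand-in for the Mako upload step.
  print(f"uploaded {len(samples)} samples to {benchmark_key}")


def run_regression_analyzer(benchmark_key, samples):
  # Stand-in for the regression-analysis step that fails the run on regressions.
  print(f"analyzed {benchmark_key}")


def report(benchmark_key, phone, driver, samples):
  upload_samples(benchmark_key, samples)
  if (phone, driver) in SKIP_ANALYSIS:
    return  # keep the numbers, but don't flag fluctuation as a regression
  run_regression_analyzer(benchmark_key, samples)
```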
@@ -47,6 +47,11 @@ def get_driver(self) -> str:
    """ Returns a string indicates the driver of the target."""
    return self.name.split("-")[0]

  def add_batch_flag(self, size):
    self.compilation_flags.append(
        f"--iree-hal-benchmark-dispatch-repeat-count={size}")
Hmmm, so this seems like kind of a hacky way to make dispatch kernels dominate the benchmarking time. We're no longer benchmarking MobileBert or whatever model; we're benchmarking specifically the dispatches of that model, which is a different thing. I'm not saying that isn't a good thing to benchmark, but it's a different thing to benchmark for sure. That means this is no longer recording "how fast is IREE at running model X". Are we setting up new benchmark reports (or whatever Mako calls them) to record this new metric?
Yes, this is to make dispatch kernels dominate the benchmarking time. I think this is what we are looking for at this stage.
I feel this is also what we are measuring with the TFLite benchmark tools. I dug into the implementation and found that it excludes session initialization and only measures generated_code::Inference. What IREE excludes is something similar, e.g. VM stack allocation. There is more, but I can't list it all now.
Initialized session in 609.945ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=64 first=13039 curr=8455 min=6052 max=13039 avg=7632.81 std=1138
Ideally we would like the benchmark to report something like ex.submit_and_wait or iree_hal_semaphore_wait_with_deadline. E.g., #4686 (comment)
We can also have other metrics that include overhead or initialization time, but I think we are looking for the real execution time at this stage, and this is an easy way to get it.
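To make the amortization concrete (illustrative numbers only, not from any of these benchmarks):

```python
# Without batching, every measured iteration pays the fixed per-invocation
# overhead (VM stack allocation, etc.); with
# --iree-hal-benchmark-dispatch-repeat-count=N the overhead is paid once and
# amortized over N repeats of the dispatches.
overhead_ms = 30.0
dispatch_ms = 250.0

per_iter_unbatched = overhead_ms + dispatch_ms          # 280.0 ms
N = 16
per_iter_batched = (overhead_ms + N * dispatch_ms) / N  # 251.875 ms
```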
I totally agree that we sometimes want end-to-end time, including overheads, the initialization stage, etc.
When I worked on accelerators, I struggled with performance measurement with and without runtime overheads. The numbers from the simulator were awesome, but the actual end-to-end execution time was worse in the beginning. Then we broke the performance down with something like Tracy and identified the issues. There were also things that only cause overhead in the first run; e.g., some resources have to be allocated in the beginning, and in the second and later runs that cost is gone. It is case by case: sometimes initialization happens at the system level, must be there, and shouldn't count toward application performance; sometimes you run the model many times and don't care about the first run.
And yes, again, this is different from what we've measured. Shall we redefine the focus of the benchmarks to be dispatches?
Are we setting up new benchmark reports (or whatever mako calls them) to record this new metric?
The answer is no. I think within IREE's scope we can focus on dispatch kernel performance. This also matches what the competition (like TFLite) measures. We can revisit benchmarking what we had before when there is a need.
Yeah, so my main issue here is that we're basically changing the definition of a metric while we're measuring it: the performance graph now shows a "performance improvement", but that's really just a change to what we're measuring. It also seems a bit disingenuous to call this a "mobile bert" benchmark (or whatever model); this doesn't represent how fast IREE could run a MobileBert inference under even ideal conditions.
I'm not disagreeing that it may be the most useful metric for us right now.
(side note: we already do have initial runs that are not measured and run the whole model multiple times and amortize over that)
This produces more stable numbers because batch mode amortizes the overhead across repeated dispatches.
Due to issues on the Pixel 4 GPU, we can't let a benchmark run for more than two seconds, otherwise it gets killed. Thus, batch mode is not enabled for the Pixel 4 GPU.
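For reference, a rough sketch of how that exception might be expressed (the model names, batch sizes, and helper below are hypothetical and reuse the BenchmarkTarget sketch from the earlier comment; only add_batch_flag comes from this PR):

```python
# Hypothetical per-target configuration with the Pixel 4 GPU exception.
BATCH_SIZES = {"mobile-bert": 8, "mobilenet-v2": 16}
NO_BATCH_TARGETS = {("Pixel4", "vulkan")}  # would exceed the ~2 s limit and get killed


def maybe_add_batch_flag(target, phone, driver, model):
  if (phone, driver) in NO_BATCH_TARGETS:
    return  # leave the Pixel 4 GPU benchmarks unbatched
  target.add_batch_flag(BATCH_SIZES[model])
```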
Pixel 4
MobileBert - CPU: ~660 ms
MobileBert - GPU: ~952 ms (unchanged)
MobileNetV2 - CPU: ~380 ms
MobileNetV2 - GPU: ~1024 ms (unchanged)
S20
MobileBert - CPU: ~750 ms
MobileBert - GPU: ~270 ms
MobileNetV2 - CPU: ~266 ms
MobileNetV2 - GPU: ~55 ms