[Graph Runtime] Run_individual for benchmarking individual layers #2569
Conversation
Looks like an important tool. Could you please also add a test or example demonstrating how it may be used?
 * \param warmup Number of warmup runs.
 * \param iter Number of iteration runs.
 */
void RunIndividual(int warmup, int iter) {
Could you also add min_repeat_ms as an argument to this function? Because latency of each op could vary a lot.
Yes, the latency of each op could vary a lot. Is min_repeat_ms for the whole model or for each op? Do you find adding this parameter helpful for getting consistent benchmark results? What's the typical number you use?
I mean min_repeat_ms for each op. Yes, I think it'll make the benchmark results more consistent. It's also consistent with the other time evaluator APIs, such as
https://github.com/dmlc/tvm/blob/master/python/tvm/module.py#L130
@@ -119,6 +152,14 @@ PackedFunc GraphRuntimeDebug::GetFunction(
      this->DebugGetNodeOutput(args[0], args[1]);
    }
  });
} else if (name == "run_individual") { |
Could you expose this function in the Python API and add a test case?
Yes, will do
  }
  std::vector<double> time_per_op(op_execs().size(), 0);
  for (int k = 0; k < iter; k++) {
    for (size_t i = 0; i < op_execs().size(); ++i) {
Should we only measure operators and exclude input and weight nodes? It looks like there is no need to run the loop body when op_execs()[i] == nullptr.
You're right
@hlu1 can you please fix the lint error and address the review comments? @icemelon9 please follow up on this PR.
(force-pushed from dcb7be0 to f720954)
@icemelon9, sorry about the delay. I changed the interface to be the same as the TVM time evaluator API. The implementation is pretty close to the time evaluator here: https://github.com/dmlc/tvm/blob/master/src/runtime/rpc/rpc_session.cc#L1194. I also added the Python API and Python tests.
(force-pushed from 5d21a24 to 45a573e)
@icemelon9 can you moderate this PR?
LGTM, except for some nits.
 * \param number The number of times to run this function for taking average.
 * \param repeat The number of times to repeat the measurement.
        In total, the function will be invoked (1 + number x repeat) times,
        where the first one is warm up and will be discarded in case
Suggested change:
-        where the first one is warm up and will be discarded in case
+        where the first one is warmed up and will be discarded in case
Done
 * \param repeat The number of times to repeat the measurement.
        In total, the function will be invoked (1 + number x repeat) times,
        where the first one is warm up and will be discarded in case
        there is lazy initialization..
Suggested change:
-        there is lazy initialization..
+        there is lazy initialization.
Done
@hlu1 please update per the review comments, and let us get it in.
lgtm
I find this method very helpful for identifying bottleneck layers when I'm optimizing end-to-end model performance, especially when Python is not available.
@tqchen, @ajtulloch, please review