[Graph Runtime] Run_individual for benchmarking individual layers #2569
Conversation
Looks like an important tool. Could you please also add a test or example demonstrating how it may be used?
 * \param warmup Number of warmup runs.
 * \param iter Number of iteration runs.
 */
void RunIndividual(int warmup, int iter) {
Could you also add min_repeat_ms as an argument to this function? Because latency of each op could vary a lot.
Yes, the latency of each op could vary a lot. Is min_repeat_ms for the whole model or for each op? Do you find adding this parameter helpful for getting consistent benchmark results? What's the typical number you use?
I mean min_repeat_ms for each op. Yes, I think it'll make the benchmark results more consistent. It's also consistent with the other time evaluator APIs, such as
https://github.com/dmlc/tvm/blob/master/python/tvm/module.py#L130
@@ -119,6 +152,14 @@ PackedFunc GraphRuntimeDebug::GetFunction(
      this->DebugGetNodeOutput(args[0], args[1]);
    }
  });
} else if (name == "run_individual") { |
Could you expose this function in the Python API and add a test case?
Yes, will do
  }
  std::vector<double> time_per_op(op_execs().size(), 0);
  for (int k = 0; k < iter; k++) {
    for (size_t i = 0; i < op_execs().size(); ++i) {
Should we only measure operators and exclude input and weight nodes? It looks like there is no need to run the loop body when op_execs()[i] == nullptr.
You're right
@hlu1 can you please fix the lint error and address the review comments? @icemelon9 please follow up on this PR.
(force-pushed from dcb7be0 to f720954)
@icemelon9, sorry about the delay. I changed the interface to be the same as the TVM time evaluator API. The implementation is pretty close to the time evaluator here: https://github.com/dmlc/tvm/blob/master/src/runtime/rpc/rpc_session.cc#L1194. I also added the Python API and Python tests.
(force-pushed from 5d21a24 to 45a573e)
@icemelon9 can you moderate this PR?
LGTM, except for some nits.
 * \param number The number of times to run this function for taking average.
 * \param repeat The number of times to repeat the measurement.
        In total, the function will be invoked (1 + number x repeat) times,
        where the first one is warm up and will be discarded in case
Suggested change:
-        where the first one is warm up and will be discarded in case
+        where the first one is warmed up and will be discarded in case
Done
 * \param repeat The number of times to repeat the measurement.
        In total, the function will be invoked (1 + number x repeat) times,
        where the first one is warm up and will be discarded in case
        there is lazy initialization..
Suggested change:
-        there is lazy initialization..
+        there is lazy initialization.
Done
@hlu1 please update per the review comments, and let us get it in.
lgtm
I find this method very helpful for identifying bottleneck layers when I'm optimizing end-to-end model performance, especially when Python is not available.
@tqchen, @ajtulloch, please review