
[Graph Runtime] Run_individual for benchmarking individual layers #2569

Merged · 1 commit · Feb 27, 2019

Conversation

@hlu1 (Contributor) commented Feb 6, 2019

I find this method very helpful for identifying bottleneck layers when I'm optimizing end-to-end model performance, especially when Python is not available.

@tqchen, @ajtulloch, please review

@sergei-mironov (Contributor):

Looks like an important tool. Could you please also add a test or example demonstrating how it may be used?

* \param warmup Number of warmup runs.
* \param iter Number of iteration runs.
*/
void RunIndividual(int warmup, int iter) {
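The per-op timing this signature implies can be sketched in a few self-contained lines. This is an illustration of the idea, not the PR's actual implementation: `OpExec` and the vector argument stand in for the runtime's `op_execs_` table, whose entries may be null for input and weight nodes.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Stand-in for the runtime's op_execs_ table; null entries correspond to
// input/weight nodes that have no kernel to execute.
using OpExec = std::function<void()>;

// Time each op individually: `warmup` untimed full passes, then `iter`
// timed passes, returning the average time per op in microseconds.
std::vector<double> RunIndividual(const std::vector<OpExec>& op_execs,
                                  int warmup, int iter) {
  for (int w = 0; w < warmup; ++w) {
    for (const auto& op : op_execs) {
      if (op) op();  // warmup results are discarded
    }
  }
  std::vector<double> time_per_op(op_execs.size(), 0.0);
  for (int k = 0; k < iter; ++k) {
    for (size_t i = 0; i < op_execs.size(); ++i) {
      if (!op_execs[i]) continue;  // skip non-operator nodes
      auto begin = std::chrono::high_resolution_clock::now();
      op_execs[i]();
      auto end = std::chrono::high_resolution_clock::now();
      time_per_op[i] +=
          std::chrono::duration<double, std::micro>(end - begin).count();
    }
  }
  for (double& t : time_per_op) t /= iter;  // average over timed passes
  return time_per_op;
}
```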
Member:

Could you also add min_repeat_ms as an argument to this function? Because latency of each op could vary a lot.

@hlu1 (Contributor Author):

Yes, the latency of each op could vary a lot. Is min_repeat_ms for the whole model or for each op? Do you find adding this parameter helpful for getting consistent benchmark results? What's the typical number you use?

Member:

I mean min_repeat_ms for each op. Yes, I think it'll make the benchmark results more consistent. It's also consistent with other time evaluator APIs, such as
https://github.com/dmlc/tvm/blob/master/python/tvm/module.py#L130
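For context, the min_repeat_ms policy in TVM's time evaluator works roughly like this: a batch of `number` calls is timed, and if the batch took less than min_repeat_ms, the batch size is grown and the measurement retried, so each reported sample spans at least min_repeat_ms of wall-clock time. A simplified, self-contained sketch of that policy (not a copy of the linked rpc_session.cc code):

```cpp
#include <algorithm>
#include <chrono>
#include <functional>

// Time a batch of `number` calls; if the batch is shorter than
// `min_repeat_ms`, grow the batch and retry. Returns average ms per call.
double MeasureWithMinRepeat(const std::function<void()>& op, int number,
                            int min_repeat_ms) {
  while (true) {
    auto begin = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < number; ++i) op();
    auto end = std::chrono::high_resolution_clock::now();
    double duration_ms =
        std::chrono::duration<double, std::milli>(end - begin).count();
    if (duration_ms >= min_repeat_ms) {
      return duration_ms / number;
    }
    // Grow the batch: estimate the size needed to cross the threshold,
    // growing by at least ~1.6x so very fast ops converge quickly.
    double per_call = duration_ms / number;
    double target = (per_call > 0.0) ? min_repeat_ms / per_call + 1.0
                                     : number * 2.0;
    // Clamp to avoid integer overflow for pathologically fast ops.
    number = static_cast<int>(std::min(std::max(target, number * 1.618), 1e8));
  }
}
```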

@@ -119,6 +152,14 @@ PackedFunc GraphRuntimeDebug::GetFunction(
this->DebugGetNodeOutput(args[0], args[1]);
}
});
} else if (name == "run_individual") {
Member:

Could you expose this function to the Python API and add a test case?

@hlu1 (Contributor Author):

Yes, will do

}
std::vector<double> time_per_op(op_execs().size(), 0);
for (int k = 0; k < iter; k++) {
for (size_t i = 0; i < op_execs().size(); ++i) {
@zhiics (Member) commented Feb 6, 2019:

Should we only measure operators and exclude input and weight nodes? It looks like there is no need to run the loop body when op_execs()[i] == nullptr.

@hlu1 (Contributor Author):

You're right

@tqchen (Member) commented Feb 12, 2019

@hlu1 can you please fix the lint error and address the review comments? @icemelon9 please follow up on this PR

@tqchen added the status: need update label Feb 12, 2019
@hlu1 force-pushed the run_individual branch 2 times, most recently from dcb7be0 to f720954 on February 19, 2019
@hlu1 (Contributor Author) commented Feb 19, 2019

@icemelon9, sorry about the delay. I changed the interface to be the same as the TVM time evaluator API. The implementation is quite close to the time evaluator here: https://github.com/dmlc/tvm/blob/master/src/runtime/rpc/rpc_session.cc#L1194. I also added a Python API and Python tests.
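The call pattern of that time-evaluator-style interface can be illustrated with a small self-contained sketch (names here are illustrative, not the PR's code): each op is invoked (1 + number × repeat) times, with one discarded warmup call followed by `repeat` measurements of `number` calls each.

```cpp
#include <functional>
#include <vector>

// Illustrative evaluator skeleton: one warmup invocation, then `repeat`
// batches of `number` invocations, producing one sample per batch. Timing
// is omitted; each "sample" is the batch's call count, so the invocation
// pattern itself can be checked.
std::vector<int> EvaluatorCallPattern(const std::function<void()>& op,
                                      int number, int repeat) {
  op();  // warmup call, discarded in case of lazy initialization
  std::vector<int> samples;
  for (int r = 0; r < repeat; ++r) {
    int calls_in_batch = 0;
    for (int n = 0; n < number; ++n) {
      op();
      ++calls_in_batch;
    }
    samples.push_back(calls_in_batch);
  }
  return samples;
}
```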

@hlu1 force-pushed the run_individual branch 3 times, most recently from 5d21a24 to 45a573e on February 20, 2019
@tqchen (Member) commented Feb 23, 2019

@icemelon9 can you moderate this PR?

@tqchen (Member) commented Feb 26, 2019

@zhiics (Member) left a review:

LGTM, except for some nits.

* \param number The number of times to run this function for taking average.
* \param repeat The number of times to repeat the measurement.
In total, the function will be invoked (1 + number x repeat) times,
where the first one is warm up and will be discarded in case
Member:

Suggested change
where the first one is warm up and will be discarded in case
where the first one is warmed up and will be discarded in case

@hlu1 (Contributor Author):

Done

* \param repeat The number of times to repeat the measurement.
In total, the function will be invoked (1 + number x repeat) times,
where the first one is warm up and will be discarded in case
there is lazy initialization..
Member:

Suggested change
there is lazy initialization..
there is lazy initialization.

@hlu1 (Contributor Author):

Done

@tqchen (Member) commented Feb 26, 2019

@hlu1 please update per the review comments, and let us get this in

@icemelon (Member) left a review:

lgtm

@tqchen tqchen merged commit 2a89818 into apache:master Feb 27, 2019
@tqchen (Member) commented Feb 27, 2019

Thanks, @hlu1 @icemelon9 @zhiics, this is now merged

@tqchen added the status: accepted label and removed the status: need update label Feb 27, 2019
@yzhliu yzhliu mentioned this pull request Mar 2, 2019
@hlu1 hlu1 deleted the run_individual branch April 17, 2019 06:52