[RUNTIME] Add min_repeat_ms to time_evaluator #2200

Merged
merged 3 commits into from
Jan 1, 2019

Conversation

merrymercy
Member

min_repeat_ms sets the minimum duration of a measurement and has been used in autotvm for measurement.
As it is a useful feature to make measurement accurate and smart, we'd better move it to general API time_evaluator and encourage people to use it.

cc @eqy @tqchen @sgrechanik-h
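
As a rough illustration of what "minimum duration of a measurement" means, here is a minimal pure-Python sketch of a timing loop with a `min_repeat_ms` floor. It is not the TVM implementation: the function name `time_evaluator_sketch`, the `growth` parameter, and the `n + 1` guard are assumptions made for this example, and it times a plain Python callable rather than a packed function.

```python
import time


def time_evaluator_sketch(func, number, repeat=1, min_repeat_ms=0, growth=1.618):
    """Hypothetical analogue of a timing loop with a minimum repeat duration.

    Returns a list with the mean seconds per call for each repeat.
    """
    func()  # one warm-up call whose time is discarded
    n = number  # working copy; grows when a measurement is too short
    results = []
    for _ in range(repeat):
        while True:
            begin = time.perf_counter()
            for _ in range(n):
                func()
            duration_ms = (time.perf_counter() - begin) * 1000.0
            if duration_ms >= min_repeat_ms:
                break
            # Measurement too short: estimate how many runs are needed,
            # but always grow by at least the growth factor.
            per_call_ms = duration_ms / n
            estimate = min_repeat_ms / per_call_ms + 1 if per_call_ms > 0 else 0
            # The n + 1 guard is an addition of this sketch so the loop
            # always makes progress, even for very small n.
            n = max(n + 1, int(max(estimate, n * growth)))
        results.append(duration_ms / 1000.0 / n)
    return results
```

With `min_repeat_ms=0` the loop degenerates to a plain `number` x `repeat` measurement, which is why a default of zero keeps the old behavior.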

@merrymercy merrymercy changed the title Add min_repeat_ms time_evaluator [RUNTIME] Add min_repeat_ms to time_evaluator Nov 30, 2018
@merrymercy merrymercy force-pushed the enhance_time_evaluator branch 2 times, most recently from 5f99d62 to f588537 Compare November 30, 2018 04:23
Contributor

@sgrechanik-h sgrechanik-h left a comment

And also, is it possible to add some tests checking that this automatically adjusting time measurement algorithm works as expected?

@@ -139,26 +139,38 @@ def time_evaluator(self, func_name, ctx, number, repeat=1):
 The context we should run this function on.

 number: int
-The number of steps used in measuring each time interval
+The number of times to run this function for taking average.
+We call this as one `repeat` of measurement.
Contributor

It's not very clear from the description what we call a repeat of measurement. (The description for min_repeat_ms makes everything clearer though)

int number,
int repeat,
int min_repeat_ms) {
auto ftimer = [pf, ctx, &number, repeat, min_repeat_ms](TVMArgs args, TVMRetValue *rv) {
Contributor

Why is the local variable `number` captured by reference here? It will escape the local scope, which might be a bug.

Member Author

It turns out that a TVM packed function does not support capturing by reference. I updated it to capture by value.


if (duration_ms < min_repeat_ms) {
number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
number * 1.618));
Contributor

What is 1.618?
Also, using ceil here might be better than adding 1.

Contributor

@eqy eqy Nov 30, 2018

btw, do we need this branch here if the loop will exit if the condition is met?

Contributor

What is 1.618?
Also, using ceil here might be better than adding 1.

https://en.wikipedia.org/wiki/Golden_ratio

Member Author

Precision is not very important here as I want to encourage it to set a higher number.
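
To make the `+ 1` versus `ceil` question concrete, the update rule from the quoted diff can be mirrored in a few lines of Python. This is a sketch only: the function names are hypothetical, and the `ceil` variant is one reading of the reviewer's suggestion, not code from the PR.

```python
import math

GROWTH = 1.618  # constant taken from the quoted diff


def next_number_plus_one(number, duration_ms, min_repeat_ms):
    """Mirror of the rule in the diff: estimate the needed runs, then add 1."""
    per_run_ms = duration_ms / number
    return int(max(min_repeat_ms / per_run_ms + 1, number * GROWTH))


def next_number_ceil(number, duration_ms, min_repeat_ms):
    """Hypothetical variant that rounds the estimate up instead of adding 1."""
    per_run_ms = duration_ms / number
    return int(max(math.ceil(min_repeat_ms / per_run_ms), number * GROWTH))


# 10 runs took 20 ms and the target is 100 ms: the estimate is exactly 50 runs.
print(next_number_plus_one(10, 20.0, 100))  # 51
print(next_number_ceil(10, 20.0, 100))      # 50

# 10 runs took 90 ms: the estimate (about 12) is below 10 * 1.618,
# so the growth factor decides the result in both variants.
print(next_number_plus_one(10, 90.0, 100))  # 16
print(next_number_ceil(10, 90.0, 100))      # 16
```

The two rules differ only when the estimate is already an integer, where `+ 1` overshoots by one run. That overshoot leaves a small margin for timing noise, which seems consistent with the stated intent of nudging `number` higher.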


duration_ms = std::chrono::duration_cast<std::chrono::duration<double> >
(tend - tbegin).count() * 100;

Contributor

If I understand correctly, here we rerun the whole process until we find the right number of iterations. An alternative would be to rerun only the number of iterations equal to the difference between the necessary number of iterations and the number of iterations already run. And then add its duration to the total duration. This approach may have a slightly different behavior, it may be a bit faster, but a bit less precise, I'm not sure, so I would like to see some more comments in the code describing the algorithm, and why this particular algorithm was chosen.

Member Author

@sgrechanik-h We cannot use the accumulation mode due to the reason explained by eqy
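
For readers comparing the two strategies in this thread, the following sketch counts function calls under a simulated, perfectly constant per-call cost. Both function names and the cost model are assumptions for illustration; it says nothing about measurement precision on real hardware, which is the actual concern raised here.

```python
import math


def rerun_strategy(number, per_call_ms, min_repeat_ms, growth=1.618):
    """Rerun the whole batch with a larger number until it is long enough.

    Returns (final_number, total_calls_made).
    """
    n, total_calls = number, 0
    while True:
        total_calls += n
        duration_ms = n * per_call_ms  # simulated measurement
        if duration_ms >= min_repeat_ms:
            return n, total_calls
        n = int(max(min_repeat_ms / (duration_ms / n) + 1, n * growth))


def accumulate_strategy(number, per_call_ms, min_repeat_ms):
    """Run only the missing iterations and add their time to the total.

    Returns (iterations_counted, total_calls_made).
    """
    ran = number
    duration_ms = ran * per_call_ms  # simulated measurement
    while duration_ms < min_repeat_ms:
        needed = math.ceil(min_repeat_ms / (duration_ms / ran))
        extra = max(needed - ran, 1)  # always make progress
        ran += extra
        duration_ms += extra * per_call_ms
    return ran, ran


print(rerun_strategy(10, 2.0, 100))       # (51, 61)
print(accumulate_strategy(10, 2.0, 100))  # (50, 50)
```

Under this idealized model the accumulating variant makes fewer calls, but its total is stitched together from separately timed segments, while the rerun variant reports one contiguous interval.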

are not precise enough to capture short-running tasks. This parameter is
also critical when devices need a certain minimum running time to "warm
up," such as GPUs that need time to reach a performance power state.
where the first one is warm up and will be discarded.
Contributor

maybe change this to "plus an additional warm up run that will be discarded." It currently sounds like it means (number - 1) x repeat

int number,
int repeat,
int min_repeat_ms) {
auto ftimer = [pf, ctx, &number, repeat, min_repeat_ms](TVMArgs args, TVMRetValue *rv) {
TVMRetValue temp;
std::ostringstream os;
// skip first time call, to activate lazy compilation components.
pf.CallPacked(args, &temp);
Contributor

I wonder if this definition (1 + number * repeat) is the correct formulation after we have introduced min_repeat_ms. The goal is to start measurement in the correct power state, which we will likely do if we bump up number over and over again for the same time_evaluator call. However, let's say that number is now sufficient and we get to a fresh time_evaluator call. In this case I am not sure 1+ will be enough to get the hardware into the right state if necessary. Should we consider number*(1+repeat)?

Member Author

Yes, the definition is not correct. I will add a note to the doc string of min_repeat_ms but keep this definition here.
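
The two formulations in this thread differ only in how much warm-up they include. A small sketch of the arithmetic, with hypothetical function names and ignoring any extra runs triggered by `min_repeat_ms`:

```python
def calls_single_warmup(number, repeat):
    """Documented formulation: one discarded warm-up call plus the timed runs."""
    return 1 + number * repeat


def calls_full_warmup_batch(number, repeat):
    """Suggested alternative: a whole discarded batch of `number` warm-up runs."""
    return number * (1 + repeat)


# With number=100 and repeat=3:
print(calls_single_warmup(100, 3))      # 301
print(calls_full_warmup_batch(100, 3))  # 400
```

The gap between the two grows with `number`, so a full warm-up batch costs the most exactly when `number` has already been raised to satisfy `min_repeat_ms`.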


@icemelon icemelon added the status: need update need update based on feedbacks label Dec 7, 2018
@@ -124,7 +124,8 @@ class RPCModuleNode final : public ModuleNode {
 PackedFunc GetTimeEvaluator(const std::string& name,
 TVMContext ctx,
 int number,
-int repeat) {
+int repeat,
+int min_repeat_ms) {
Contributor

Does this break some current tests if we do not give a default value for min_repeat_ms?

Member Author

I added a default argument on the Python side.

@tqchen
Member

tqchen commented Dec 22, 2018

@merrymercy what is the status of this PR?

@apache apache deleted a comment from eqy Dec 26, 2018
@merrymercy merrymercy force-pushed the enhance_time_evaluator branch 3 times, most recently from 684edd4 to e58d6c3 Compare December 26, 2018 15:51
@merrymercy merrymercy force-pushed the enhance_time_evaluator branch from e58d6c3 to 3587c6b Compare December 26, 2018 15:52
TVMRetValue temp;
std::ostringstream os;
// skip first time call, to activate lazy compilation components.
pf.CallPacked(args, &temp);
DeviceAPI::Get(ctx)->StreamSync(ctx, nullptr);
int dynamic_number = number;
Contributor

You used to modify number directly which had a nice property of remembering the suitable value of number between runs. I think you can still achieve this effect by declaring the lambda as mutable (won't be thread-safe though, so I'm not sure).
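
The "remembering" behavior described here can be sketched with a Python closure, which plays roughly the role of a by-value capture in a mutable lambda. Everything below is hypothetical: `make_evaluator` and the `measure_ms` hook are invented for the example, and the measurement is simulated so the result is deterministic.

```python
def make_evaluator(measure_ms, number, min_repeat_ms, growth=1.618):
    """Build an evaluator that keeps its adjusted run count between calls.

    measure_ms(n) is a hypothetical hook returning the duration, in ms,
    of running the function under test n times.
    """
    remembered = number  # state owned by the closure, shared across calls

    def evaluate():
        nonlocal remembered
        attempts = 0
        while True:
            attempts += 1
            duration_ms = measure_ms(remembered)
            if duration_ms >= min_repeat_ms:
                return remembered, attempts
            remembered = int(max(min_repeat_ms / (duration_ms / remembered) + 1,
                                 remembered * growth))

    return evaluate


# Simulated cost of 2 ms per run, starting from number=10, target 100 ms.
evaluate = make_evaluator(lambda n: n * 2.0, number=10, min_repeat_ms=100)
print(evaluate())  # (51, 2): the first call needs one adjustment
print(evaluate())  # (51, 1): the second call reuses the remembered count
```

As with the mutable lambda, the shared state means two threads calling the same evaluator could race on the remembered count.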


dynamic_number = static_cast<int>(
std::max((min_repeat_ms / (duration_ms / dynamic_number) + 1),
dynamic_number * 1.618));
Contributor

The choice of the constant needs an explanation inside the code.

Member Author

Halide uses 2, but I think there is no "correct" number, so it is an arbitrary choice.

@merrymercy
Member Author

@eqy please review again

@merrymercy merrymercy added status: need review and removed status: need update need update based on feedbacks labels Dec 28, 2018
@tqchen
Member

tqchen commented Dec 31, 2018

ping @eqy @sgrechanik-h please take another look. If there are no further comments in 24 hours, we can go ahead and merge this PR.

@tqchen tqchen merged commit b118848 into apache:master Jan 1, 2019
@tqchen
Member

tqchen commented Jan 1, 2019

Thanks, @merrymercy @eqy @sgrechanik-h , this is merged

@merrymercy merrymercy deleted the enhance_time_evaluator branch January 3, 2019 03:19
FrozenGene pushed a commit to FrozenGene/tvm that referenced this pull request Jan 10, 2019
@ZihengJiang ZihengJiang mentioned this pull request Feb 1, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019