-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time series logging #96
Conversation
@inducer Is this going in the right direction (in particular examples/sod-mpi.py and mirgecom/steppers.py)? Edit:
|
mirgecom/steppers.py
Outdated
|
||
def __call__(self): | ||
from functools import partial | ||
_min = partial(self.discr.nodal_min, "vol") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that there's no MPI reduction in this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, there is none.
I fixed most of the comments that you had @inducer. For two things I will need some more guidance:
|
Commented inline. |
To cut down on noise in my email, I'm unsubscribing from this PR. When it next needs my attention, please @-mention me or hit the "request review" button. Otherwise, I may not see your messages in a timely manner. |
I think I addressed all your comments @inducer. Ready for another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, getting close. These three, plus the doc CI failure, and then this is good to go IMO.
mirgecom/profiling.py
Outdated
@@ -253,7 +278,7 @@ def tabulate_profiling_data(self) -> pytools.Table: | |||
bandwidth_access_mean = "--" | |||
bandwidth_access_max = "--" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code could probably just use get_profiling_data_for_kernel
now, to avoid code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this would be too useful. get_profiling_data_for_kernel
only returns averages of some statistics, while tabulate_profiling_data
also calculates min/max as well as derived statistics such as bandwidth etc. Imho we wouldn't gain much by using get_profiling_data_for_kernel
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, that's kind of a strange inconsistency, too. Why can one gather max/min, the other not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can modify get_profiling_data_for_kernel
to also return (and log) min/max, if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about returning something like this:
class StatisticsAccumulator:
def __init__(self):
self.num_values = 0
self._sum = 0
self._min = 0
self._max = 0
def add_value(self, v):
if v is None:
return
self.num_values += 1
self._sum += v
# ...
def mean(self):
if self.num_values == 0:
return None
return self.sum / self.num_values
# ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a side note, I think my original idea for just doing averages here was to reduce the sheer number of statistics and the associated clutter (+overhead). Just adding min/max would triple the amount of kernel profiling data, which is already by far the majority of data stored for each time step. Also, it would create "nice"-looking entries like multiply_time_min.max
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I'm not saying we have to create LogQuantity
s for all of them. This accumulator thing is just so that we can unify the summation logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the solution you outlined. This works a bit differently for the derived quantities
mirgecom/logging_quantities.py
Outdated
logmgr.set_constant("cl_device_name", str(queue.device)) | ||
|
||
|
||
def logmgr_add_default_discretization_quantities(logmgr: LogManager, discr, dim, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def logmgr_add_default_discretization_quantities(logmgr: LogManager, discr, dim, | |
def logmgr_add_all_discretization_quantities(logmgr: LogManager, discr, dim, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The quantities added here aren't necessarily all discretization-related quantities, so renaming this is maybe confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. add_many
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be, or add_basic
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't like add_basic
so much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to add_many
.
Ready for another review @inducer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super close. Just a few tweaks to make StatisticsAccumulator
general purpose.
mirgecom/profiling.py
Outdated
self._min = sys.maxsize | ||
self._max = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to start with None
, to avoid assuming bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look done. Forgot to push?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, sorry. Should be pushed now.
mirgecom/profiling.py
Outdated
"""Class that provides statistical functions for profile results. | ||
|
||
Values are converted to "Giga". | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add
.. automethod
for methods. - Document
num_values
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look done. Forgot to push?
mirgecom/profiling.py
Outdated
@@ -51,15 +52,68 @@ class SingleCallKernelProfile: | |||
footprint_bytes: int | |||
|
|||
|
|||
class StatisticsAccumulator: | |||
"""Class that provides statistical functions for profile results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No real reason to me to make this special-purpose. Move to mirgecom.tools
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to mirgecom.utils
(and pushed).
mirgecom/profiling.py
Outdated
class StatisticsAccumulator: | ||
"""Class that provides statistical functions for profile results. | ||
|
||
Values are converted to "Giga". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not introduce a scale_factor
constructor parameter (defaulting to 1) so as to not wreck the general-purpose-ness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look done. Forgot to push?
Should be ready for another review @inducer. |
Implements https://github.com/illinois-ceesd/planning/issues/57