Time series logging #96

Merged
merged 227 commits into from
Feb 25, 2021

Conversation

matthiasdiener
Member

@matthiasdiener matthiasdiener commented Sep 11, 2020

@matthiasdiener matthiasdiener self-assigned this Sep 11, 2020
@matthiasdiener
Member Author

Here is what the output looks like (x-axis is simtime, y-axis is walltime per step):

[image: plot of walltime per step against simtime]

@matthiasdiener matthiasdiener changed the title initial logging initial time series logging Sep 15, 2020
@matthiasdiener
Member Author

matthiasdiener commented Sep 30, 2020

@inducer Is this going in the right direction (in particular examples/sod-mpi.py and mirgecom/steppers.py)?

Edit:
I'm thinking of doing the following:

  • Putting the LogQuantity classes into their own file, mirgecom/logging.py or so.
  • Adding the startup code to the mpi_entry_point decorator.

mirgecom/steppers.py Outdated

def __call__(self):
    from functools import partial
    _min = partial(self.discr.nodal_min, "vol")
Contributor

Make sure that there's no MPI reduction in this.

Member Author

As far as I can see, there is none.

mirgecom/steppers.py Outdated
mirgecom/steppers.py Outdated
@matthiasdiener matthiasdiener changed the title initial time series logging Time series logging Oct 5, 2020
@matthiasdiener
Member Author

I addressed most of your comments, @inducer. For two things I will need some more guidance:

  • the delete issue
  • the kernel results traversal

@inducer
Contributor

inducer commented Feb 22, 2021

> For two things I will need some more guidance:

Commented inline.

@inducer
Contributor

inducer commented Feb 23, 2021

To cut down on noise in my email, I'm unsubscribing from this PR. When it next needs my attention, please @-mention me or hit the "request review" button. Otherwise, I may not see your messages in a timely manner.

@matthiasdiener
Member Author

I think I addressed all your comments @inducer. Ready for another review.

Contributor

@inducer inducer left a comment

Alright, getting close. These three, plus the doc CI failure, and then this is good to go IMO.

mirgecom/profiling.py Outdated
@@ -253,7 +278,7 @@ def tabulate_profiling_data(self) -> pytools.Table:
bandwidth_access_mean = "--"
bandwidth_access_max = "--"
Contributor

This code could probably just use get_profiling_data_for_kernel now, to avoid code duplication.

Member Author

@matthiasdiener matthiasdiener Feb 24, 2021

I'm not sure this would be too useful. get_profiling_data_for_kernel only returns averages of some statistics, while tabulate_profiling_data also calculates min/max as well as derived statistics such as bandwidth etc. Imho we wouldn't gain much by using get_profiling_data_for_kernel here.

Contributor

I mean, that's kind of a strange inconsistency, too. Why can one gather max/min, the other not?

Member Author

@matthiasdiener matthiasdiener Feb 24, 2021

We can modify get_profiling_data_for_kernel to also return (and log) min/max, if you prefer.

Contributor

How about returning something like this:

class StatisticsAccumulator:
    def __init__(self):
        self.num_values = 0
        self._sum = 0
        self._min = 0
        self._max = 0

    def add_value(self, v):
        if v is None:
            return
        self.num_values += 1
        self._sum += v
        # ...

    def mean(self):
        if self.num_values == 0:
            return None

        return self._sum / self.num_values
# ...
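
For reference, one way the sketch above could be completed is shown below. This is only an illustration: how the minimum and maximum are tracked (here, `None` until the first value arrives) and the `min`/`max` method names are assumptions, not necessarily the code that was merged.

```python
class StatisticsAccumulator:
    """Accumulate values and report mean, min, and max (illustrative sketch)."""

    def __init__(self):
        self.num_values = 0
        self._sum = 0
        # Assumption: no bounds are known until the first value arrives.
        self._min = None
        self._max = None

    def add_value(self, v):
        if v is None:
            return  # missing measurements are skipped, not counted
        self.num_values += 1
        self._sum += v
        if self._min is None or v < self._min:
            self._min = v
        if self._max is None or v > self._max:
            self._max = v

    def mean(self):
        if self.num_values == 0:
            return None
        return self._sum / self.num_values

    def min(self):
        return self._min

    def max(self):
        return self._max


acc = StatisticsAccumulator()
for t in [0.5, None, 1.5]:
    acc.add_value(t)
```

With one accumulator per statistic, both the table output and the logged averages could read from the same object instead of each keeping its own summation loop.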

Member Author

@matthiasdiener matthiasdiener Feb 24, 2021

Just as a side note, I think my original idea for just doing averages here was to reduce the sheer number of statistics and the associated clutter (+overhead). Just adding min/max would triple the amount of kernel profiling data, which is already by far the majority of data stored for each time step. Also, it would create "nice"-looking entries like multiply_time_min.max.

Contributor

Sure! I'm not saying we have to create LogQuantitys for all of them. This accumulator thing is just so that we can unify the summation logic.

Member Author

Added the solution you outlined. This works a bit differently for the derived quantities.

mirgecom/profiling.py Outdated
logmgr.set_constant("cl_device_name", str(queue.device))


def logmgr_add_default_discretization_quantities(logmgr: LogManager, discr, dim,
Contributor

Suggested change
- def logmgr_add_default_discretization_quantities(logmgr: LogManager, discr, dim,
+ def logmgr_add_all_discretization_quantities(logmgr: LogManager, discr, dim,

Member Author

@matthiasdiener matthiasdiener Feb 24, 2021

The quantities added here aren't necessarily all discretization-related, so renaming it this way may be confusing.

Contributor

Agreed. add_many?

Member Author

Could be, or add_basic?

Contributor

Don't like add_basic so much.

Member Author

renamed to add_many.

@matthiasdiener
Member Author

Ready for another review @inducer.

Contributor

@inducer inducer left a comment

Super close. Just a few tweaks to make StatisticsAccumulator general purpose.

Comment on lines 65 to 66
self._min = sys.maxsize
self._max = 0
Contributor

I think it's better to start with None, to avoid assuming bounds.
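
To illustrate the concern with sentinel starting values: with `sys.maxsize` and `0` as starting points, the running bounds are wrong whenever the data falls outside that assumed range (for example, negative values). The two helper functions below are hypothetical and exist only for this comparison.

```python
import sys


def bounds_with_sentinels(values):
    """Track min/max starting from assumed bounds (the approach under review)."""
    lo, hi = sys.maxsize, 0
    for v in values:
        lo = min(lo, v)
        hi = max(hi, v)
    return lo, hi


def bounds_with_none(values):
    """Track min/max starting from None, so nothing is assumed about the range."""
    lo = hi = None
    for v in values:
        lo = v if lo is None else min(lo, v)
        hi = v if hi is None else max(hi, v)
    return lo, hi


negatives = [-3.0, -1.0]
print(bounds_with_sentinels(negatives))  # the maximum stays at the initial 0
print(bounds_with_none(negatives))
```

Starting from `None` also makes the empty case explicit: with no values added, both bounds remain `None` rather than reporting the sentinels as if they were data.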

Member Author

Done.

Contributor

Doesn't look done. Forgot to push?

Member Author

Yep, sorry. Should be pushed now.

"""Class that provides statistical functions for profile results.

Values are converted to "Giga".
"""
Contributor

  • Add .. automethod for methods.
  • Document num_values.
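
A docstring along these lines would cover both points. The wording is only a sketch, and the method list is an assumption based on the class outline discussed earlier in this review.

```python
class StatisticsAccumulator:
    """Accumulate values and provide basic statistics over them.

    .. attribute:: num_values

        The number of values added so far (``None`` inputs are not counted).

    .. automethod:: add_value
    .. automethod:: mean
    """

    def __init__(self):
        self.num_values = 0
        self._sum = 0

    def add_value(self, v):
        """Add *v* to the statistics; ``None`` is ignored."""
        if v is None:
            return
        self.num_values += 1
        self._sum += v

    def mean(self):
        """Return the mean of all added values, or ``None`` if there are none."""
        if self.num_values == 0:
            return None
        return self._sum / self.num_values
```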

Member Author

Done

Contributor

Doesn't look done. Forgot to push?

@@ -51,15 +52,68 @@ class SingleCallKernelProfile:
footprint_bytes: int


class StatisticsAccumulator:
    """Class that provides statistical functions for profile results.
Contributor

No real reason to me to make this special-purpose. Move to mirgecom.tools?

Member Author

Moved to mirgecom.utils (and pushed).

class StatisticsAccumulator:
    """Class that provides statistical functions for profile results.

    Values are converted to "Giga".
Contributor

Why not introduce a scale_factor constructor parameter (defaulting to 1) so as to not wreck the general-purpose-ness?
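
A minimal sketch of that suggestion, assuming the factor is applied when results are read out (applying it on insertion would be an equally valid choice). The class name `ScaledAccumulator` is hypothetical, and the `1e-9` value mirrors the "Giga" conversion mentioned in the docstring.

```python
class ScaledAccumulator:
    """Hypothetical reduced accumulator showing only the scale_factor idea."""

    def __init__(self, scale_factor=1):
        # With the default of 1, results are returned unscaled.
        self.scale_factor = scale_factor
        self.num_values = 0
        self._sum = 0

    def add_value(self, v):
        if v is None:
            return
        self.num_values += 1
        self._sum += v

    def sum(self):
        if self.num_values == 0:
            return None
        return self._sum * self.scale_factor

    def mean(self):
        if self.num_values == 0:
            return None
        return self._sum / self.num_values * self.scale_factor


plain = ScaledAccumulator()
giga = ScaledAccumulator(scale_factor=1e-9)
for nbytes in [2_000_000_000, 4_000_000_000]:
    plain.add_value(nbytes)
    giga.add_value(nbytes)
```

Here `plain.mean()` reports the average in bytes, while `giga.mean()` reports approximately 3 (gigabytes), so the profiling code can keep its unit conversion without the class itself knowing about units.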

Member Author

Done.

Contributor

Doesn't look done. Forgot to push?

@matthiasdiener
Member Author

Should be ready for another review @inducer.


Successfully merging this pull request may close these issues.

Logging makes warnings re: units
3 participants