
Add performance tracking machinery #1509

Open

serramatutu wants to merge 2 commits into main
Conversation

@serramatutu (Contributor) commented Nov 5, 2024

Summary

This PR introduces some new testing functionality that we can use to avoid performance regressions. From a user perspective, the performance benchmarking workflow is:

  1. Add a new function you want to start tracking to the list in performance_helpers.py.
  2. Create a new test in tests_metricflow/performance/ that exercises something whose performance we want to track (see the sketch after this list). You can use the following fixtures:
  • measure_compilation_performance: measures the compilation of some SQL based on a query spec and a manifest.
  • perf_tracker: a lower-level fixture that you can use to create measurements and sessions (more on that later).
  • ...: we can create more performance-tracking fixtures as needed, all built on the perf_tracker base.
  3. Run make perf to generate a performance-report.json.
  4. Run make perf-compare A=base-report.json B=other-report.json to generate a Markdown file with a report on what changed.
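For illustration, a new performance test might look roughly like the sketch below. The fixture names come from this PR, but the call signature and the `simple_query_spec`/`simple_manifest` fixtures are placeholders I am assuming for the example, not actual fixtures from the repo.

```python
# tests_metricflow/performance/test_simple_manifest.py -- illustrative sketch only


def test_simple_query(
    measure_compilation_performance,  # fixture added in this PR
    simple_query_spec,  # placeholder: a query spec fixture
    simple_manifest,  # placeholder: a semantic manifest fixture
) -> None:
    # The session id becomes "test_simple_manifest.py::test_simple_query", and the
    # tracked methods (e.g. DataflowPlanBuilder.build_plan) record CPU and wall time
    # while the query compiles.
    measure_compilation_performance(query_spec=simple_query_spec, manifest=simple_manifest)
```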

The performance report

The performance-report.json file contains the "raw" data for a given run. It looks like this:

```json
{
    "sessions": {
        "test_simple_manifest.py::test_simple_query": {
            "session_id": "test_simple_manifest.py::test_simple_query",
            "contexts": {
                "DataflowPlanBuilder.build_plan": {
                    "context_id": "DataflowPlanBuilder.build_plan",
                    "cpu_ns_average": 5089000,
                    "cpu_ns_median": 5089000,
                    "cpu_ns_max": 5089000,
                    "wall_ns_average": 5204000,
                    "wall_ns_median": 5204000,
                    "wall_ns_max": 5204000
                },
                "global": {
                    "context_id": "global",
                    "cpu_ns_average": 10156000,
                    "cpu_ns_median": 10156000,
                    "cpu_ns_max": 10156000,
                    "wall_ns_average": 10343000,
                    "wall_ns_median": 10343000,
                    "wall_ns_max": 10343000
                }
            }
        },
        ...
    }
}
```

Each "report" contains a list of "sessions". A session is a run where multiple measurements have been taken. Each session contains a variety of "contexts", which are aggregations (avg, mean, max) of CPU and wall time measurements based on a context_id such as DataflowPlanBuilder.build_plan. We can add memory tracking later using psutil.

Report comparison

I created a script called compare_reports.py which takes two report files and calculates the difference between them (in absolute values and as a percentage). It then generates a Markdown file called performance-comparison.md with a nice little table like the one in my first comment.
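The core of the comparison is simple arithmetic. A minimal sketch of the absolute/percent difference for a single metric is shown below; the class and field names here are assumptions for illustration, not the PR's actual Comparison API.

```python
# Illustrative sketch only: absolute and percent difference between two report values.
from dataclasses import dataclass


@dataclass
class MetricComparison:
    absolute_ns: int
    percent: float


def compare_metric(base_ns: int, other_ns: int) -> MetricComparison:
    """Negative values mean `other` is faster than `base`."""
    absolute = other_ns - base_ns
    percent = (absolute / base_ns) * 100 if base_ns else float("inf")
    return MetricComparison(absolute_ns=absolute, percent=percent)


# Example (arbitrary numbers):
# compare_metric(base_ns=5_204_000, other_ns=4_900_000)
# -> MetricComparison(absolute_ns=-304000, percent=-5.84...)
```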

How it works

When you run make perf, the perf_tracker fixture uses the track_performance context manager, which:

  • Instantiates a PerformanceTracker
  • Monkeypatches every class method listed in _tracking_class_methods, as well as any references to those methods that other modules have already imported. I used unittest.mock.patch for this.
  • Yields the performance tracker

All the tests then run, and at the end the fixture compiles the reports by calling get_report_set(). This generates the ContextReports, SessionReports and SessionReportSet from the raw list of Calls to the tracked methods.

After that, it writes the report to the JSON file.
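As a minimal sketch of the patching mechanism described in the list above, a single method could be wrapped with timing instrumentation via unittest.mock.patch.object as shown below. This is not the PR's actual code: the real track_performance patches a whole list of methods (_tracking_class_methods), including already-imported references, and records calls on a PerformanceTracker with sessions and contexts.

```python
import time
from contextlib import contextmanager
from unittest import mock


@contextmanager
def track_method(cls, method_name, calls):
    """Patch cls.method_name so every call appends (cpu_ns, wall_ns) to `calls`."""
    original = getattr(cls, method_name)

    def wrapper(*args, **kwargs):
        start_wall = time.time_ns()
        start_cpu = time.process_time_ns()
        try:
            return original(*args, **kwargs)
        finally:
            calls.append(
                (time.process_time_ns() - start_cpu, time.time_ns() - start_wall)
            )

    # Replace the method for the duration of the context, then restore it.
    with mock.patch.object(cls, method_name, wrapper):
        yield
```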

Then, when you run make perf-compare, the script loads both JSON files and calls the compare() methods on each of the report classes. These generate Comparison objects containing the percent and absolute differences. The script then renders them to Markdown and writes the result to a file.

Next steps

Make a GitHub Actions bot that runs this on PRs to compare performance against main.

This commit adds the basic functionality needed for measuring performance. This includes:
- classes to measure performance in different sessions using context managers
- a context manager that can be used to instrument certain methods with performance tracking machinery
- test fixtures for automatically generating reports using these classes

This commit adds `Comparison` classes which contain comparisons between performance measurements taken at two different points in time. It uses these classes in a new script that can be called to generate a Markdown report of what changed regarding performance.
github-actions bot commented Nov 5, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@serramatutu (Contributor, Author) commented:

Performance comparison

Comparing performance-report.json against performance-report-2.json

Worst performance hit: -3.93% in test_simple_manifest.py::test_simple_query

test_simple_manifest.py::test_simple_query_2 🚀

| context | CPU avg | CPU median | CPU max | Wall avg | Wall median | Wall max |
| --- | --- | --- | --- | --- | --- | --- |
| DataflowPlanBuilder.build_plan | -0.0318ms (-7.27%) | -0.0318ms (-7.27%) | -0.0318ms (-7.27%) | -0.0383ms (-8.76%) | -0.0383ms (-8.76%) | -0.0383ms (-8.76%) |
| global | -0.0583ms (-5.97%) | -0.0583ms (-5.97%) | -0.0583ms (-5.97%) | -0.0642ms (-6.56%) | -0.0642ms (-6.56%) | -0.0642ms (-6.56%) |

test_simple_manifest.py::test_simple_query 🚀

| context | CPU avg | CPU median | CPU max | Wall avg | Wall median | Wall max |
| --- | --- | --- | --- | --- | --- | --- |
| DataflowPlanBuilder.build_plan | -0.0377ms (-7.41%) | -0.0377ms (-7.41%) | -0.0377ms (-7.41%) | -0.0281ms (-5.40%) | -0.0281ms (-5.40%) | -0.0281ms (-5.40%) |
| global | -0.0572ms (-5.63%) | -0.0572ms (-5.63%) | -0.0572ms (-5.63%) | -0.0406ms (-3.93%) | -0.0406ms (-3.93%) | -0.0406ms (-3.93%) |

```python
start_wall = time.time_ns()
start_cpu = time.process_time_ns()

yield self
```
A Contributor commented on these lines:

High-level question - what's the advantage of doing this ourselves instead of using the output of cProfile?

@serramatutu (Contributor, Author) replied:

Honestly I just didn't really know how cProfile worked and this looked easier. Happy to change how PerformanceTracker works to use cProfile under the hood. We can reuse the thing that compares reports though.
