Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add basic profiling to benchmarks #116

Merged
merged 11 commits into from
Sep 5, 2023
Merged

feat: add basic profiling to benchmarks #116

merged 11 commits into from
Sep 5, 2023

Conversation

ielashi
Copy link
Contributor

@ielashi ielashi commented Aug 23, 2023

Problem

The benchmarks currently return the number of instructions it took to execute each benchmark. While this number is useful to measure performance, it doesn't provide insight into where these instructions are being used and where the performance bottle necks are. Without this information, making informed performance optimizations would require a lot of trial and error.

Solution

The typical solution to this problem is to use some kind of profiler. ic-repl already supports profiling and can output a flamegraph of where instructions are being spent, but it has a few drawbacks that makes it difficult to use:

  1. The names of rust methods are mangled, even when debug = 1 is turned on, making it hard to make sense of the output.
  2. Each benchmark includes logic to first setup, and only after setup would we want to profile, so we'd need a way to programmatically tell the profiler to reset its measurements.
  3. Often we'd like to benchmark blocks of code that aren't functions.

To address the issues above, this commit introduces a "poor man profiler". This profiler is manual, in the sense that the developer adds to the code hints for what they care about profiling. In this PR, I added some basic hints, and the benchmarks now return an output that looks like this:

Benchmarking btreemap_insert_blob_64_1024_v2: Warming up for 1.0000 ms
2023-08-23 07:26:53.560585 UTC: [Canister rwlgt-iiaaa-aaaaa-aaaaa-cai] {
    "node_load_v2": "5_182_358_668 (80%)",
    "node_save_v2": "786_197_957 (12%)",
}

Benchmarking btreemap_insert_blob_64_1024_v2: Collecting 10 samples in estimated 345.63 s (165 iterations
btreemap_insert_blob_64_1024_v2
                        time:   [6474.1 M Instructions 6474.1 M Instructions 6474.1 M Instructions]
                        change: [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
                        No change in performance detected.

This approach is simple and effective, but it does have the draw back that it makes the instructions count slightly inaccurate, as the profiling logic itself consumes cycles. I think we can limit this inaccuracy by making the profiler crate internally account for its own overhead and deducting those from its measurements.

@ielashi
Copy link
Contributor Author

ielashi commented Aug 23, 2023

Note to reviewers, this is a reincarnation of #52

profiler/src/lib.rs Outdated Show resolved Hide resolved
profiler/src/lib.rs Show resolved Hide resolved
Base automatically changed from ielashi/btree_v2 to main September 5, 2023 08:39
@ielashi ielashi enabled auto-merge (squash) September 5, 2023 08:52
@ielashi ielashi merged commit 6d00eca into main Sep 5, 2023
@ielashi ielashi deleted the ielashi/profiler branch September 5, 2023 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants