Tracking profiling run results #686

matt-graham · 2022-08-09T14:06:20Z

We would like to be able to track how the the timings measured in profiling runs of the src/scripts/profiling/scale_run.py script changes as new pull-requests are merged in. This would help identifying when PRs lead to performance regressions and allow us to be more proactive in fixing performance bottlenecks.

Ideally this should be as automated using GitHub Actions workflows. Triggering the workflow on pushes to master would give the most detail in terms of giving a direct measurement of the performance differences arising from a particular PR, but when lots of PRs are going in could potentially create a large backlog of profiling runs, so an alternative would be to run on a schedule (for example nightly) using the cron event. It would probably be worth also allowing triggering either using the workflow_dispatch event or using the comment-triggered workflow functionality to allow manually triggering in PRs that it is thought might have a significant effect on performance before merging.

Key questions to be resolved are what profiling outputs we want to track (for example at what level of granularity, using which profiling tool) and how we want to visualize the outputs. One option would be to save the profiler output as a workflow artifact. While this would be useful in allowing access to the raw profiling data, the only option for accessing workflow artifacts appears to be downloading the artifact as a compressed zip file so this is not necessarily itself that useful for visualizing the output. One option for visualizing the profiling results would be to use the GitHub Actions job summary which allows using Markdown to produce customized output showed on the job summary page. Another option would be to output the profiling results to HTML files and then deploy these to either a GitHub Pages site or potentially to a static site on Azure storage.

Potentially useful links

The airspeed velocity package allows tracking the results of benchmarks of Python packages overtime and visualizing the results as plots in a web interface. While focused on suites of benchmarks it does also have support for running single benchmarks with profiling.

htmlpreview allows directly previewing HTML files in a GitHub repository as GitHub forces them to use the "text/plain" content-type, so they cannot be interpreted

The text was updated successfully, but these errors were encountered:

willGraham01 · 2023-06-26T08:58:02Z

The developer onboarding says that we currently use pyinstrument to benchmark the scale_run script, so I thought I'd make a quick few comparisons against ASV:

ASV

(Claims to) integrate quite well with git:
- Can run regressions over multiple commits
- Has a find function function similar to git bisect to identify where the "biggest" slowdown in a period occurred. Could be useful for resolving situations where the cron job flags a slowdown at the end of a day, but the day includes multiple PR merges
- Results can be served directly to github pages as HTML.
- That being said, it creates a lot of artefacts in the git repo that need to be added to the .gitignore.
Prefers to use a conda environment to run benchmarks which integrates well with our current developer practices
Is designed to only run on one machine, as it saves JSON and database records of the benchmarks at each commit it is told to look at.
- We'd have to dedicate a fixed machine if we wanted to reap the benefits
- We'd also need to implement some kind of manual "clean-up" of results from the past to save being overwhelmed
Rather worryingly, it seems that asv is not actively maintained nor up-to-date
- Latest stable Python version it mentions is 3.6, not great when we're trying to update our dependencies.
- There is an issue here that essentially mirrors the concerns that we would have. (Credit to Sofia)

pyinstrument

No particular git integration, we'd have to keep track of which profiling results corresponded to which commits manually (although this doesn't seem too difficult)
- Outputs can be exported as html, so we could still manually deploy the results
Relies on a Python executable being on the path, so we'd need to setup a conda environment each time we ran the job and make sure the correct Python version was being found
- The Python API allows us to load previous results in too, so we can load in results and conduct further analysis if the initial output isn't sufficient
The above means it can happily run on any machine, so GH runners or any Azure machine that's available at the time would suffice
Actively maintained

The maintain-ability (?) issue jumps out as something of a red flag to me, but asv otherwise looks to have slighly better features at the cost of needing a dedicated machine. pyinstrument seems more flexible however; it's fairly easy to write a psuedocode GH action workflow using it right away:

- Checkout repository

- Setup conda
- Setup conda envrionment from developer/user docs
- Install pyinstrument into the evironment

- Run pyinstrument producing a HTML output (and maybe a session output so we can reload later)

- Push HTML file somewhere? Maybe to a separate branch that we an manually view the files with htmlpreview?

willGraham01 · 2023-06-29T10:33:50Z

A couple of options (more details in this file)

ASV-based workflow/job, on a dedicated machine. Use asv run --profile so we collect both benchmarking and profiling outputs, putting them somewhere. The profiling results won't be rendered in HTML or in a human-readable format, we'll need another tool for this.
Workflow/job invoking pyinstrument, in a similar vein to this example. We can manually extract something like the cpu_time to use as a rough benchmarking estimate; relying on the Azure machines to be of reasonably similar spec, and can retain the profiling HTML files and publish these ourselves somewhere. Benchmarking won't be as accurate but provides the profiling information in a much more usable way, and doesn't require a dedicated machine.
Workflow/job running both ASV & pyinstrument. Give us the best of both, but it's a heavy compute cost and still requires a dedicated machine. Also, we'd have to investigate how the two HTML deployments play together.

The wgraham/asv-benchmark and wgraham/pyinstrument-profiling-ci branches have (locally working, still need to fix the broken tests!) implementations of both ASV and pyinstrument for the tasks above (on a 1month long simulation so the results get produced in ~2mins).

Opinions welcome: the github-pages branch of this repository is un-used so we can initially send the HTML outputs to there for viewing.

matt-graham · 2023-07-04T16:28:21Z

Some notes from meeting of @tamuri, @willGraham01 and myself today to discuss this issue

A possible simple system for storing and tracking profiling results is to push to either a dedicated branch within the main TLOmodel repository or in a separate repository, using simple nested directory structure for organizing results, similar to that created by the Julia BenchmarkCI.jl package (see example output for ParticleDA.jl repository).
- On balance we favoured using a separate repository, for now still under UCL organization named TLOmodel-outputs / TLOmodel-profiling or similar, as this would avoid downside of dedicated branch approach of possibly adding to issues around already large size of repository, and be inline with longer term aims of creating a TLOmodel organization and splitting up existing repository.
For each profiling run we should capture
- Raw profiler output (for example pyisession file for pyinstrument)
- Key parameters of run (for example initial population size, simulation length, Git commit ID)
- Some form of HTML / Markdown summary of profiling output for easy viewing on repository
- Potentially additional statistics such as memory usage of Python process, population dataframe size, main and HSI event queue sizes, tracked across run using a new logger event
For now we will just have the profiling outputs stored on the repository and use GitHub web interface as simple way for viewing summaries, while also allowing repository to be cloned locally to perform more detailed analysis
- Any Python scripts for analyzing profiling outputs across commits / time can be kept in new repository. At some point we can potentially move to automating running these with workflows within the repository.

willGraham01 · 2023-07-10T08:36:01Z

tamuri · 2023-12-22T10:20:26Z

At some point, we can move the profiling repo into the TLOmodel org (https://github.com/TLOmodel).

matt-graham · 2024-04-08T14:17:48Z

Closing this as profiling workflow now capturing statistics and working reliably

matt-graham added the enhancement New feature or request label Aug 9, 2022

matt-graham self-assigned this Aug 9, 2022

matt-graham changed the title ~~Tracking benchmark run results~~ Tracking profiling run results Jun 22, 2023

matt-graham assigned willGraham01 and unassigned matt-graham Jun 22, 2023

willGraham01 mentioned this issue Jun 28, 2023

Benchmark scripts with pyinstrument #1010

Closed

willGraham01 mentioned this issue Jun 29, 2023

Profiling workflow on workflow_dispatch #1012

Merged

willGraham01 mentioned this issue Jul 10, 2023

Add Python scripts for analysis UCL/TLOmodel-profiling#1

Closed

tamuri added framework performance labels Jul 26, 2023

willGraham01 mentioned this issue Sep 4, 2023

[BUG] update_call assigns work to officers with no time available #1089

Closed

willGraham01 mentioned this issue Sep 20, 2023

Capture profiling statistics #1110

Merged

matt-graham mentioned this issue Apr 8, 2024

Move repositories under TLOmodel organization #1311

Open

matt-graham closed this as completed Apr 8, 2024

github-project-automation bot added this to Issue management Aug 27, 2024

github-project-automation bot moved this to Done in Issue management Aug 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking profiling run results #686

Tracking profiling run results #686

matt-graham commented Aug 9, 2022 •

edited

Loading

willGraham01 commented Jun 26, 2023 •

edited

Loading

willGraham01 commented Jun 29, 2023 •

edited

Loading

matt-graham commented Jul 4, 2023

willGraham01 commented Jul 10, 2023 •

edited

Loading

tamuri commented Dec 22, 2023

matt-graham commented Apr 8, 2024

Tracking profiling run results #686

Tracking profiling run results #686

Comments

matt-graham commented Aug 9, 2022 • edited Loading

Potentially useful links

willGraham01 commented Jun 26, 2023 • edited Loading

ASV

pyinstrument

willGraham01 commented Jun 29, 2023 • edited Loading

matt-graham commented Jul 4, 2023

willGraham01 commented Jul 10, 2023 • edited Loading

Statistics to potentially capture:

File sizes of the pyisession outputs

tamuri commented Dec 22, 2023

matt-graham commented Apr 8, 2024

matt-graham commented Aug 9, 2022 •

edited

Loading

willGraham01 commented Jun 26, 2023 •

edited

Loading

willGraham01 commented Jun 29, 2023 •

edited

Loading

willGraham01 commented Jul 10, 2023 •

edited

Loading