somehow organize performance benchmarking of git-annex #3

yarikoptic · 2020-10-02T13:57:25Z

We added some checks to the git-annex build workflow to spot some cases that could lead to slow(er) operation of the standalone build, but overall we do not have a good way to detect when git-annex slows down. We only see it reflected when we try a new snapshot build against our datalad tests, and by then it becomes an archaeological expedition to find which change introduced the pessimization.

It would be nice to establish automated and consistent benchmarking of git-annex builds as pertinent to datalad.

Proposal:

- Take some release of datalad that is still compatible with the current annex build (IIRC we could take the current release ATM)
- Use the asv benchmarks of that datalad release, but for benchmarking git-annex (so whenever we improve our datalad benchmark collection, it automagically helps to benchmark git-annex too)
- Establish datalad/git-annex-benchmarking on GitHub:
  - git subtree the benchmarks from datalad
  - Include git-annex's master branch (from git://git.kitenet.net/git-annex) as an annex-master branch known to that repo
    - I think asv can benchmark commits in another branch while the benchmarks live in master, so the asv configuration would handle that
  - Add a pythonish setup to build a standalone install of git-annex, so asv can deploy any given version of git-annex (I wonder if there is something like ccache for Haskell ;))
  - I think we had better establish a Singularity container right away, based on e.g. https://github.com/datalad-tester/dev-containers/blob/master/Singularity.10.20200209.1, which would add apt build-dep git-annex-standalone. That would be the container to run asv in: it would have all the build dependencies etc., and we would later use it for the "worker" env (see below)
  - Add a GitHub action to run on cron, which would:
    - git checkout annex-master && git pull --ff-only && git push origin annex-master && git checkout master
    - asv run on new commits in annex-master, then asv gh-pages && datalad save -m "ASV results update" .asv && git push origin
- Provide a GitHub Actions worker on a dedicated box (I have some, with consistent timing), probably within Singularity (for at least some isolation and, again, consistency)
  - Make that GitHub action run only on pushes, not PRs, so we do not compromise security in any way
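A minimal sketch of what the asv configuration for "benchmark commits on annex-master while the benchmarks live in master" might look like, written as Python that emits asv.conf.json. The keys (repo, branches, matrix, benchmark_dir, results_dir, html_dir, build_command, install_command) are standard asv configuration options, but every value is an assumption: the datalad benchmarks sit in ./benchmarks on master, git-annex history is on annex-master in the same repo, everything asv produces goes under .asv (to match the datalad save step), and build-annex-standalone / install-annex-standalone are hypothetical helper commands (e.g. shipped in the Singularity container) standing in for the "pythonish setup".

```python
import json
from pathlib import Path


def make_asv_config(datalad_version: str) -> dict:
    """Sketch of an asv config for the proposed benchmarking repo (assumed layout)."""
    return {
        "version": 1,
        "project": "git-annex",
        "dvcs": "git",
        "repo": ".",                   # benchmark commits of this very repo...
        "branches": ["annex-master"],  # ...but only those on annex-master
        "environment_type": "virtualenv",
        # Pin the datalad release whose asv benchmarks exercise git-annex.
        "matrix": {"datalad": [datalad_version]},
        "benchmark_dir": "benchmarks",
        # Keep everything asv writes under .asv so it can be `datalad save`d.
        "env_dir": ".asv/env",
        "results_dir": ".asv/results",
        "html_dir": ".asv/html",
        # HYPOTHETICAL helper commands: build a standalone git-annex for the
        # checked-out commit into the build cache, then expose it in the env.
        # Check the placeholder names against your asv version's docs.
        "build_command": ["build-annex-standalone {build_dir} {build_cache_dir}"],
        "install_command": ["install-annex-standalone {build_cache_dir} {env_dir}"],
    }


def write_config(path: Path, datalad_version: str) -> None:
    """Write the sketch config as pretty-printed JSON."""
    path.write_text(json.dumps(make_asv_config(datalad_version), indent=2) + "\n")
```

For example, write_config(Path("asv.conf.json"), ...) with the chosen datalad release; the version string is left to the caller since the proposal only says to pick a release still compatible with current git-annex.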
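The cron job's two steps could be scripted roughly as below, for instance as the body of the scheduled GitHub action. This is a sketch that assumes it runs inside a clone of the benchmarking repo where annex-master tracks upstream git-annex, and that git, asv and datalad are on PATH. It uses asv run NEW (benchmark only commits not benchmarked yet) as one way to express "new commits in annex-master"; by default it only lists the commands.

```python
import shlex
import subprocess
from typing import List

# Step 1: refresh annex-master from upstream git-annex, publish it,
# then return to master (where the benchmarks live).
UPDATE_ANNEX_MASTER: List[List[str]] = [
    ["git", "checkout", "annex-master"],
    ["git", "pull", "--ff-only"],
    ["git", "push", "origin", "annex-master"],
    ["git", "checkout", "master"],
]

# Step 2: benchmark not-yet-benchmarked commits ("NEW" is asv's range
# keyword for that), publish the HTML, then save and push results in .asv.
BENCHMARK_AND_PUBLISH: List[List[str]] = [
    ["asv", "run", "NEW"],
    ["asv", "gh-pages"],
    ["datalad", "save", "-m", "ASV results update", ".asv"],
    ["git", "push", "origin"],
]


def cron_job(dry_run: bool = True) -> List[str]:
    """Run the steps in order and return them as shell-quoted strings.

    With dry_run=True nothing is executed. Otherwise check=True raises
    CalledProcessError at the first failing command, like the && chain.
    """
    steps = []
    for argv in UPDATE_ANNEX_MASTER + BENCHMARK_AND_PUBLISH:
        steps.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return steps
```

Keeping the commands as argument lists rather than one shell string avoids quoting problems (e.g. the commit message) while still stopping at the first failure.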

WDYT @mih @kyleam @bpoldrack @jwodder

FYI @joeyh

The text was updated successfully, but these errors were encountered:

joeyh · 2020-10-02T17:59:04Z (via email)

There is git-annex benchmark, which does a good job of benchmarking a git-annex command or a sequence of commands you choose. It can output JSON or CSV, which lets benchmarks be compared and a regression be flagged. At least in theory; I don't have anything doing that.

Output of git-annex benchmark whereis --csv foo.csv:

    Name,Mean,MeanLB,MeanUB,Stddev,StddevLB,StddevUB
    whereis,5.076051109441738e-2,4.914089405704959e-2,5.4101610224266704e-2,4.234978773428508e-3,2.050397769413241e-3,6.8122220186021265e-3

(It does not currently include startup speed in the benchmark. I could add an option to include that, or, maybe better, a mode that only benchmarks startup speed.)

…

-- see shy jo
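As a sketch of the comparison joeyh describes, the snippet below parses two CSV outputs with the columns shown above (Name, Mean, MeanLB, MeanUB, ...) and flags benchmarks that got slower. The decision rule is an assumption, not something git-annex provides: a benchmark counts as a regression when the lower bound of its new mean exceeds the baseline's upper bound by more than a tolerance (10% by default), so ordinary noise inside the reported bounds is ignored.

```python
import csv
import io
from typing import Dict, List


def load_stats(csv_text: str) -> Dict[str, Dict[str, float]]:
    """Map benchmark name -> {column: value} for CSV with a Name column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Name"]: {col: float(val) for col, val in row.items() if col != "Name"}
        for row in reader
    }


def regressions(baseline_csv: str, current_csv: str, tolerance: float = 0.10) -> List[str]:
    """Names whose current MeanLB exceeds baseline MeanUB * (1 + tolerance).

    Benchmarks missing from the baseline are skipped rather than flagged.
    """
    base = load_stats(baseline_csv)
    cur = load_stats(current_csv)
    return [
        name
        for name, stats in cur.items()
        if name in base and stats["MeanLB"] > base[name]["MeanUB"] * (1 + tolerance)
    ]
```

A CI job could read the previous run's CSV and the new one (e.g. with Path(...).read_text()) and fail when regressions(...) returns a non-empty list.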
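Until a startup-only mode exists, one rough proxy is to time a near no-op invocation repeatedly, since its wall-clock time is dominated by process startup. This is a hypothetical stand-in, not a git-annex feature; the command to time is left to the caller, and REFERENCE_CMD (a Python no-op) is only included as a command guaranteed to exist for checking the harness itself.

```python
import subprocess
import sys
import time
from typing import List, Sequence

# A command that exists wherever this runs; useful to sanity-check the
# timer (it measures Python's own startup, not git-annex's).
REFERENCE_CMD = [sys.executable, "-c", "pass"]


def startup_times(argv: Sequence[str], repeats: int = 20) -> List[float]:
    """Wall-clock seconds for each of `repeats` runs of `argv`.

    For a near no-op command this is dominated by process startup,
    so it is only a rough proxy for startup speed.
    """
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times
```

For example, statistics.median(startup_times(["git", "annex", "version"])) would give one figure per build that could be tracked alongside the criterion results.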

yarikoptic transferred this issue from datalad/datalad-extensions Nov 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

somehow organize performance benchmarking of git-annex #3

somehow organize performance benchmarking of git-annex #3

yarikoptic commented Oct 2, 2020

joeyh commented Oct 2, 2020 via email

somehow organize performance benchmarking of git-annex #3

somehow organize performance benchmarking of git-annex #3

Comments

yarikoptic commented Oct 2, 2020

joeyh commented Oct 2, 2020 via email