Ideas for improvement #27

Open
16 tasks
ghost opened this issue Apr 2, 2013 · 0 comments

ghost commented Apr 2, 2013

  • Benchmarks should register themselves with a name, so
    duplicate names can be detected (see pandas-dev/pandas@977d581)
  • The git log parsing is expensive; it should take a commit range to speed things up
    (pandas' test_perf currently monkey-patches vbench to do this)
  • use sh to wrap the subprocess calls, as cleanup.
  • the repo parsing only includes commits on a given branch, should
    be less restrictive
  • using os.system and friends to invoke commands doesn't allow for
    redirecting output, so the runs are very noisy and callers can't do much about it.
    rework to use logging, or take in a stream as arg.
  • support arbitrary-length prefix style hash spec, just like git. (iow, resolve hashes via git)
  • expose the gc disable option as a documented interface, it's important.
  • use python-git or similar rather than system(), to speed up repo parsing at the very least.
  • allow for a custom build script (pandas has a build cache system that can virtually eliminate build times)
  • vbench is very "hygienic" when recreating build environments, but the overhead per build (clone + build per commit) is excessive.
  • the rest of test_perf functionality (compare two commits, text reporting on the command line, repeated measurements and summary stats, exporting results as dataframes, etc.)
  • allow enforcement of an upper limit on vbench duration, to bound the suite runtime to something manageable.
  • pandas 5550, behavior change resulted in 600x change to perf. How to explicitly handle this sort of historical context?
  • often you want to identify changes to algorithmic complexity rather than just the overall absolute difference (O(1) -> O(N)).
    If vbenches were parameterized by dataset size, that could be detected automatically. (For example, the 0.13rc1+ vs. 0.12 vbench perf regressions, pandas-dev/pandas#5660 (comment): is it O(1) construction overhead?)
  • usage experience shows that occasionally vbenches need to be modified after the fact (adjusting dataset size
    is common). This is not a problem for the "perf diff" use case, but for historical tracking it invalidates the
    timeseries. We need to control for that and warn the user (immutable vbenches, or perhaps hash the vbench AST
    to be less rigid about program text).
  • use sqlalchemy commit/transaction control to speed up the initial db creation (it shouldn't be that slow).
  • canned export of results into an easy format (JSON) for consumption by other tools (a Jenkins plug-in for example, similar to coverage.xml and xunit xml)
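
For the noisy-output item, one possible shape is a small wrapper that captures a command's output and routes it through logging rather than letting it hit the terminal. This is only a sketch: `run_quietly` and the logger name are assumptions made up here, and it uses the standard library `subprocess.run` rather than the `sh` package mentioned above.

```python
import logging
import subprocess
import sys

log = logging.getLogger("vbench.sketch")  # hypothetical logger name


def run_quietly(cmd, cwd=None):
    """Run `cmd` (a list of arguments) and return (returncode, combined output).

    stdout and stderr are captured together and forwarded to the logger at
    DEBUG level, so the caller decides how noisy a run is.
    """
    proc = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    for line in proc.stdout.splitlines():
        log.debug("%s", line)
    return proc.returncode, proc.stdout


# Usage: run a child Python process and keep its output out of the console.
returncode, output = run_quietly([sys.executable, "-c", "print('hello')"])
```

Callers who want the old behaviour can attach a stream handler to the logger; everyone else gets a quiet run by default.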
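
On detecting complexity changes: if a benchmark were run at several dataset sizes, a rough way to spot an O(1) -> O(N) shift is to fit the slope of log(time) against log(size) and compare it across commits. The sketch below assumes hypothetical inputs (a list of sizes and the matching timings) and is not tied to how vbench stores results.

```python
import math


def scaling_exponent(sizes, timings):
    """Least-squares slope of log(time) against log(size).

    A slope near 0 suggests roughly constant time, near 1 roughly linear,
    near 2 roughly quadratic. Needs at least two distinct sizes and
    strictly positive sizes and timings.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in timings]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

A jump in this exponent between two commits is a stronger signal than a change in absolute time alone, although real timings are noisy and the threshold would need tuning.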
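
The "hash the vbench AST" idea could look roughly like the following. `benchmark_fingerprint` is a hypothetical helper, and it assumes the benchmark is available as `code` and `setup` source strings. Because `ast.dump` output differs between Python versions, fingerprints would only be comparable when computed under the same interpreter version.

```python
import ast
import hashlib


def benchmark_fingerprint(code, setup=""):
    """Hash the parsed AST of a benchmark's setup and code strings.

    Comments and whitespace do not appear in the AST dump, so purely
    cosmetic edits keep the same fingerprint, while a changed constant
    (e.g. a different dataset size) produces a new one.
    """
    parts = []
    for source in (setup, code):
        tree = ast.parse(source)
        parts.append(ast.dump(tree))  # default dump omits line/column info
    digest = hashlib.sha1("\n".join(parts).encode("utf-8"))
    return digest.hexdigest()
```

Storing the fingerprint next to each result would let the historical view warn when a timeseries mixes measurements from two different versions of the same benchmark.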