Accurate performance baseline #45

Merged · 4 commits · Jul 8, 2022
Conversation

@countvajhula (Collaborator)

Summary of Changes

I've addressed your comments on #20, @michaelballantyne, and avoided constructing lists in the benchmarks. This results in a seemingly murderous advantage for Racket over Qi, as you predicted. Although, I'm not really sure what these benchmarks are telling us. These differences only become apparent at all on scales much finer than practical workloads would seem to involve (e.g. benchmarks in libraries that use Qi are unaffected by switching between Qi and Racket). It might be like using bricks vs atoms to build a skyscraper: in practice the choice may not matter to the result, even if the atoms look nicer under a microscope. So I find myself wondering whether these benchmarks are actually useful in representing Qi vs Racket performance, and if not, what would be more useful / representative?

I'm also wondering what kinds of benchmarks would be useful as we undertake performance improvements once the compiler work is underway. Probably the forms benchmarks in profile/forms would be useful here, but those don't reflect non-local interactions.

At this point we probably just want to have some minimally accurate baseline against which future improvements / regressions could be seen with some confidence.

Would love to hear any thoughts you may have on this.

Public Domain Dedication

  • In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

@countvajhula temporarily deployed to test-env · June 30, 2022 22:24
@michaelballantyne (Collaborator)

Although, I'm not really sure what these benchmarks are telling us. These differences only become apparent at all on scales much finer than practical workloads would seem to involve... so I find myself wondering whether these benchmarks are actually useful in representing Qi vs Racket performance, and if not, what would be more useful / representative?

I guess the question is whether there are cases you care about where Qi's performance is relevant to programs that use it. If yes, then micro-benchmarks like these that point out the specific overheads should be helpful as a starting point in thinking about compiler optimizations. If not... maybe there isn't actually a good reason to make Qi faster?

@countvajhula (Collaborator, Author) commented Jul 2, 2022 via email

@countvajhula (Collaborator, Author)

Discussed this with @michaelballantyne, and he said that these numbers are in the right ballpark now, so I'm merging. To summarize his suggestions:

Qi could only exceed Racket's performance in cases where it has a stronger theory of optimization, i.e. more "language equivalences" available to exploit than Racket can employ. This could happen if either:

  1. There are cases where the Racket compiler could do optimizations but, for whatever reason, currently doesn't (e.g. functional pipelines that construct intermediate collections, which Qi might eliminate via a compiler optimization. Such pipelines are much more common in Qi than in Racket, which might explain the relative prioritization of such optimizations in Racket).
  2. Qi introduces "undefined behavior": that is, it drops invariants upheld by the Racket language in order to enable additional optimizations (e.g. making no guarantees about side effects and mutation in esc forms, and providing such guarantees only for explicit uses of effect; this seems reasonable for Qi).
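To make point 1 concrete: the kind of intermediate-collection elimination described there is sometimes called deforestation or fusion. Qi is a Racket DSL, but as a language-agnostic sketch (names and data here are hypothetical, not from the Qi codebase), here is what the rewrite looks like in Python: a naive pipeline materializes a fresh list at every stage, while the fused form makes a single pass with no intermediate collections.

```python
# Hypothetical illustration (not Qi code): a staged functional pipeline
# that allocates an intermediate list per stage, versus a fused version
# a compiler could rewrite it into when the stages are known to be pure.

def pipeline_naive(xs):
    doubled = [x * 2 for x in xs]              # intermediate list #1
    positives = [x for x in doubled if x > 0]  # intermediate list #2
    return sum(positives)

def pipeline_fused(xs):
    # Single traversal, no intermediate lists.
    return sum(x * 2 for x in xs if x * 2 > 0)

assert pipeline_naive([-2, 1, 3]) == pipeline_fused([-2, 1, 3]) == 8
```

Since such pipelines are idiomatic in Qi, a Qi compiler would see many more opportunities to apply this rewrite than Racket's general-purpose compiler typically does.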

Where Qi is slow in the local / general case, some options include:

  1. There may be nonlocal optimizations where instances of these expressions end up matching Racket's performance by virtue of rewrite rules applied at a higher level.
  2. Building a library of metadata about standard-library functions (specifically, their argument arities) to enable rewriting arbitrary-arity flows into known-arity function invocations where possible.
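The second option can be sketched as follows. This is a hypothetical Python illustration (not Qi's actual compiler machinery): a metadata table records the known arities of standard-library functions, and a "compile" step uses it to replace the generic arbitrary-arity invocation path with a specialized fixed-arity call.

```python
# Hypothetical sketch: arity metadata lets a compiler rewrite a generic
# variadic application into a direct fixed-arity call where possible.
import operator

# Stand-in for a library of metadata about standard-library functions.
KNOWN_ARITY = {operator.add: 2, operator.neg: 1}

def apply_generic(f, args):
    # Arbitrary-arity path: every call packs and unpacks an argument list.
    return f(*args)

def compile_call(f):
    # "Rewrite" step: emit a fixed-arity wrapper when the arity is known,
    # falling back to the generic path otherwise.
    arity = KNOWN_ARITY.get(f)
    if arity == 2:
        return lambda a, b: f(a, b)
    if arity == 1:
        return lambda a: f(a)
    return lambda *args: f(*args)

add2 = compile_call(operator.add)
assert add2(3, 4) == apply_generic(operator.add, (3, 4)) == 7
```

The payoff in a real compiler would come from avoiding the argument-list packing/unpacking on every call in a hot flow, at the cost of maintaining the metadata table.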

@countvajhula merged commit a374516 into main on Jul 8, 2022