Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add offsets to loads and stores #1474

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ltratt
Copy link
Contributor

@ltratt ltratt commented Nov 22, 2024

The main commit here is 04c879d which (hopefully) extensively documents the motivation and consequences of this PR. I do consider this PR an RFC of sorts, though I definitely come down on the side of "the pros outweigh the cons", though I'm open to arguments the other way!

Previously `load` and `store` both took SSA pointer variables directly.
This commit adds an offset relative to those variables. In other words,
we can now turn:

```
%7: ptr = ...
%8: ptr = ptr_add %7, 8
%9: i8 = load %8
```

into:

```
%7: ptr = ...
%9: i8 = load [%7 + 8]
```

This is effective on x64 (and should be on, at least, Arm too) because
"load from register + offset" can generally be encoded in a single
instruction.

We previously had a hack for the specific case above iff `ptr_add` was
immediately followed by `load/`store`. This commit generalises that:
the optimisation pass "inlines" `ptr_add` no matter how far back in the
trace it is. So this:

```
%7: ptr = ...
%8: ptr = ptr_add %7, 8
...
%5432: i8 = load %8
```

generates the same, efficient, code as if there was nothing between the
`ptr_add` and the `load`. We do the same optimisation for `store`s too.
This gives a 2.5% speed-up for big_loop.

There are, however, downsides:

  1. We complicate the JIT IR syntax with `[%val + off]`. Currently I
     haven't added JIT IR parser support for this, because it doesn't
     seem very useful, but you can observe it when traces are printed.
  2. We can no longer fit instructions in a single machine word: there
     isn't space in `StoreInst` (and only 7 bits free in `LoadInst`).
     TBH, I now think -- and this is entirely on my head! -- that
     we were going to have to break this constrain at some point soon,
     in order to allow bigger `InstIdx`s. Especially because we compile
     in a thread, I think the practical effect of this is low (I can
     still observe a speedup after all!), and if we had to go above 2
     machine words, I'm not sure it would make much difference.
  3. Philosophically speaking, we're extending the JIT IR simply to
     express an optimisation. Frankly speaking, I think the right
     long-term thing to do is -- probably -- to introduce another IR
     that has these details in, but I have no desire to introduce
     such a complication now.
  4. This doesn't help quite as much as one would hope because a
     surprising amount of `PtrAdd`s are only left around because they're
     needed by a guard. Optimising this is a job for later.

I have -- and obviously this isn't enforceable! -- hacked around the
consequences of ykjit#3 by only allowing the optimiser to set `off` to a
non-zero value. That means most of the pipeline can avoid thinking about
this, but we shouldn't kid ourselves that this means there is no
additional complexity from this change. There is, the complexity is
permanent --- though, in my judgement, it is probably worthwhile. I also
doubt this is the last change along these lines (e.g. should we try
handling `DynPtrAdd`).
@vext01
Copy link
Contributor

vext01 commented Nov 22, 2024

This all seems sensible to me.

If we are to allow instructions to be larger than a word, I wonder if we'd still strive to pack operands? I recall you saying that the indices are likely to be too small for long-term use, and if they were full-on usizes then we could get rid of a load of try_froms.

Note a typo in the PR title.

@ltratt ltratt changed the title Add offserts to loads and stores Add offsets to loads and stores Nov 22, 2024
@ltratt
Copy link
Contributor Author

ltratt commented Nov 22, 2024

If we are to allow instructions to be larger than a word, I wonder if we'd still strive to pack operands

An (accidental but nice) advantage of PackedOperand is that we de-copy it automatically when converting it to Operand i.e. we make it harder to misuse Operand. But, yes, I would probably like to make PackedOperand a u24 shortly.

Note a typo in the PR title.

Ooops, fixed!

@ptersilie
Copy link
Contributor

Looks good. Are there now other instructions that now could be simplified. DirectCall comes to mind, but I think due to the arbitrary arguments it's unlikely to fit into two words. But there may be others that now can?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants