Add offsets to loads and stores #1474
base: master
Conversation
Previously `load` and `store` both took SSA pointer variables directly. This commit adds an offset relative to those variables. In other words, we can now turn:

```
%7: ptr = ...
%8: ptr = ptr_add %7, 8
%9: i8 = load %8
```

into:

```
%7: ptr = ...
%9: i8 = load [%7 + 8]
```

This is effective on x64 (and should be on, at least, Arm too) because "load from register + offset" can generally be encoded in a single instruction.

We previously had a hack for the specific case above, but only when the `ptr_add` was immediately followed by a `load`/`store`. This commit generalises that: the optimisation pass "inlines" the `ptr_add` no matter how far back in the trace it is. So this:

```
%7: ptr = ...
%8: ptr = ptr_add %7, 8
...
%5432: i8 = load %8
```

generates the same, efficient, code as if there was nothing between the `ptr_add` and the `load`. We do the same optimisation for `store`s too. This gives a 2.5% speed-up for big_loop.

There are, however, downsides:

1. We complicate the JIT IR syntax with `[%val + off]`. Currently I haven't added JIT IR parser support for this, because it doesn't seem very useful, but you can observe it when traces are printed.

2. We can no longer fit instructions in a single machine word: there isn't space in `StoreInst` (and only 7 bits free in `LoadInst`). TBH, I now think -- and this is entirely on my head! -- that we were going to have to break this constraint at some point soon anyway, in order to allow bigger `InstIdx`s. Especially because we compile in a thread, I think the practical effect of this is low (I can still observe a speedup after all!), and if we had to go above two machine words, I'm not sure it would make much difference.

3. Philosophically speaking, we're extending the JIT IR simply to express an optimisation. Frankly, I think the right long-term thing to do is -- probably -- to introduce another IR that has these details in it, but I have no desire to introduce such a complication now.

4. This doesn't help quite as much as one would hope, because a surprising number of `PtrAdd`s are only left around because they're needed by a guard. Optimising this is a job for later.

I have -- and obviously this isn't enforceable! -- hacked around the consequences of ykjit#3 by only allowing the optimiser to set `off` to a non-zero value. That means most of the pipeline can avoid thinking about this, but we shouldn't kid ourselves that this means there is no additional complexity from this change. There is, and the complexity is permanent, though in my judgement it is probably worthwhile. I also doubt this is the last change along these lines (e.g. should we try handling `DynPtrAdd`?).
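To make the shape of the pass concrete, here is a minimal, hypothetical sketch in Rust of the "inline the `ptr_add`, however far back it is" rewrite. The `Inst`, `InstIdx` and `fold_ptradds` names and the unpacked enum are inventions for illustration only -- they are not yk's actual `LoadInst`/`StoreInst` -- and the sketch ignores both the packed encoding and the "only the optimiser sets `off` to non-zero" rule discussed above:

```rust
#[derive(Debug, Clone, Copy)]
struct InstIdx(usize);

#[derive(Debug)]
enum Inst {
    /// Produces a pointer somehow; the details don't matter for this sketch.
    PtrSource,
    /// `%res: ptr = ptr_add %ptr, off`
    PtrAdd { ptr: InstIdx, off: i32 },
    /// `%res = load [%ptr + off]`
    Load { ptr: InstIdx, off: i32 },
    /// `store %val, [%ptr + off]`
    Store { val: InstIdx, ptr: InstIdx, off: i32 },
}

/// For every `Load`/`Store` whose pointer operand is a `PtrAdd`, point it at the
/// `PtrAdd`'s own operand and fold the constant offset into the instruction.
fn fold_ptradds(insts: &mut [Inst]) {
    for i in 0..insts.len() {
        // Read the pointer operand and current offset of this load/store, if any.
        let (ptr, off) = match insts[i] {
            Inst::Load { ptr, off } | Inst::Store { ptr, off, .. } => (ptr, off),
            _ => continue,
        };
        // If that operand is itself a `ptr_add`, compute the folded base/offset.
        let (base, new_off) = match insts[ptr.0] {
            Inst::PtrAdd { ptr: base, off: add_off } => (base, add_off + off),
            _ => continue,
        };
        // Rewrite in place: `load %ptradd` becomes `load [%base + new_off]`.
        match &mut insts[i] {
            Inst::Load { ptr: p, off: o } | Inst::Store { ptr: p, off: o, .. } => {
                *p = base;
                *o = new_off;
            }
            _ => unreachable!(),
        }
    }
}

fn main() {
    // %0: ptr = ...
    // %1: ptr = ptr_add %0, 8
    // %2: i8  = load %1
    let mut trace = vec![
        Inst::PtrSource,
        Inst::PtrAdd { ptr: InstIdx(0), off: 8 },
        Inst::Load { ptr: InstIdx(1), off: 0 },
    ];
    fold_ptradds(&mut trace);
    // After the pass the load reads directly from `[%0 + 8]`.
    println!("{:?}", trace);
}
```

In a real pipeline the now-orphaned `ptr_add` would then be cleaned up by dead-code elimination, unless something else still uses it (e.g. a guard, as noted in point 4 above).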
This all seems sensible to me. If we are to allow instructions to be larger than a word, I wonder if we'd still strive to pack operands? I recall you saying that the indices are likely to be too small for long-term use, and if they were full-on usizes then we could get rid of a load of …

Note a typo in the PR title.
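Purely as an illustration of the packing question (these types are made up, not yk's actual instruction layouts): it is the extra offset field that pushes the representation past one 64-bit word.

```rust
use std::mem::size_of;

// Hypothetical layouts: a sub-word operand index plus some packed metadata
// fits in one 64-bit word; adding an `off` field spills into a second word.
#[allow(dead_code)]
struct InstIdx(u32);

#[allow(dead_code)]
struct PackedLoad {
    ptr: InstIdx, // 4 bytes
    meta: u32,    // metadata bits squeezed in alongside
}

#[allow(dead_code)]
struct LoadWithOffset {
    ptr: InstIdx, // 4 bytes
    meta: u32,    // 4 bytes
    off: i32,     // the new offset no longer fits in one word
}

fn main() {
    assert_eq!(size_of::<PackedLoad>(), 8); // one machine word on x64
    assert!(size_of::<LoadWithOffset>() > 8); // exceeds a single word
    println!(
        "packed: {} bytes, with offset: {} bytes",
        size_of::<PackedLoad>(),
        size_of::<LoadWithOffset>()
    );
}
```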
An (accidental but nice) advantage of …
Oops, fixed!
Looks good. Are there other instructions that could now be simplified?
The main commit here is 04c879d, which (hopefully) documents the motivation and consequences of this PR extensively. I do consider this PR an RFC of sorts: I definitely come down on the side of "the pros outweigh the cons", but I'm open to arguments the other way!