The JIT may be generating no-op `mov`s #49

64kramsystem · 2021-10-12T19:20:01Z

(I may be wrong on this, as I still don't have clear how register allocation works)

While trying some manual register allocation (specifically, directly popping a value into RDI (rt.pop_reg @fisk.rdi), so that it could be passed as first parameter of a cfunc (rt.call_cfunc_without_alignment <addr>, [@fisk.rdi, ...])), I was curious if TJ/FISK would avoid generating instructions for copying the operand to RDI.

Specifically, I was looking around here:

    def call_cfunc_without_alignment func_loc, params
      raise NotImplementedError, "too many parameters" if params.length > 6
      raise "No function location" unless func_loc > 0

      params.each_with_index do |param, i|
        case param
        when Integer
          @fisk.mov(Fisk::Registers::CALLER_SAVED[i], @fisk.uimm(param))
        when Fisk::Operand
          @fisk.mov(Fisk::Registers::CALLER_SAVED[i], param)
        when TemporaryVariable
          @fisk.mov(Fisk::Registers::CALLER_SAVED[i], param.to_register)
        else
          raise NotImplementedError
        end
      end
      @fisk.mov(@fisk.rax, @fisk.uimm(func_loc))
        .call(@fisk.rax)
      @fisk.rax
    end

Now, I couldn't actually find any mov rdi, rdi, however, I found that other no-op movs are generated (at least, I believe), specifically, a bunch of mov r9, r9.

I detect this by hacking fisk.rb -> write_to() with this (horrendous) code:

    while insn = instructions.shift
      z_insn_name = insn.instance_variable_get(:@insn).name rescue nil
      z_op0_reg_name = insn.instance_variable_get(:@operands)[0].register.name rescue nil
      z_op1_reg_name = insn.instance_variable_get(:@operands)[1].register.register.name rescue :phony # avoid nil == nil

      if z_insn_name == 'MOV' && z_op0_reg_name == z_op1_reg_name
        puts "#{z_insn_name} #{z_op0_reg_name}, #{z_op1_reg_name}"
      end

by running rake test you'll see many occurrences. Am I correct, or am I missing something?

I dug a little bit in Fisk, and the mov ultimately ends here:

in /path/to/fisk/instructions/mov.rb
   417:         Class.new(Fisk::Encoding) {
   418:           def encode buffer, operands
   419:             add_rex(buffer, operands,
   420:               true,
   421:               1,
   422:               operands[0].rex_value,
   423:               operands[1].rex_value,
   424:               operands[1].rex_value) +
   425:             add_opcode(buffer, 0x8B, 0) +
   426:             add_modrm_reg_mem(buffer,
   427:               0,
   428:               operands[0].op_value,
   429:               operands[1].op_value, operands) +
   430:             0
   431:           end

which seems to me, that it's encoding as mov r9, r9.

ps. On Error Resume Next FTW

The text was updated successfully, but these errors were encountered:

tenderlove · 2021-10-14T21:52:59Z

Am I correct, or am I missing something?

I don't think you're missing anything! Do you have any ideas for fixing it? I'm not sure if it's something we should fix in Fisk (it does seem silly to make no-op mov instructions, but if someone really wants to do it we probably shouldn't stop them).

Just some brainstorming thoughts, we could:

Add a mode to Fisk that ignores no-op moves
Add a mode to Fisk that raises an exception, then we can fix it where it's happening
Something else?

It's cool that you found this!

64kramsystem · 2021-10-15T21:00:42Z

Just some brainstorming thoughts, we could:

* Add a mode to Fisk that ignores no-op moves

* Add a mode to Fisk that raises an exception, then we can fix it where it's happening

* Something else?

So!

In the past, I remember different assemblers (TASM vs. Idontrememberwhat) generating slightly different instructions, although I don't remember exactly how.

I've just tried with NASM a couple of "unoptimized" instructions - mov r9, r9 and mov rax, 0, in order to test if they were replaced with the "optimized" counterparts (nop and xor eax, eax (the latter extends zeroing to 64 bits)), and they weren't.

All in all, it seems that the role of assemblers is not to optimize (there is an optimize flag in NASM, but it effects only jumps) the code, rather, that it's the role of compilers.

Therefore, I'd go for this:

* Add a mode to Fisk that raises an exception, then we can fix it where it's happening

Essentially, this would be a linter that, for simplicity, generates errors instead of warnings. We'd then enable the linter from TJ either always, or, during the build only (say, a flag is passed to TJ, that in turn enables the linter in Fisk).

This way, Fisk keeps its traditional role of assembler, and with the flag enabled, it acquires some extra behavior.

Using the following strategy:

* Add a mode to Fisk that ignores no-op moves

Would IMO push Fisk a bit too much from what I suppose is the traditional role of assemblers.

(On the other hand, TJ has a separate two-tier architecture (JIT + assembler), and the lack of an intermediate representation makes impossible to write a generalized JIT and a specialized assembler; this is not a current concern though, and it may never be)

tenderlove · 2021-10-15T21:36:59Z

Essentially, this would be a linter that, for simplicity, generates errors instead of warnings. We'd then enable the linter from TJ either always, or, during the build only (say, a flag is passed to TJ, that in turn enables the linter in Fisk).

This way, Fisk keeps its traditional role of assembler, and with the flag enabled, it acquires some extra behavior.

Makes sense to me! I want to finish more of the send instruction and then I can work on this (unless you want to tackle it).

(On the other hand, TJ has a separate two-tier architecture (JIT + assembler), and the lack of an intermediate representation makes impossible to write a generalized JIT and a specialized assembler; this is not a current concern though, and it may never be)

I think adding an IR would be great, though I don't have any ideas on what would make a good IR. I have no experience designing or working with one, so if you have thoughts I'm all ears (maybe a good thing for the discussions section?)

64kramsystem · 2021-10-16T06:22:03Z

Makes sense to me! I want to finish more of the send instruction and then I can work on this (unless you want to tackle it).

I'd like to take this!! 😂 I'll open a tracking/discussion issue on Fisk before though, since it's very open ended (I'll open the issue also in case I won't be able to work on it for any reason) 😄

I think adding an IR would be great, though I don't have any ideas on what would make a good IR. I have no experience designing or working with one, so if you have thoughts I'm all ears (maybe a good thing for the discussions section?)

Yes! I'll open a discussion in the next week. But I have no idea, either 😂

tenderlove · 2021-10-16T17:35:43Z

Sounds good. I enabled discussions on the Fisk repo! I'm looking forward to the PR.

As for an IR, I was reading this and we probably want something either High-Level or Medium-Level. In the short term though, I'm going to work on making TJ run OptCarrot as well as compiling itself. Regardless, I'm looking forward to the IR discussion! 😄

64kramsystem · 2021-10-22T23:05:07Z

I suspect that duplication is caused by temp variables. If this is the case, the problem is that they are an abstraction at Fisk level (AFAIU), so TJ can't really do anything, and Fisk should be turned into an "optimizing" assembler. Nothing wrong with that (after all, temp variables are already something atypical (AFAIK) for an assembler), but it's certainly a design decision 😄

tenderlove · 2021-10-23T00:33:16Z

I suspect that duplication is caused by temp variables.

Ah, right. We're using r9 and r10 as scratch registers, and r9 happens to be a register used for calling C functions. So we end up with a temp in r9 then move to r9.

Nothing wrong with that (after all, temp variables are already something atypical (AFAIK) for an assembler), but it's certainly a design decision

I'm fine with adding something like optimize or optimized_write(buff) to Fisk. I'd still like to keep the "dumb" mode as default, but since Fisk does keep instructions in a buffer before writing, it seems like a natural place to add the optimization.

Also we might think more about adding an IR because optimization passes on the IR could solve this too.

WDYT?

64kramsystem · 2021-10-24T22:05:55Z

I'm fine with adding something like optimize or optimized_write(buff) to Fisk

Considering that:

(I think) there is no use case for enabling both performance check and optimization
both check and optimization apply to the performance area/concept

What do you think of turning the current :check_performance flag into, say, :performance, with the two options :check and :optimize (other than nil)?

This would keep the APIs simpler. Intuitively, optimization should be applied (at least, with the current design) in any Fisk location where a performance check is applied, so I think inevitably, both checking and optimization are going to be ultimately branches of the same codepath - therefore, adding a new API would ultimately lead to the same codepath anyway.

tenderlove · 2021-10-24T22:33:19Z

What do you think of turning the current :check_performance flag into, say, :performance, with the two options :check and :optimize (other than nil)?

Ya, that makes sense. I'm keen on having the optimization pass being done at "write" time because I think we could probably add more interesting optimizations than just eliminating useless mov instructions. For example, I think the DSL for branch conditions emits useless stuff (an extra JMP maybe?) when one side of the branch is empty (like rt.if_eq(...) { do_stuff }.else { }), and we'd only be able to detect / eliminate that once we have the full buffer of instructions.

For TJ, I can't think of a use case where we care about :check. Wouldn't we just always want :optimize? (I don't have any strong feelings, just thinking out loud)

64kramsystem · 2021-10-24T22:42:33Z

Also we might think more about adding an IR because optimization passes on the IR could solve this too.

IR == 😍 😍 😍

I have two thoughts on IR for TJ/Fisk:

the whole Ruby/TJ/Fisk stack would end up with, essentially, two bytecodes (and two corresponding optimizers). I take though, that there is no alternative to this, as TJ must necessarily be downstream of the Ruby VM
I think that, as starting point, it's important to know what's the scope/purpose of the IR (optimization? of which abstraction(s)? multiplatform support?). The "wchih abstractions" part for example is very interesting - for example, should C function invocations be included (so the IR would conceptually be in between TJ and Fisk), or not (so the IR would just be entirely in the Fisk world)? In the former case, the hypothetical IR optimizer could for example take care of the stack alignment. But on the other hand, since the handlers use low-level assembler, they could step on the optimizer.

I think that, all in all, the Runtime could be a guide to how to develop the IR framework. This requires though, all the handlers to be fully moved to it.

The above is much guesswork on my part 😄 I don't know TJ/Fisk in detail.

I'll have a look at Rhizome actually, at a quick look, it seems that TJ/Fisk are based on it 🤩.

64kramsystem mentioned this issue Oct 25, 2021

Implement performance optimizer tenderlove/fisk#21

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The JIT may be generating no-op `mov`s #49

The JIT may be generating no-op `mov`s #49

64kramsystem commented Oct 12, 2021 •

edited

Loading

tenderlove commented Oct 14, 2021

64kramsystem commented Oct 15, 2021 •

edited

Loading

tenderlove commented Oct 15, 2021

64kramsystem commented Oct 16, 2021

tenderlove commented Oct 16, 2021

64kramsystem commented Oct 22, 2021 •

edited

Loading

tenderlove commented Oct 23, 2021 •

edited

Loading

64kramsystem commented Oct 24, 2021

tenderlove commented Oct 24, 2021

64kramsystem commented Oct 24, 2021

The JIT may be generating no-op movs #49

The JIT may be generating no-op movs #49

Comments

64kramsystem commented Oct 12, 2021 • edited Loading

tenderlove commented Oct 14, 2021

64kramsystem commented Oct 15, 2021 • edited Loading

tenderlove commented Oct 15, 2021

64kramsystem commented Oct 16, 2021

tenderlove commented Oct 16, 2021

64kramsystem commented Oct 22, 2021 • edited Loading

tenderlove commented Oct 23, 2021 • edited Loading

64kramsystem commented Oct 24, 2021

tenderlove commented Oct 24, 2021

64kramsystem commented Oct 24, 2021

The JIT may be generating no-op `mov`s #49

The JIT may be generating no-op `mov`s #49

64kramsystem commented Oct 12, 2021 •

edited

Loading

64kramsystem commented Oct 15, 2021 •

edited

Loading

64kramsystem commented Oct 22, 2021 •

edited

Loading

tenderlove commented Oct 23, 2021 •

edited

Loading