Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The JIT may be generating no-op movs #49

Open
64kramsystem opened this issue Oct 12, 2021 · 10 comments
Open

The JIT may be generating no-op movs #49

64kramsystem opened this issue Oct 12, 2021 · 10 comments

Comments

@64kramsystem
Copy link
Collaborator

64kramsystem commented Oct 12, 2021

(I may be wrong on this, as I still don't have clear how register allocation works)

While trying some manual register allocation (specifically, directly popping a value into RDI (rt.pop_reg @fisk.rdi), so that it could be passed as first parameter of a cfunc (rt.call_cfunc_without_alignment <addr>, [@fisk.rdi, ...])), I was curious if TJ/FISK would avoid generating instructions for copying the operand to RDI.

Specifically, I was looking around here:

    def call_cfunc_without_alignment func_loc, params
      raise NotImplementedError, "too many parameters" if params.length > 6
      raise "No function location" unless func_loc > 0

      params.each_with_index do |param, i|
        case param
        when Integer
          @fisk.mov(Fisk::Registers::CALLER_SAVED[i], @fisk.uimm(param))
        when Fisk::Operand
          @fisk.mov(Fisk::Registers::CALLER_SAVED[i], param)
        when TemporaryVariable
          @fisk.mov(Fisk::Registers::CALLER_SAVED[i], param.to_register)
        else
          raise NotImplementedError
        end
      end
      @fisk.mov(@fisk.rax, @fisk.uimm(func_loc))
        .call(@fisk.rax)
      @fisk.rax
    end

Now, I couldn't actually find any mov rdi, rdi, however, I found that other no-op movs are generated (at least, I believe), specifically, a bunch of mov r9, r9.

I detect this by hacking fisk.rb -> write_to() with this (horrendous) code:

    while insn = instructions.shift
      z_insn_name = insn.instance_variable_get(:@insn).name rescue nil
      z_op0_reg_name = insn.instance_variable_get(:@operands)[0].register.name rescue nil
      z_op1_reg_name = insn.instance_variable_get(:@operands)[1].register.register.name rescue :phony # avoid nil == nil

      if z_insn_name == 'MOV' && z_op0_reg_name == z_op1_reg_name
        puts "#{z_insn_name} #{z_op0_reg_name}, #{z_op1_reg_name}"
      end

by running rake test you'll see many occurrences. Am I correct, or am I missing something?

I dug a little bit in Fisk, and the mov ultimately ends here:

in /path/to/fisk/instructions/mov.rb
   417:         Class.new(Fisk::Encoding) {
   418:           def encode buffer, operands
   419:             add_rex(buffer, operands,
   420:               true,
   421:               1,
   422:               operands[0].rex_value,
   423:               operands[1].rex_value,
   424:               operands[1].rex_value) +
   425:             add_opcode(buffer, 0x8B, 0) +
   426:             add_modrm_reg_mem(buffer,
   427:               0,
   428:               operands[0].op_value,
   429:               operands[1].op_value, operands) +
   430:             0
   431:           end

which seems to me, that it's encoding as mov r9, r9.

ps. On Error Resume Next FTW

@tenderlove
Copy link
Owner

Am I correct, or am I missing something?

I don't think you're missing anything! Do you have any ideas for fixing it? I'm not sure if it's something we should fix in Fisk (it does seem silly to make no-op mov instructions, but if someone really wants to do it we probably shouldn't stop them).

Just some brainstorming thoughts, we could:

  • Add a mode to Fisk that ignores no-op moves
  • Add a mode to Fisk that raises an exception, then we can fix it where it's happening
  • Something else?

It's cool that you found this!

@64kramsystem
Copy link
Collaborator Author

64kramsystem commented Oct 15, 2021

Just some brainstorming thoughts, we could:

* Add a mode to Fisk that ignores no-op moves

* Add a mode to Fisk that raises an exception, then we can fix it where it's happening

* Something else?

So!

In the past, I remember different assemblers (TASM vs. Idontrememberwhat) generating slightly different instructions, although I don't remember exactly how.

I've just tried with NASM a couple of "unoptimized" instructions - mov r9, r9 and mov rax, 0, in order to test if they were replaced with the "optimized" counterparts (nop and xor eax, eax (the latter extends zeroing to 64 bits)), and they weren't.

All in all, it seems that the role of assemblers is not to optimize (there is an optimize flag in NASM, but it effects only jumps) the code, rather, that it's the role of compilers.

Therefore, I'd go for this:

* Add a mode to Fisk that raises an exception, then we can fix it where it's happening

Essentially, this would be a linter that, for simplicity, generates errors instead of warnings. We'd then enable the linter from TJ either always, or, during the build only (say, a flag is passed to TJ, that in turn enables the linter in Fisk).

This way, Fisk keeps its traditional role of assembler, and with the flag enabled, it acquires some extra behavior.

Using the following strategy:

* Add a mode to Fisk that ignores no-op moves

Would IMO push Fisk a bit too much from what I suppose is the traditional role of assemblers.

(On the other hand, TJ has a separate two-tier architecture (JIT + assembler), and the lack of an intermediate representation makes impossible to write a generalized JIT and a specialized assembler; this is not a current concern though, and it may never be)

@tenderlove
Copy link
Owner

Essentially, this would be a linter that, for simplicity, generates errors instead of warnings. We'd then enable the linter from TJ either always, or, during the build only (say, a flag is passed to TJ, that in turn enables the linter in Fisk).

This way, Fisk keeps its traditional role of assembler, and with the flag enabled, it acquires some extra behavior.

Makes sense to me! I want to finish more of the send instruction and then I can work on this (unless you want to tackle it).

(On the other hand, TJ has a separate two-tier architecture (JIT + assembler), and the lack of an intermediate representation makes impossible to write a generalized JIT and a specialized assembler; this is not a current concern though, and it may never be)

I think adding an IR would be great, though I don't have any ideas on what would make a good IR. I have no experience designing or working with one, so if you have thoughts I'm all ears (maybe a good thing for the discussions section?)

@64kramsystem
Copy link
Collaborator Author

Makes sense to me! I want to finish more of the send instruction and then I can work on this (unless you want to tackle it).

I'd like to take this!! 😂 I'll open a tracking/discussion issue on Fisk before though, since it's very open ended (I'll open the issue also in case I won't be able to work on it for any reason) 😄

I think adding an IR would be great, though I don't have any ideas on what would make a good IR. I have no experience designing or working with one, so if you have thoughts I'm all ears (maybe a good thing for the discussions section?)

Yes! I'll open a discussion in the next week. But I have no idea, either 😂

@tenderlove
Copy link
Owner

Sounds good. I enabled discussions on the Fisk repo! I'm looking forward to the PR.

As for an IR, I was reading this and we probably want something either High-Level or Medium-Level. In the short term though, I'm going to work on making TJ run OptCarrot as well as compiling itself. Regardless, I'm looking forward to the IR discussion! 😄

@64kramsystem
Copy link
Collaborator Author

64kramsystem commented Oct 22, 2021

I suspect that duplication is caused by temp variables. If this is the case, the problem is that they are an abstraction at Fisk level (AFAIU), so TJ can't really do anything, and Fisk should be turned into an "optimizing" assembler. Nothing wrong with that (after all, temp variables are already something atypical (AFAIK) for an assembler), but it's certainly a design decision 😄

@tenderlove
Copy link
Owner

tenderlove commented Oct 23, 2021

I suspect that duplication is caused by temp variables.

Ah, right. We're using r9 and r10 as scratch registers, and r9 happens to be a register used for calling C functions. So we end up with a temp in r9 then move to r9.

Nothing wrong with that (after all, temp variables are already something atypical (AFAIK) for an assembler), but it's certainly a design decision

I'm fine with adding something like optimize or optimized_write(buff) to Fisk. I'd still like to keep the "dumb" mode as default, but since Fisk does keep instructions in a buffer before writing, it seems like a natural place to add the optimization.

Also we might think more about adding an IR because optimization passes on the IR could solve this too.

WDYT?

@64kramsystem
Copy link
Collaborator Author

I'm fine with adding something like optimize or optimized_write(buff) to Fisk

Considering that:

  • (I think) there is no use case for enabling both performance check and optimization
  • both check and optimization apply to the performance area/concept

What do you think of turning the current :check_performance flag into, say, :performance, with the two options :check and :optimize (other than nil)?

This would keep the APIs simpler. Intuitively, optimization should be applied (at least, with the current design) in any Fisk location where a performance check is applied, so I think inevitably, both checking and optimization are going to be ultimately branches of the same codepath - therefore, adding a new API would ultimately lead to the same codepath anyway.

@tenderlove
Copy link
Owner

What do you think of turning the current :check_performance flag into, say, :performance, with the two options :check and :optimize (other than nil)?

Ya, that makes sense. I'm keen on having the optimization pass being done at "write" time because I think we could probably add more interesting optimizations than just eliminating useless mov instructions. For example, I think the DSL for branch conditions emits useless stuff (an extra JMP maybe?) when one side of the branch is empty (like rt.if_eq(...) { do_stuff }.else { }), and we'd only be able to detect / eliminate that once we have the full buffer of instructions.

For TJ, I can't think of a use case where we care about :check. Wouldn't we just always want :optimize? (I don't have any strong feelings, just thinking out loud)

@64kramsystem
Copy link
Collaborator Author

Also we might think more about adding an IR because optimization passes on the IR could solve this too.

IR == 😍 😍 😍

I have two thoughts on IR for TJ/Fisk:

  1. the whole Ruby/TJ/Fisk stack would end up with, essentially, two bytecodes (and two corresponding optimizers). I take though, that there is no alternative to this, as TJ must necessarily be downstream of the Ruby VM
  2. I think that, as starting point, it's important to know what's the scope/purpose of the IR (optimization? of which abstraction(s)? multiplatform support?). The "wchih abstractions" part for example is very interesting - for example, should C function invocations be included (so the IR would conceptually be in between TJ and Fisk), or not (so the IR would just be entirely in the Fisk world)? In the former case, the hypothetical IR optimizer could for example take care of the stack alignment. But on the other hand, since the handlers use low-level assembler, they could step on the optimizer.

I think that, all in all, the Runtime could be a guide to how to develop the IR framework. This requires though, all the handlers to be fully moved to it.

The above is much guesswork on my part 😄 I don't know TJ/Fisk in detail.

I'll have a look at Rhizome actually, at a quick look, it seems that TJ/Fisk are based on it 🤩.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants