Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug optimization #47410

Closed
aristarh2704 opened this issue Jan 13, 2018 · 9 comments
Closed

Bug optimization #47410

aristarh2704 opened this issue Jan 13, 2018 · 9 comments

Comments

@aristarh2704
Copy link

aristarh2704 commented Jan 13, 2018

I write this code:

pub fn clean(x:isize){
unsafe{
let mut buffer=0xb8000 as *mut u8;
let mut y=0;
while y<(80 * 25 * 2){
let mut p=buffer.offset(y);
*buffer.offset(y)=0x00;
y=y+1;
*buffer.offset(y)=0x20;
y=y+1;
}
}
}

and compile: "cargo build --release"
Disassembling views this:

000000e0 <__ZN6memory5clean17h43e2ddeb8148e2edE>:
push %ebp
mov %esp,%ebp
movaps 0x0,%xmm0
xor %eax,%eax
nopl 0x0(%eax)
movups %xmm0,0xb8010(%eax,%eax,1)
movups %xmm0,0xb8000(%eax,%eax,1)
add $0x10,%eax
cmp $0x7d0,%eax
jne f0 <__ZN6memory5clean17h43e2ddeb8148e2edE+0x10>
pop %ebp
ret

It's not correct code.
But when I write

y=y+x; //x is argument equal 1

instead of

y=y+1;

it produce this asm:

000000e0 <__ZN6memory5clean17h43e2ddeb8148e2edE>:
push %ebp
mov %esp,%ebp
push %esi
mov 0x8(%ebp),%eax
mov $0xb8000,%ecx
xor %esi,%esi
lea 0xb8000(%eax),%edx
data16 data16 nopw %cs:0x0(%eax,%eax,1)
movb $0x0,(%ecx,%esi,1)
movb $0x20,(%edx,%esi,1)
add %eax,%esi
add %eax,%esi
cmp $0xfa0,%esi
jl 100 <__ZN6memory5clean17h43e2ddeb8148e2edE+0x20>
pop %esi
pop %ebp
ret

which is really correct.
Rustc version:

$ rustc -vV

rustc 1.22.1 (05e2e1c 2017-11-22)
binary: rustc
commit-hash: 05e2e1c
commit-date: 2017-11-22
host: i686-pc-windows-gnu
release: 1.22.1
LLVM version: 4.0

@gamozolabs
Copy link

gamozolabs commented Jan 13, 2018

This is likely because you are either doing an objdump on a .o file (which has not been linked) or you have incorrectly linked the binary.

When running objdump on the .o file, relocations have not been performed so the movaps instruction which is moving zero in objdump is actually supposed to be a load from memory. However since relocations have not been performed yet you observe that it is not a memory access but a zero. This is actually pretty misleading from objdump. It seems that objdump or AT&T syntax does not explicitly display that this is a memory access as the movaps has no immediate form, thus it just displays 0, which is the pre-relocation value for the memory address that it is loading from.

Here you can see a side-by-side of the disassembly of the object file in IDA (a more advanced disassembler) and objdump. You can see while they're both the same instruction "0f 28 05 xx 00 00 00", IDA correctly recognizes this as a memory load (and also performs the relocation hence the 0x6c instead of 0x00)
image

If you run objdump -r on your file you will see there is a relocation entry for the offset where the immediate value for the movaps instructions is.

This movaps actually is loading a memory array filled with the correct values to compile to exactly what you specified in your Rust code.

Further the reason the code changes when you do y+x instead of y+1 is simply due to the optimizer not able to know what that value is and thus it emits scalar code which is more portable. For example if y was a large number then it could no longer emit a vectorized/SSE version of the code.

-B

@aristarh2704
Copy link
Author

aristarh2704 commented Jan 13, 2018

But when I test this code in qemu (I try write my own OS), if I use "y+=1", execution fails and VM repeatedly reboots and then freezes. If I use "y+=x", execution ends successfully. This indicates a significant difference in the generated code.

@gamozolabs
Copy link

It's probably very likely that you did not initialize SSE in your OS and thus you're getting an undefined instruction exception on the SSE instruction. To enable SSE you should check bit 25 in CPUID.01h:EDX and if it is set you can proceed to clear the CR0.EM bit, set the CR0.MP bit, set the CR4.OSFXSR bit and set the CR4.OSXMMEXCPT bit.

That being said, most OSes do not use SSE and floats in the kernel (to save costs on context switches) and thus they build their kernels with flags such as -C target-feature=+soft-float to enable soft float (floating point emulation, disables use of SSE and FPU instructions).

@aristarh2704
Copy link
Author

Thank you for your help)

@aristarh2704
Copy link
Author

May I disable using SSE/MMX instruction in rustc?

@pnkfelix
Copy link
Member

@aristarh2704 I don't have much experience with disabling such instructions myself, but from the bug description at #26449, it sounds like you might be able to use the -C target-feature=... flag to disable a whole host of code generation features that are AFAIK solely controlled by LLVM.

@aristarh2704
Copy link
Author

How I can give this argument to cargo, not to rustc?

@pnkfelix
Copy link
Member

pnkfelix commented Jan 15, 2018

@aristarh2704 I'm not sure what current best practice is for configuring cargo that way (I do more manual invocation of rustc than setting up cargo projects), but it seems like one option is to use the rustflags configuration switch in your toml file:

I think this was added here: rust-lang/cargo#2241

and is documented here (scroll to where rustflags is mentioned): https://doc.rust-lang.org/cargo/reference/config.html#configuration-keys

But for current best practices, you may want to ask in the #cargo IRC channel.

@IsaacWoods
Copy link
Contributor

The correct solution here is to provide a custom target specification and then to use Xargo to recompile core with the correct options. In particular, you want to disable the sse and mmx features, to stop LLVM emitting problematic instructions.

A sample target file that does this might look like:

{
  "llvm-target": "x86_64-unknown-none",
  "data-layout": "e-m:e-i64:64-f80:128-n8:16:32:64-S128",
  "linker-flavor": "gcc",
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-c-int-width": "32",
  "arch": "x86_64",
  "os": "none",
  "disable-redzone": true,
  "features": "-mmx,-sse,+soft-float"
}

Save this as my_target.json or whatever, then tell xargo to use it with: xargo --target=my_target

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants