Cranelift: Generate load/store using AMode::RegScaled on aarch64 #6742

Closed
maekawatoshiki opened this issue Jul 18, 2023 · 2 comments · Fixed by #6945

Comments

@maekawatoshiki
Contributor

maekawatoshiki commented Jul 18, 2023

Feature

Currently, on the aarch64 backend, the following CLIF instructions...

; Equivalent to: int64_t *v9; int64_t v10; v4 = v9[v10];
v1 = iconst.i64 3
v2 = ishl.i64 v10, v1  ; v1 = 3
v3 = iadd v9, v2
v4 = load.i64 v3

... generate assembly like this:

adrp    x4, 0x780000
ldr     x4, [x4]
lsl     x5, x3, #3
ldr     x4, [x4, x5]

However, the shift can be folded into the load's addressing mode, giving a shorter and more efficient sequence:

adrp    x4, 0x780000
ldr     x4, [x4]
ldr     x4, [x4, x3, lsl #3]
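
As a rough illustration of the folding requested here, the following Rust sketch matches the `iadd(base, ishl(index, iconst k))` shape on a toy expression tree and selects a scaled-register addressing mode. Everything in it (`Expr`, `ToyAMode`, `select_amode`) is hypothetical and is not Cranelift's actual IR or API. The one architectural fact it relies on is that the register-offset form of `ldr` with `lsl` only accepts a shift of 0 or log2 of the access size, so the fold is only valid when `k` matches the load width.

```rust
// Toy expression tree for address computations; NOT Cranelift's real IR.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Reg(u8),                    // a value already held in register xN
    Iconst(u64),                // integer constant
    Ishl(Box<Expr>, Box<Expr>), // value << amount
    Iadd(Box<Expr>, Box<Expr>), // lhs + rhs
}

// Toy addressing modes, loosely named after the variant discussed in the issue.
#[derive(Debug, PartialEq)]
enum ToyAMode {
    // [xBase, xIndex, lsl #log2(access size)]
    RegScaled { base: u8, index: u8 },
    // No fold found: the address has to be computed with separate instructions.
    Unfolded,
}

// Try to fold `base + (index << k)` into a scaled addressing mode.
// Only the shape with the shift on the right-hand side is handled here.
fn select_amode(addr: &Expr, access_bytes: u32) -> ToyAMode {
    if let Expr::Iadd(lhs, rhs) = addr {
        if let (Expr::Reg(base), Expr::Ishl(idx, amt)) = (&**lhs, &**rhs) {
            if let (Expr::Reg(index), Expr::Iconst(k)) = (&**idx, &**amt) {
                // The shift must equal log2 of the access size.
                let wanted = u64::from(access_bytes.trailing_zeros());
                if access_bytes.is_power_of_two() && *k == wanted {
                    return ToyAMode::RegScaled { base: *base, index: *index };
                }
            }
        }
    }
    ToyAMode::Unfolded
}
```

With the address from the example above (`x4 + (x3 << 3)`) and an 8-byte load, this sketch picks `RegScaled`. The same address with a 4-byte load falls back to `Unfolded`, because a shift of 3 does not match a 4-byte access.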

Benefit

The shorter instruction sequence should improve performance.
In fact, I found this while diffing the assembly generated by Cranelift and LLVM; the LLVM output was around 10% faster than Cranelift's in my case.

Implementation

I've walked through the Cranelift codebase and found that this addressing mode seems to be represented as AMode::RegScaled, but I'm not sure how to teach the code generator to use RegScaled for ldr.
Would that mean editing the ISLE rules, or something like that?

@maekawatoshiki changed the title from "Cranelift: Generate load/store with AMode::RegScaled on aarch64" to "Cranelift: Generate load/store using AMode::RegScaled on aarch64" Jul 18, 2023
@cfallin
Member

cfallin commented Jul 18, 2023

This would be great to have!

In fact, we even have a TODO in the code already (but unfortunately it looks like we didn't file an issue at the time, sorry!)

As can be seen at that link, we actually never translated the lower_address implementation to ISLE, so it's still done with manual pattern-matching in Rust. The ideal longer-term solution would be to rework it into ISLE, as we have on x64, which would make additional pattern-matching like this easier. I think we'd prefer that, but it's also a bit more work.

It might be possible to shoehorn it into the manual Rust code above, but it's a little tricky: the code works by collecting addends32 and addends64, with the semantics that the address is the sum of all of those (with 32-bit values zero-extended). We'd have to have a separate shifted (with a type something like Option<(Reg, u8)> to keep a value-in-register and a scale), collect it with the addends, and incorporate it if possible. But that's rapidly getting more complex than the equivalent in ISLE, so I think it's probably not the best approach.
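
To make the shape of that idea concrete, here is a minimal Rust sketch of an addend collection with an extra `shifted` slot. It is an illustration under stated assumptions, not the actual `lower_address` code: `Reg` is a plain `u8` stand-in, and the `Addends` struct, the `ToyAMode` enum and the `choose` function are all hypothetical names. Only `addends32`, `addends64` and the `Option<(Reg, u8)>` type come from the description above.

```rust
// Hypothetical stand-in for a machine register; not Cranelift's `Reg`.
type Reg = u8;

// The address is the sum of all collected parts.
#[derive(Debug, Default, PartialEq)]
struct Addends {
    addends64: Vec<Reg>,        // 64-bit values, added as-is
    addends32: Vec<Reg>,        // 32-bit values, zero-extended before adding
    shifted: Option<(Reg, u8)>, // (register, shift amount), at most one
}

// Toy result type; the variant names are illustrative only.
#[derive(Debug, PartialEq)]
enum ToyAMode {
    RegScaled { base: Reg, index: Reg, shift: u8 },
    RegReg { base: Reg, index: Reg },
    // Anything else: materialize the sum with explicit add instructions.
    NeedsExplicitAdds,
}

fn choose(a: &Addends, access_bytes: u32) -> ToyAMode {
    // A scaled index is only encodable when the shift equals log2(access size).
    let scale_ok = |s: u8| {
        access_bytes.is_power_of_two() && u32::from(s) == access_bytes.trailing_zeros()
    };
    match (a.addends64.as_slice(), a.addends32.as_slice(), a.shifted) {
        // Exactly one 64-bit base plus one compatible shifted index.
        ([base], [], Some((index, s))) if scale_ok(s) => ToyAMode::RegScaled {
            base: *base,
            index,
            shift: s,
        },
        // Two plain 64-bit addends and nothing else.
        ([base, index], [], None) => ToyAMode::RegReg {
            base: *base,
            index: *index,
        },
        _ => ToyAMode::NeedsExplicitAdds,
    }
}
```

Even in this reduced form, each new slot multiplies the combinations the final match has to consider, which is the complexity concern raised above.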

@maekawatoshiki
Contributor Author

maekawatoshiki commented Jul 18, 2023

Thank you for your quick reply.

I agree that changing the current lower_address implementation would make it more complicated, but translating lower_address entirely to ISLE is also hard work.

I'll try to translate a part of lower_address into ISLE, and hopefully make a PR.
I've just noticed that it seems to be impossible to partially translate lower_address into ISLE, due to (extern constructor amode amode) ...

alexcrichton added a commit to alexcrichton/wasmtime that referenced this issue Aug 31, 2023
This commit adds a few cases to `amode` construction on AArch64 for
using the `RegScaled*` variants of `AMode`. This won't affect wasm due
to this only matching the sign-extension happening before the shift, but
it should otherwise help non-wasm Cranelift use cases.

Closes bytecodealliance#6742
github-merge-queue bot pushed a commit that referenced this issue Aug 31, 2023
This commit adds a few cases to `amode` construction on AArch64 for
using the `RegScaled*` variants of `AMode`. This won't affect wasm due
to this only matching the sign-extension happening before the shift, but
it should otherwise help non-wasm Cranelift use cases.

Closes #6742
pchickey pushed a commit that referenced this issue Sep 1, 2023
…6950)

* Enhance `async` configuration of `bindgen!` macro (#6942)

This commit takes a leaf out of `wiggle`'s book to enable bindings
generation for async host functions where only some host functions are
async instead of all of them. This enhances the `async` key with a few
more options:

    async: {
        except_imports: ["foo"],
        only_imports: ["bar"],
    }

This is beyond what `wiggle` supports where either an allow-list or
deny-list can be specified (although only one can be specified). This
can be useful if either the list of sync imports or the list of async
imports is small.

* cranelift-interpreter: Fix SIMD shifts and rotates (#6939)

* cranelift-interpreter: Fix SIMD `ishl`/`{s,u}`shr

* fuzzgen: Enable a few more ops

* cranelift: Fix tests for {u,s}shr

* fuzzgen: Change pattern matching arms for shifts

Co-Authored-By: Jamey Sharp <[email protected]>

---------

Co-authored-by: Jamey Sharp <[email protected]>

* Partially revert CLI argument changes from #6737 (#6944)

* Partially revert CLI argument changes from #6737

This commit is a partial revert of #6737. That change was reverted
in #6830 for the 12.0.0 release of Wasmtime and otherwise it's currently
slated to get released with the 13.0.0 release of Wasmtime. Discussion
at today's Wasmtime meeting concluded that it's best to couple this
change with #6925 as a single release rather than spread out across
multiple releases. This commit is thus the revert of #6737, although
it's a partial revert in that I've kept many of the new tests added to
showcase the differences before/after when the change lands.

This means that Wasmtime 13.0.0 will exhibit the same CLI behavior as
12.0.0 and all prior releases. The 14.0.0 release will have both a new
CLI and new argument passing semantics. I'll revert this revert (aka
re-land #6737) once the 13.0.0 release branch is created and `main`
becomes 14.0.0.

* Update release notes

* riscv64: Use `PCRelLo12I` relocation on Loads (#6938)

* riscv64: Use `PCRelLo12I` relocation on Loads

* riscv64: Strenghten pattern matching when emitting Load's

* riscv64: Clarify some of the load address logic

* riscv64: Even stronger matching

* Update Rust in CI to 1.72.0, clarify Wasmtime's MSRV (#6900)

* Update Rust in CI to 1.72.0

* Update CI, tooling, and docs for MSRV

This commit codifies an MSRV policy for Wasmtime at "stable minus two"
meaning that the latest three releases of Rust will be supported. This
is enforced on CI with a full test suite job running on Linux x86_64
with the minimum supported Rust version. The full test suite will use
the latest stable version. A downside of this approach is that new
changes may break MSRV support on non-Linux or non-x86_64 platforms and
we won't know about it, but that's deemed a minor enough risk at this
time.

A minor fix is applied to Wasmtime's `Cargo.toml` to support Rust 1.70.0
instead of requiring Rust 1.71.0

* Fix installation of rust

* Scrape MSRV from Cargo.toml

* Cranelift is the same as Wasmtime's MSRV now, more words too

* Fix a typo

* aarch64: Use `RegScaled*` addressing modes (#6945)

This commit adds a few cases to `amode` construction on AArch64 for
using the `RegScaled*` variants of `AMode`. This won't affect wasm due
to this only matching the sign-extension happening before the shift, but
it should otherwise help non-wasm Cranelift use cases.

Closes #6742

* cranelift: Validate `iconst` ranges (#6850)

* cranelift: Validate `iconst` ranges

Add the following checks:

`iconst.i8`  immediate must be within 0 .. 2^8-1
`iconst.i16` immediate must be within 0 .. 2^16-1
`iconst.i32` immediate must be within 0 .. 2^32-1

Resolves #3059

* cranelift: Parse `iconst` according to its type

Modifies the parser for textual CLIF so that V in `iconst.T V` is
parsed according to T.

Before this commit, something like `iconst.i32 0xffff_ffff_ffff` was
valid because all `iconst` were parsed the same as an
`iconst.i64`. Now the above example will throw an error.

Also, a negative immediate as in `iconst.iN -X` is now converted to
`2^N - X`.

This commit also fixes some broken tests.

* cranelift: Update tests to match new CLIF parser

* Some minor fixes and features for WASI and sockets (#6948)

* Use `command::add_to_linker` in tests to reduce the number of times
  all the `add_to_linker` are listed.
* Add all `wasi:sockets` interfaces currently implemented to both the
  sync and async `command` functions (this enables all the interfaces in
  the CLI for example).
* Use `tokio::net::TcpStream::try_io` whenever I/O is performed on a
  socket, ensuring that readable/writable flags are set/cleared
  appropriately (otherwise once readable a socket is infinitely readable).
* Add a `with_ambient_tokio_runtime` helper function to use when
  creating a `tokio::net::TcpStream` since otherwise it panics due to a
  lack of active runtime in a synchronous context.
* Add `WouldBlock` handling to return a 0-length read.
* Add an `--inherit-network` CLI flag to enable basic usage of sockets
  in the CLI.

This will conflict a small amount with #6877 but should be easy to
resolve, and otherwise this targets different usability points/issues
than that PR.

---------

Co-authored-by: Afonso Bordado <[email protected]>
Co-authored-by: Jamey Sharp <[email protected]>
Co-authored-by: Timothée Jourde <[email protected]>
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this issue Sep 6, 2023
This commit adds a few cases to `amode` construction on AArch64 for
using the `RegScaled*` variants of `AMode`. This won't affect wasm due
to this only matching the sign-extension happening before the shift, but
it should otherwise help non-wasm Cranelift use cases.

Closes bytecodealliance#6742