Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow simd_bitmask to return byte arrays #88868

Merged
merged 5 commits into from
Nov 10, 2021

Conversation

calebzulawski
Copy link
Member

cc @rust-lang/project-portable-simd @workingjubilee

@rust-highfive
Copy link
Collaborator

r? @matthewjasper

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 11, 2021

// Convert the integer to a byte array
let ptr = bx.alloca(bx.type_ix(expected_bytes * 8), Align::ONE);
bx.store(ze, ptr, Align::ONE);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone more familiar with this than me: is there any reason this alignment or the one below must be greater than a byte?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, when these are handled by SSE or AVX512 instructions, they are normal integers of 16 to 64 bits. Even though we are saying we want to interpret this as a byte array, we should probably use the natural word-alignment of the processor: that way, kmovq mem, reg (reg to mem) naturally hits the alignment to pick it up in a GPR correctly, and if we for some reason are executing AVX512 features on a 32-bit processor, we still are correctly aligned for such (since then it would go into two GPRs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anything is preventing LLVM from using a greater alignment if it's more efficient, but the actual alignment is just 1, right? Regular u8 arrays are definitely used with larger alignments in most situations, but their alignment is still 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you're coherent about this (I believe PlaceRef tracks it for you, so it may be worth trying to use it more), you can increase the alignment of the alloca (and then pass that alignment to the store/load - whether manually, or relying on the PlaceRef API, or letting LLVM infer it).

I'm not sure how useful it is here, but it may make LLVM more likely to generate the aligned versions of instructions that have both aligned and unaligned variants (assuming LLVM doesn't rewrite this cast away).

Also, if you're using allocas, please consider using place.storage_{live,dead}() before/after all of the memory operations on the alloca - again I'm unsure whether this is significant in the context (since e.g. a small wrapper that gets inlined will do the right thing itself during LLVM inlining), but the default without those (i.e. without llvm.lifetime.{start,end}) is every alloca adds to the stack frame size (for the entire function).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. At least hypothetically this should in fact be "a small wrapper that gets inlined". I don't think either of us is familiar with the PlaceRef API tho'. But that shouldn't be a blocking concern.

@calebzulawski
Copy link
Member Author

Forgot about simd_select_bitmask--just updated that too.

let r = simd_select_bitmask(0b0101000000001100u16, a, b);
assert_eq!(r.0, e);

let r = simd_select_bitmask([0b1100u8, 0b01010000u8], a, b);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't this break on big-endian?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure how LLVM handles the bitcast. We've actually seen the opposite issue (bit order, not byte order) on BE: https://github.com/rust-lang/portable-simd/blob/master/crates/core_simd/src/masks/full_masks.rs#L118-L125

Copy link
Member

@programmerjake programmerjake Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bitcasts in llvm are supposed to be semantically equivalent to a memcpy iirc

The ‘bitcast’ instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try running tests on power64 (not power64le), it'll likely break

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined from conversation, relevant snippets from @programmerjake:

iirc the issue is that architectures have both a bit-level endianness and a byte-level endianness, both of which can independently vary
install qemu-user and use that to run tests†

† for ppc64

I am not as sure that bit-endianness should be a problem, however, per the LLVM LangRef: Vector Type:

In general vector elements are laid out in memory in the same way as array types. Such an analogy works fine as long as the vector elements are byte sized. However, when the elements of the vector aren’t byte sized it gets a bit more complicated. One way to describe the layout is by describing what happens when a vector such as is bitcasted to an integer type with N*M bits, and then following the rules for storing such an integer to memory.

I think the greater problem might be that it is not clear the i1 to integer bitcast was ever well-formed.

That said, the LLVM LangRef on bitcasts also expounds on vectors:

There is a caveat for bitcasts involving vector types in relation to endianess. For example bitcast <2 x i8> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian while element zero ends up in the most significant bits for big-endian.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I found a comment in a test that completely confirms @programmerjake's remarks, actually:

// ignore-endian-big behavior of simd_select_bitmask is endian-specific

🙃

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied this to the test. We can work out the endianness (and add tests if necessary) in another issue.

@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 13, 2021
@workingjubilee workingjubilee added the A-SIMD Area: SIMD (Single Instruction Multiple Data) label Sep 13, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 3, 2021
Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let r = simd_select_bitmask(0b0101000000001100u16, a, b);
assert_eq!(r.0, e);

let r = simd_select_bitmask([0b1100u8, 0b01010000u8], a, b);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined from conversation, relevant snippets from @programmerjake:

iirc the issue is that architectures have both a bit-level endianness and a byte-level endianness, both of which can independently vary
install qemu-user and use that to run tests†

† for ppc64

I am not as sure that bit-endianness should be a problem, however, per the LLVM LangRef: Vector Type:

In general vector elements are laid out in memory in the same way as array types. Such an analogy works fine as long as the vector elements are byte sized. However, when the elements of the vector aren’t byte sized it gets a bit more complicated. One way to describe the layout is by describing what happens when a vector such as is bitcasted to an integer type with N*M bits, and then following the rules for storing such an integer to memory.

I think the greater problem might be that it is not clear the i1 to integer bitcast was ever well-formed.

That said, the LLVM LangRef on bitcasts also expounds on vectors:

There is a caveat for bitcasts involving vector types in relation to endianess. For example bitcast <2 x i8> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian while element zero ends up in the most significant bits for big-endian.

@workingjubilee
Copy link
Member

odd.
r? @workingjubilee

@bors
Copy link
Contributor

bors commented Oct 7, 2021

☔ The latest upstream changes (presumably #89629) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 24, 2021
@JohnCSimon
Copy link
Member

triage: back to author for merge conflicts

@calebzulawski

This comment has been minimized.

@calebzulawski

This comment has been minimized.

@calebzulawski
Copy link
Member Author

@rustbot label -S-waiting-on-author +S-waiting-on-review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 6, 2021
@workingjubilee
Copy link
Member

Hm.
I have thought about this and I am fine proceeding without an explicit test for big-endian, since we may as well be testing that issue in our high level API instead, but I would appreciate it if you documented that for simd_bitmask

  • currently, all tier 1 (and thus "little endian") platforms emit a predictable representation
  • but the exact representation for other platforms, especially big endian ones, is less clear, thus
  • in any case, the repr is platform-dependent as a result, and
  • should only be considered consistent with simd_select_bitmask

somewhere in either the associated tests or the codegen infrastructure itself.

@workingjubilee
Copy link
Member

I gave it a spin on std::arch's codegen tests and it seems fine. Let's see how it plays against CI.
@bors r+

@bors
Copy link
Contributor

bors commented Nov 10, 2021

📌 Commit fe8ae57 has been approved by workingjubilee

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 10, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 10, 2021
…askrgr

Rollup of 5 pull requests

Successful merges:

 - rust-lang#88447 (Use computed visibility in rustdoc)
 - rust-lang#88868 (Allow simd_bitmask to return byte arrays)
 - rust-lang#90727 (Remove potential useless data for search index)
 - rust-lang#90742 (Use AddAssign impl)
 - rust-lang#90758 (Fix collections entry API documentation.)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 858fea4 into rust-lang:master Nov 10, 2021
@rustbot rustbot added this to the 1.58.0 milestone Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants