Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm64: alternative implementation of atomics using CAS #164

Merged
merged 1 commit into from
May 2, 2023

Conversation

carenas
Copy link
Contributor

@carenas carenas commented Apr 25, 2023

Works only on CPUs that implement LSE, and which is available since armv8.1-a (--march=armv8-a+lse), like the M1.

First patch implement the feature, last one is just leftover code that might be useful while debugging and therefore optional

@zherczeg
Copy link
Owner

"atomic_store doesn't incorrectly set the atomic value if it changes while in the critical zone" - based on the discussion in #160 we should simply say, that any store between the load/store makes the operation undefined.

"the wrong value is provided as "expected" through temp_reg" - this already makes the operation undefined

There should be a test for reading data between the load/store. There should be a test for op_flags as well (an existing test can be extended though).

The atomic load/store must satisfy many rules, maybe creating a list of them would look better.

Do you want to work on this or shall I do it?

@carenas
Copy link
Contributor Author

carenas commented Apr 26, 2023

"atomic_store doesn't incorrectly set the atomic value if it changes while in the critical zone" - based on the discussion in #160 we should simply say, that any store between the load/store makes the operation undefined.

OK, but it should be important to note that might apply to both read and write operations, and not only to the memory address, but also its surrounding area.

Indeed, really old ARM cpu's which luckily we don't support, would invalidate locks if any write was done anywhere in memory[1]

"the wrong value is provided as "expected" through temp_reg" - this already makes the operation undefined

That needs clarification in the documentation as well, note that the fact that ARM, and any other
implementation that uses LL/SC would just happily ignore that the value doesn't match results in the store incorrectly succeeding as shown by the test, and which is why it is a bug against the specification that says that if the value doesn't match then the memory is left unchanged.

I would also argue that calling it "temp_reg" is deceiving, and should be instead "state_reg" or something that better identifies it's actual use.

There should be a test for reading data between the load/store. There should be a test for op_flags as well (an existing test can be extended though).

I would instead argue there should be no reads, so that there are no more surprises when all other CPUs are implemented.

Do you want to work on this or shall I do it?

Considering I might seem to be still confused on how this is to be used, it might be better if you do it.

[1] https://en.wikipedia.org/wiki/Load-link/store-conditional

@zherczeg
Copy link
Owner

OK, but it should be important to note that might apply to both read and write operations, and not only to the memory address, but also its surrounding area.

I would not limit it to an area. All memory writes, regardless of the address, makes the operation undefined. I cannot think any practical uses, where writing cannot be postponed after the store.

I would also argue that calling it "temp_reg" is deceiving, and should be instead "state_reg" or something that better identifies it's actual use.

The comment says: temp_reg must contain the value loaded into dst_reg by the sljit_emit_atomic_load operation (the store operation does not preserve the value of temp_reg)

I think "temp" emphasize that its value is not preserved, but I am open to any ideas. I don't like the "state" term, since it does not contain any status.

I would instead argue there should be no reads, so that there are no more surprises when all other CPUs are implemented.

This is certainly possible, but it would be strange, if reads would block a memory store.

@carenas carenas marked this pull request as draft April 27, 2023 03:50
test_src/sljitTest.c Outdated Show resolved Hide resolved
sljit_src/sljitNativeARM_64.c Outdated Show resolved Hide resolved
@carenas carenas changed the title test92: add negative tests (disabled for arm) arm64: alternative implementation of atomics using CAS Apr 27, 2023
test_src/sljitTest.c Outdated Show resolved Hide resolved
test_src/sljitTest.c Outdated Show resolved Hide resolved
test_src/sljitTest.c Outdated Show resolved Hide resolved
test_src/sljitTest.c Outdated Show resolved Hide resolved
test_src/sljitTest.c Show resolved Hide resolved
@carenas carenas force-pushed the noarm branch 3 times, most recently from b672667 to c8d1db9 Compare April 27, 2023 22:34
@carenas carenas marked this pull request as ready for review April 27, 2023 22:37
test_src/sljitTest.c Outdated Show resolved Hide resolved
sljit_emit_atomic_load(compiler, SLJIT_MOV_U32, SLJIT_R2, SLJIT_S2);
sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, SLJIT_R2, 0);
sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_S3, 0, SLJIT_R2, 0);
sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, SLJIT_IMM, 987654321);
sljit_emit_op1(compiler, SLJIT_MOV_U32, SLJIT_R1, 0, SLJIT_IMM, 987654321);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing this should be elsewhere.

test_src/sljitTest.c Outdated Show resolved Hide resolved
@carenas carenas force-pushed the noarm branch 4 times, most recently from 8c15dd4 to 09274b0 Compare April 29, 2023 13:04
@carenas carenas marked this pull request as draft April 29, 2023 13:05
@carenas carenas force-pushed the noarm branch 2 times, most recently from fcab634 to 70f906c Compare April 29, 2023 23:35
@carenas carenas marked this pull request as ready for review April 29, 2023 23:36
test_src/sljitTest.c Outdated Show resolved Hide resolved
test_src/sljitTest.c Outdated Show resolved Hide resolved
sljit_src/sljitNativeARM_64.c Outdated Show resolved Hide resolved
sljit_src/sljitNativeARM_64.c Outdated Show resolved Hide resolved
sljit_src/sljitNativeARM_64.c Outdated Show resolved Hide resolved
@zherczeg
Copy link
Owner

zherczeg commented May 2, 2023

I think the patch is quite nice now!

You can add /**/ comments freely anywhere to describe things, for me that is better than DEVELOPER.

I realized one type of test is missing: the double load.

  • Create a static global variable, should be far from the stack allocated buffer
  • Load the global value with atomic load
  • Load a local value value with atomic load
  • Store a local value value with atomic store
    This should work. A missing store should have no side effects.

Only available with ARMv8.1 or higher, so behind a flag that would
be enabled by an ACE compatible compiler depending on -march.

As a sideeffect, avoids the races that could result in an infloop
with M1.
@carenas
Copy link
Contributor Author

carenas commented May 2, 2023

At least with ARM's LL/SC, a second load invalidates the reservation of the first, which IMHO would be also in line with the constrain to only support one atomic per thread that was documented.

@carenas carenas requested a review from zherczeg May 2, 2023 09:54
Copy link
Owner

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for doing all the changes

@zherczeg zherczeg merged commit 07a5561 into zherczeg:master May 2, 2023
@carenas carenas deleted the noarm branch May 2, 2023 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants