This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Idea for partial TSO cost mitigation/pgo with MTE extensions #1731
Following up from yesterday's discussion,
Overview
Using MTE (https://www.kernel.org/doc/html/latest/arm64/memory-tagging-extension.html), which tracks a 4-bit tag per 16 bytes of memory, we can mark every 16 bytes of memory as either being local to a cpu core, thus non-tso accesses are fine, or as shared, thus accesses need to be tso. MTE gives us 4 bits, so we can track up to 15 cpu cores plus 1 id used for shared memory.

The rseq (https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/) kernel feature can be used to detect when a task is switched between cpu cores.

We can then detect non-tso accesses to shared memory and tso accesses to local memory with a synchronous tag mismatch exception, and backpatch the offending instruction to the matching access mode (slower tso instructions for shared memory, non-tso instructions for local memory).

The optimisation depends on local/shared memory access being a characteristic of the specific memory operation. While this is overall likely to be true, functions like memcpy will have to deal with both shared and local memory.

There are several limitations to this approach; however, it should still be useful for instrumentation and pgo.
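
The tagging scheme above can be modelled in a few lines of plain C. This is only a sketch under stated assumptions, not FEX code: it assumes tag 0 is reserved for shared memory and tags 1..15 map to host cores 0..14, that non-tso memops address memory through a pointer carrying the current core's tag, and that tso memops use tag 0. The helper names (tag_for_core, with_tag, tag_of, access_faults) are hypothetical, and the tag check is a simplified model of what the hardware does on a synchronous tag check.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: tag 0 marks shared memory, tags 1..15 mark memory that is
 * local to host cores 0..14. */
enum { SHARED_TAG = 0, MAX_TRACKED_CORES = 15 };

/* MTE keeps the logical tag of a pointer in address bits [59:56]. */
#define TAG_SHIFT 56
#define TAG_MASK  (UINT64_C(0xF) << TAG_SHIFT)

/* Hypothetical helper: allocation tag for memory local to `core`. */
static inline unsigned tag_for_core(unsigned core) {
    assert(core < MAX_TRACKED_CORES);
    return core + 1;
}

/* Hypothetical helper: replace the logical tag of an address. */
static inline uint64_t with_tag(uint64_t addr, unsigned tag) {
    return (addr & ~TAG_MASK) | ((uint64_t)(tag & 0xF) << TAG_SHIFT);
}

/* Hypothetical helper: extract the logical tag of an address. */
static inline unsigned tag_of(uint64_t addr) {
    return (unsigned)((addr & TAG_MASK) >> TAG_SHIFT);
}

/* Simplified model of the synchronous check: the access faults when the
 * pointer's logical tag differs from the 16-byte granule's allocation tag. */
static inline int access_faults(uint64_t tagged_addr, unsigned granule_tag) {
    return tag_of(tagged_addr) != granule_tag;
}
```

In this model a non-tso memop on core 3 uses with_tag(addr, tag_for_core(3)) and only succeeds on granules tagged 4, while a tso memop uses with_tag(addr, SHARED_TAG) and only succeeds on granules tagged 0. Every other combination raises the mismatch that drives the backpatching.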
Limitations
- tso and non-tso access modes. I'm not sure there's a good workaround for that, as I don't think we can allow TSO ops to work on both local and shared memory using MTE.
- PROT_MTE is only supported on MAP_ANONYMOUS and RAM-based file mappings (tmpfs, memfd).
- shared memory accesses, not all shared memory accesses need to be tso. It is impossible to detect shared accesses that don't need to be tso using this approach.
Variations
Details
Setup
- HostCoreId as such that they are either [0, 14] with 15 reserved for shared memory, or as [1, 15] with 0 reserved for shared memory.
- frame->HostCoreId
- rHostCoreId
- rHostCoreId from frame->HostCoreId on every re-enter to the jit abi
- frame->HostCoreId in sync with the current HostCoreId. An alternative is to read from the rseq core id field.
- rHostCoreId on cpu core migration if the code is in the jit abi

Assuming 0 is used to indicate shared memory

local memops
shared memops
local -> shared migration & backpatching
- SIGSEGV with .si_code = SEGV_MTESERR where the offending memop is a non-tso memop
- .si_addr TAG to 0
- tso memop
shared -> local migration & backpatching
- SIGSEGV with .si_code = SEGV_MTESERR where the offending memop is a tso memop
- .si_addr TAG to CoreId
- non-tso memop
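
The two migration paths can be summarised as a single decision function. The following is a minimal sketch, assuming tag 0 indicates shared memory as above; struct fixup and plan_fixup are hypothetical names, and the actual tag store for the granule at .si_addr and the instruction backpatch are deliberately left out. In a real handler this decision would presumably run inside the SIGSEGV handler once .si_code has been checked against SEGV_MTESERR.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: tag 0 marks shared memory, tags 1..15 identify host cores. */
enum { FIXUP_SHARED_TAG = 0 };

/* What the fault handler should do for one synchronous tag mismatch. */
struct fixup {
    unsigned new_granule_tag;  /* tag to store for the granule at .si_addr */
    bool     new_memop_is_tso; /* access mode to backpatch the memop to */
};

/* Hypothetical helper: pick the fixup from the faulting memop's current mode.
 * core_tag is the tag of the core that took the fault. */
static struct fixup plan_fixup(bool memop_is_tso, unsigned core_tag) {
    struct fixup f;
    assert(core_tag >= 1 && core_tag <= 15);
    if (!memop_is_tso) {
        /* local -> shared: a non-tso memop touched memory it does not own. */
        f.new_granule_tag = FIXUP_SHARED_TAG;
        f.new_memop_is_tso = true;
    } else {
        /* shared -> local: a tso memop touched core-local memory. */
        f.new_granule_tag = core_tag;
        f.new_memop_is_tso = false;
    }
    return f;
}
```

For example, plan_fixup(false, 5) asks for the granule to be retagged to 0 and the memop to become tso, while plan_fixup(true, 5) asks for the granule to be retagged to 5 and the memop to become non-tso.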