-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fixing Buffer::BlockCopy, JIT_MemCpy, and JIT_MemSet to just call the appropriate CRT functions for x64 Windows, as is already done for all other platforms/targets #25750
Conversation
I did a
Should we consider just removing the assembly helpers and the Edit: Fixed spacing in grep results |
JIT still uses these helpers, see: Lines 260 to 261 in dbc0e19
The CHECK_RANGE are not sanity checks, they are actually necessary. In case the JIT_MemCpy or JIT_MemSet get null reference as an argument, we need to handle that as if it happened in the managed caller. |
To use memcpy / memset from the CRT, we should do the same thing that we do on AMD64 Linux: coreclr/src/vm/amd64/crthelpers.S Lines 16 to 39 in dbc0e19
That way, we check the input parameters in the JIT_xxx functions and then tail call to the runtime functions. |
Fixed up the JIT_MemSet and JIT_MemCpy implementations to forward to the CRT versions. I also cleaned up the comments in the Unix version to match those in the x64 version (where applicable). |
src/vm/amd64/crthelpers.S
Outdated
|
||
jmp C_PLTFUNC(memcpy) | ||
jmp C_PLTFUNC(memmove) // forward to the CRT implementation, using memmove to handle overlapping buffers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the only code change to the Unix version. The Windows version was very explicit about needing to handle overlapping buffers, and so memmove
is the appropriate thing to forward to.
I would assume the same conditions apply to Unix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Windows version was very explicit about needing to handle overlapping buffers
That was for better compatibility with .NET Framework. It is not required by the ECMA 335: The behavior of cpblk is unspecified if the source and destination areas overlap.`
memmove is a bit slower. I do not think we need to be making the JIT helper slower on Unix...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need it for compat on Windows? I would think consistency here is also important.
(I'll keep memmove
on Windows and memcpy
on Unix for now, but I think it would be a good discussion to have).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nothing else, it might be good to have JIT_MemCpy
and JIT_MemMove
, and then the JIT can more readily decide which is appropriate based on the scenario. There may be scenarios where overlapping is desirable and several where it is not needed.
Edit: And, it appears as though more than just Nevermind, looks like there were just some comments that said cpblk
goes through these.MemCpy
and they meant MemSet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, memcpy as alias for memmove in Windows x64 CRT. So even if we have changed Windows to call memcpy, the unspecified behavior is still going to be different between Windows and Unix.
; void *dst = pointer to destination buffer | ||
; const void *src = pointer to source buffer | ||
; size_t count = number of bytes to copy | ||
; Exit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The declaration of these methods in jitinterface.h
have them as void returning, rather than as void*
returning (which is what the CRT implementations do):
Lines 434 to 435 in 01da3bd
void STDCALL JIT_MemSet(void *dest, int c, SIZE_T count); | |
void STDCALL JIT_MemCpy(void *dest, const void *src, SIZE_T count); |
CoreFX legs are failing due to:
@wtgodbe, was this something missed or are things just not entirely in sync yet? |
We may need to update some TFMs to |
@wtgodbe The runtimeconfig.json files are built with the corefx tests in the corefx official build, and downloaded by the corefx test legs in coreclr repo CI. Maybe the 3.0 versions here: https://github.com/dotnet/coreclr/blob/master/tests/src/Common/CoreFX/CoreFX.depproj need to be updated? |
…, as is already done for all other platforms/targets
…ations for JIT_MemSet and JIT_MemCpy
Tests aren't running... |
… framework compat.
/azp help |
Closed and reopened as per https://github.com/dotnet/corefx/issues/39593 |
… appropriate CRT functions for x64 Windows, as is already done for all other platforms/targets (dotnet/coreclr#25750) * Fixing Buffer::BlockCopy to just call the CRT memmove for x64 Windows, as is already done for all other platforms/targets * Fixing up the x64 CrtHelpers.asm to just forward to the CRT implementations for JIT_MemSet and JIT_MemCpy * Keep unix using memcpy and clarify that Windows uses memmove for full framework compat. Commit migrated from dotnet/coreclr@54d9d9b
This resolves https://github.com/dotnet/coreclr/issues/25505.
Based on the original issue where
JIT_MemCpy
was changed to userep movsb
(see #7198), there was:However, on AMD processors, there are additional limitations around
rep movsb
and when it is beneficial to use. The common conditions under which it is being used in theJIT_MemCpy
method today actually cause a 2x perf decrease for arrays larger than 512 bytes.Having a custom memcpy routine adds additional maintenance burden, can be error prone, is generally not as widely tested, and does not get many of the optimizations that the CRT implementations receives. This, coupled with the overall minor improvements for small arrays on Intel processors and the 2x regression for arrays over 512 bytes on AMD processors is resulting in the custom memcpy routine being removed.
It would be beneficial for any future improvements to
memcpy
to be made directly againstglibc
andcrt
instead.