Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a method similar to S.R.CS.RuntimeHelpers.InitializeArray, but for spans #24961

Closed
VSadov opened this issue Feb 7, 2018 · 44 comments
Closed
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@VSadov
Copy link
Member

VSadov commented Feb 7, 2018

Initialization of literal arrays like new int[]{1,2,3,4,5,6} has a special path where the blittable data is stored directly in the PE data and at runtime, instead of assigning every element of the array to a corresponding constant, we call InitializeArray with the target array instance and the token for the field that represents the data blob.

Similar technique would be very useful when initializing spans.
We already have two scenarios:

  • stack allocated spans - stackalloc int[] {1,2,3,4,5}
  • optimized conversion of literal arrays to ReadOnlySpan - (ReadOnlySpan<int>)new int[]{1,2,3,4,5}

in fact in the case with ReadOnlySpan conversion, it would be possible and desirable to just refer to the PE data directly.

The preferred form of the API would be:

        ReadOnlySpan<T> GetGetReadOnlySpanFromTemplate<T>(RuntimeFieldHandle fldHandle);

Another acceptable alternative is:
(but the one above feels more convenient since verifiability is less of a problem)

        ref T GetRefToTemplateData<T>(RuntimeFieldHandle fldHandle);

A valid question to be asked here - "if it is possible to just load a reference to the field in the first place, why there is a need for the API?"

The problem is that the blob data is always stored in littleendian format, so on a bigendian machine the blob data is valid only for 1-byte sized elements.
Similarly to the case of InitializeArray, this API would allow the runtime to abstract away the endianness of the blob.
In a littleendian context (which is the most common case) the implementation could trivially forward to the field data and in bigendian case it may do fixups by either making a copy of the data while changing the endianness or even by performing the fixup in-place.

NOTE: possibility of in-place fixup would require that the same blob is not used to initialize span data of different sizes - say shorts and longs.
Such restriction would be acceptable on the C# side and runtime could validate that such "sharing" did not happen, or make it undefined behavior if that happens.

NOTE: the presence of the API is statically known to the compiler, so it would be ok if some runtimes do not have it right away or never. Then optimization will simply not consider 1+ element sizes.

@VSadov
Copy link
Member Author

VSadov commented Feb 7, 2018

This is related to:
dotnet/roslyn#24621
dotnet/roslyn#24249

@VSadov
Copy link
Member Author

VSadov commented Feb 7, 2018

@VSadov
Copy link
Member Author

VSadov commented Feb 8, 2018

Having this in .NET Core 2.1 would be very useful.
Is there any chance for this API making it though at this point?

@alrz
Copy link
Member

alrz commented Feb 8, 2018

how about stackalloc on pointers? e.g. int* p = stackalloc[] { 1,2,3 };

@VSadov
Copy link
Member Author

VSadov commented Feb 8, 2018

@alrz - the resulting type does not matter here. From the bigendian point of view we store int32 elements with their bytes in a wrong order. The data will not look like consecutive 1,2,3 on bigendian machine.

@alrz
Copy link
Member

alrz commented Feb 8, 2018

I understand that, but we're going to use this for stackalloc initializers on pointers as well, I guess the second alternative could work though ref T GetRefToTemplateData<T>(RuntimeFieldHandle fldHandle);

In fact, stackalloc initializers "only" work on pointer types, no special codegen for Span. We're just passing an already initialized localloc to the Span constructor.

@VSadov
Copy link
Member Author

VSadov commented Feb 8, 2018

@alrz - right, GetRefToTemplateData would work equally well for stackalloc that results in a pointer or in a span.

@VSadov
Copy link
Member Author

VSadov commented Feb 8, 2018

It is fairly easy to get a pointer from a span. For example “(void*)&s[0]” in IL

@alrz
Copy link
Member

alrz commented Feb 8, 2018

@VSadov I just don't see why there has be an indirection here, we can construct both Span and ReadOnlySpan directly from a pointer (or ref for that matter), but GetReadOnlySpanFromTemplate needs a conversion for anything other than ReadOnlySpan

@VSadov
Copy link
Member Author

VSadov commented Feb 8, 2018

One reason is that ReadOnlySpan can be used safely in the ReadOnlySpan case, which would not require dealing with pointers, and would be verifiable.

Since span is basically a range-checking reference is is safer to use. Conceptually, it is the right thing to represent a chunk of data with known length.

We will see though what CLR/FX guys will say when it gets to actually implementing that API.

@alrz
Copy link
Member

alrz commented Feb 8, 2018

currently none of Span tests are verifiable due to the unsafe nature of the constructor... I am more biased towards the current implementation of the stackalloc inits, I think that approach need more changes compared to just returning a ref here. We should see how things work out in either of these cases.

@KrzysztofCwalina
Copy link
Member

@VSadov, what's the conclusion here? Do we need this API? If yes, please mark the issue are ready for review and let's discuss it asap.

@VSadov
Copy link
Member Author

VSadov commented Feb 23, 2018

We need this API to implement optimizations.
It is not strictly needed - it is not blocking, but it is a "very good to have" one.

Considering that this will have to be implemented as JIT intrinsic and be platform specific, I am very doubtful that it can make it for 7.3

Lets put it for the review though.

@terrajobst
Copy link
Member

@jkotas what are your thoughts on this?

@jkotas
Copy link
Member

jkotas commented Feb 27, 2018

This method is Span equivalent of the existing RuntimeHelpers.InitializeArray method. It is expected to be only ever be called by C# compiler generated code, never called directly by user. The proposed design (GetReadOnlySpanFromTemplate) looks fine to me.

The end-to-end story for this requires changes in C# compiler, runtime, JIT, and potentially debugger expression evaluator to come together. We should do this post .NET Core 2.1 when we have a runway for doing the work and testing that it all fits together well.

@ahsonkhan
Copy link
Member

The end-to-end story for this requires changes in C# compiler, runtime, JIT, and potentially debugger expression evaluator to come together. We should do this post .NET Core 2.1 when we have a runway for doing the work and testing that it all fits together well.

Moving to future.

@terrajobst
Copy link
Member

How about this:

class RuntimeHelpers
{
	ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle);	
}

@benaadams
Copy link
Member

Use unmanaged constraint, so it can't refer to objects or types containing object?

class RuntimeHelpers
{
    ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle) where T : unmanaged;	
}

@ektrah
Copy link
Member

ektrah commented Jun 12, 2018

How expensive is that method? Is it possible to cache the return value somehow if it’s expensive?

@VSadov
Copy link
Member Author

VSadov commented Jun 12, 2018

The following looks good.

class RuntimeHelpers
{
    ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle)	
}

Not sure if unmanaged would be very useful. T must be blittable - in a sense that an array of bytes can be trivially mapped to T[]. unmanaged structs may still get their fields aligned, packed, reordered. May contain IntPtr...

It basically means that T must be a primitive value type of a known size like int, char or an enum. Runtime will need to validate that.

@ektrah Yes, runtime will have to cache the result if it did any transformations to the data - to correct endianness or alignment. The original field is technically accessible from the user code, so the transformation, if needed, cannot be done in-place.
If no changes are needed, it would make sense to just wrap the field in a span and return. This should generally be a case on little endian machines..

@terrajobst
Copy link
Member

terrajobst commented Jun 26, 2018

Video

Alright, this is it:

class RuntimeHelpers
{
	ReadOnlySpan<T> CreateSpan<T>(RuntimeFieldHandle fldHandle);	
}

@jaredpar
Copy link
Member

Why not put it on MemoryMarshal? This where a number of unsafe Span helpers already exist. Prefer we keep all of those in one place so the entirety of the feature is easy to rationalize about.

@jkotas
Copy link
Member

jkotas commented Jun 26, 2018

It is Span equivalent of InitializeArray that lives on RuntimeHelpers. There is nothing unsafe about either InitializeArray nor CreateSpan.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@huoyaoyuan
Copy link
Member

Is this still required since the compiler has used ldflda for a long time?

@brian-reichle
Copy link

@huoyaoyuan, the compiler is only able to do that where the element size is 1 byte, anything larger would require adjusting for potential differences in endianness. The compiler uses RuntimeHelpers.InitializeArray to make the necessary adjustments when initializing an array, I believe this issue is about getting something similar for ReadOnlySpan<> so that the compiler can avoid the need to create an array when the element size > 1 byte.

https://sharplab.io/#v2:EYLgHgbALANALiAlgGwD4AEBMBGAsAKHQGYACLEgYRIG8CT6zSAlAUwEMATAeQDtkBPAMoAHNjwA8wfnBYA+EgFkAFAEoSAXnk8WAdxJSZAbQC6NEgAYYJbFcxWiV2CQCsViCQC+AbjoNiJVk5eARExcUQeOHkAOVUNLV0SCLgTM0trW3tHK1cSd28CDyA==

@huoyaoyuan
Copy link
Member

Some question found while implementing it.

The storage of RVA field looks to be endianess aware, and InitializeArray has code to handle them.
Under LE runtime, returning the address of RVA field directly is OK. This also works for bytes.
Under BE runtime, reversing of endianess should be required. But where to store the reversed data? Creating an array every time regresses the performance improvements totally.

@jkotas
Copy link
Member

jkotas commented Oct 18, 2020

In CoreCLR, it would need to allocate the copy with the right endianness on the loader heap. It is not that important to implement the big-endian support for CoreCLR. CoreCLR does not run on big-endian systems today and there are number of issues that would need to be fixed before it can.

In Mono, implementing the endian swapping for Mono is more important. Mono runs on big-endian systems today. @lambdageek Could you please provide guidance for Mono?

@huoyaoyuan
Copy link
Member

It looks like that the reversed form depends on the type of T, so it may cause some architectural issue, although every field is supposed to have only one usage in practice.

@stephentoub
Copy link
Member

@jaredpar, if the runtime were to expose this soon, would C# be able to target it for .NET 6? I expect there'd be a bit of a back and forth to get all the ducks in a row, e.g. runtime exposes the API, C# takes advantage of it, runtime updates all places that could utilize it to do so (e.g. changing some internal array static fields to be span static props).

@jaredpar
Copy link
Member

Seems pretty reasonable and probably low / med-low costing.

Curious: how much benefit is this expected to provide here?

@lambdageek
Copy link
Member

lambdageek commented Nov 23, 2020

In CoreCLR, it would need to allocate the copy with the right endianness on the loader heap. It is not that important to implement the big-endian support for CoreCLR. CoreCLR does not run on big-endian systems today and there are number of issues that would need to be fixed before it can.

In Mono, implementing the endian swapping for Mono is more important. Mono runs on big-endian systems today. @lambdageek Could you please provide guidance for Mono?

I think for Mono it will be very similar to how ves_icall_System_Runtime_CompilerServices_RuntimeHelpers_InitializeArray works today, with the following changes:

  1. for LE we can just return a span over the result of mono_field_get_rva
  2. for BE, we may need to add a second field to MonoFieldDefaultValue something like char *host_data; /* RVA swapped to host endianness using the type of MonoClassField* for sizes */
  3. in mono_field_get_rva on BE we would allocate from the class mempool and write the byte-swapped data to host_data if it's not already initialized.
  4. (extra credit) for AOT it would be nice to store the byte-swapped data in the AOT image and refer to that instead of doing a copy at runtime. I don't think this will be easy.

@stephentoub
Copy link
Member

how much benefit is this expected to provide here?

@VSadov, can you comment on this, and how the costs would compare to just using a static T[] field?

@VSadov
Copy link
Member Author

VSadov commented Nov 23, 2020

An interesting case is (ReadOnlySpan<int>)new int[]{1,2,3,4,5} - on little-endian machine that can be a direct reference to the metadata without copying or allocating anything. We already can do this as long as the element type is 1-byte size. I am not sure how commonly this pattern is used, since requiring 1-byte elements seems quite limiting.

Ultimately, compared to wrapping a static array this saves the allocation and initialization of that array. I would say the savings are not huge.
The issue was logged with an assumption that supporting this might be fairly cheap too, especially on little-endian HW, which is nowdays almost everything.

If there are other reasons why this would require nontrivial work or extra copy even on little-endian (for example due to alignment requirements), then it might not be worth it.

@stephentoub
Copy link
Member

Ultimately, compared to wrapping a static array this saves the allocation and initialization of that array.

What about after the array has already been initialized? This API for span will need to be called on every access, right? How does that compare to the cost of accessing the static array field?

@jkotas
Copy link
Member

jkotas commented Nov 23, 2020

This API would have to be implemented as JIT intrinsic that turns it into address constant.

The current InitializeArray is also implemented as JIT intrinsic, for similar reason.

@VSadov
Copy link
Member Author

VSadov commented Nov 23, 2020

If the API takes a handle, there could be some extra work with figuring the location of the field, unless it is an intrinsic that does it at JIT-time.

Alternatively the API may also take a ref to the field (C# compiler could provide it), then it might be possible to just use the ref and make it roughly the same as wrapping a static field (on little endian machine).

@VSadov
Copy link
Member Author

VSadov commented Nov 23, 2020

@jkotas - are there alignment guarantees for metadata blobs?

@jkotas
Copy link
Member

jkotas commented Nov 23, 2020

It is up to IL producers to guarantee the alignment for RVA statics.

For example, managed C++ does emit the RVA statics with the right alignment. The IL rewriters (at least the ones we own - e.g. crossgen) do preserve it.

I think we can make the API throw when the blob is not sufficiently aligned.

@stephentoub
Copy link
Member

Then in terms of impact, assuming it's as-fast-or-faster to access one of these, there are a bunch of places we'd used them, just as there are a bunch of places we used the support that was added for ReadOnlySpan<OneBytePrimitive>. A quick survey suggests there are dozens of places we'd use this with private static readonly char[] and private static readonly int[] fields, where we could avoid the array allocation / copy and get potentially faster access to boot, with access sites either just indexing into them or using them with APIs that support spans.

@MichalStrehovsky
Copy link
Member

An interesting case is (ReadOnlySpan)new int[]{1,2,3,4,5} - on little-endian machine that can be a direct reference to the metadata without copying or allocating anything

How does returning pointers into data section of the executable mix with unloadability? Would we need to make a copy of the data if the assembly is part of an unloadable load context? I assume there's no good way to track the reference within the span.

@VSadov
Copy link
Member Author

VSadov commented Nov 24, 2020

In coreclr span is similar to a byref parameter in terms of GC tracking. It should keep the context alive.
Spans are also stack-only and thus relatively short-lived, so keeping the context alive for too long is unlikely to be a problem.

@jkotas
Copy link
Member

jkotas commented Nov 24, 2020

In coreclr span is similar to a byref parameter in terms of GC tracking

#40346

@jkotas
Copy link
Member

jkotas commented Oct 27, 2021

Closed in favor of #60948

@teo-tsirpanis
Copy link
Contributor

@jkotas it's still open.

@jkotas jkotas closed this as completed Oct 29, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Projects
None yet
Development

No branches or pull requests