Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Embrace spans in System.Reflection.Metadata. #85280

Open
teo-tsirpanis opened this issue Apr 24, 2023 · 10 comments
Open

[API Proposal]: Embrace spans in System.Reflection.Metadata. #85280

teo-tsirpanis opened this issue Apr 24, 2023 · 10 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection.Metadata
Milestone

Comments

@teo-tsirpanis
Copy link
Contributor

teo-tsirpanis commented Apr 24, 2023

Background and motivation

System.Reflection.Metadata is a pretty perormance-oriented library but its API lacks methods that accept spans. I propose additional span overloads for methods that work with memory buffers. Some of these APIs can now be implemented efficiently thanks to the work started in #81059.

API Proposal

namespace System.Reflection.Metadata;

public class BlobBuilder
{
    public void WriteBytes(ReadOnlySpan<byte> buffer);
}

public class BlobContentId
{
    public BlobContentId(ReadOnlySpan<byte> id);
    public static BllobContentId FromHash(ReadOnlySpan<byte> hashCode);
}

public struct BlobReader
{
    public void ReadBytes(Span<byte> buffer);
    public void ReadUTF8(Span<char> buffer);
    public bool TryReadUTF8(Span<char> buffer, out int charsWritten);
    public void ReadUTF16(Span<char> buffer);
    public bool TryReadUTF16(Span<char> buffer, out int charsWritten);
}

public struct BlobWriter
{
    public void WriteBytes(ReadOnlySpan<byte> buffer);
}

public struct MetadataReader
{
    public int GetBlobLength(BlobHandle handle);
    public void GetBlobBytes(BlobHandle handle, Span<byte> buffer, out int bytesWritten);
    public int GetStringLength(StringHandle handle);
    public void GetStringChars(StringHandle handle, Span<char> buffer, out int charsWritten);
    public int GetStringLength(NamespaceDefinitionHandle handle);
    public void GetStringChars(NamespaceDefinitionHandle handle, Span<char> buffer, out int charsWritten);
    public int GetStringLength(DocumentNameBlobHandle handle);
    public void GetStringChars(DocumentNameBlobHandle handle, Span<char> buffer, out int charsWritten);
    public int GetUserStringLength(UserStringHandle handle);
    public void GetUserStringChars(UserStringHandle handle, Span<char> buffer, out int charsWritten);

    // Or what about we add the following APIs instead of the above?
    // The memory the span points to can be freed under our feet but
    // this is an existing problem with many SRM APIs that safely wrap pointers.
    // The library is expert-level either way.
    public ReadOnlySpan<byte> GetBlobSpan(BlobHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(StringHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(NamespaceDefinitionHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(DocumentNameBlobHandle handle);
    public ReadOnlySpan<byte> GetRawUserStringBytes(UserStringHandle handle);
}

namespace System.Reflection.Metadata.Ecma335;

public sealed class MetadataBuilder
{
    // These APIs will not allocate if the blob/string already exists.
    public BlobHandle GetOrAddBlob(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddBlobUTF8(ReadOnlySpan<char> value, bool allowUnpairedSurrogates = true);
    public BlobHandle GetOrAddBlobUTF16(ReadOnlySpan<char> value);
    public BlobHandle GetOrAddDocumentName(ReadOnlySpan<char> value);
    public StringHandle GetOrAddString(ReadOnlySpan<char> value);
    public UserStringHandle GetOrAddUserString(ReadOnlySpan<char> value);
}

API Usage

The proposed APIs correspond to existing APIs that work with strings and (immutable) byte arrays. Their usage will be similar.

Alternative Designs

Do nothing and either let users implement the span APIs on top of pointers if they are available, or accept the memory allocations if they aren't.

There are also more possible APIs to be spanified (such as methods in the System.Reflection.Metadata.Ecma335.***Encoders), but the proposed APIs are the most foundational; the rest can be implemented by user code on top of these with little effort.

Risks

No response

@teo-tsirpanis teo-tsirpanis added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection.Metadata labels Apr 24, 2023
@ghost
Copy link

ghost commented Apr 24, 2023

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

System.Reflection.Metadata is a pretty perormance-oriented library but its API lacks methods that accept spans. I propose additional span overloads for methods that work with memory buffers. Some of these APIs can now be implemented efficiently thanks to the work started in #81059.

API Proposal

namespace System.Reflection.Metadata;

public class BlobBuilder
{
    public void WriteBytes(ReadOnlySpan<byte> buffer);
}

public class BlobContentId
{
    public BlobContentId(ReadOnlySpan<byte> id);
    public static BllobContentId FromHash(ReadOnlySpan<byte> hashCode);
}

public struct BlobReader
{
    public void ReadBytes(Span<byte> buffer);
    public void ReadUTF8(int byteCount, Span<char> buffer);
    public bool TryReadUTF8(int byteCount, Span<char> buffer, out int charsWritten);
    public void ReadUTF16(int byteCount, Span<char> buffer);
    public bool TryReadUTF16(int byteCount, Span<char> buffer, out int charsWritten);
}

public struct BlobWriter
{
    public void WriteBytes(ReadOnlySpan<byte> buffer);
}

public struct MetadataReader
{
    public int GetBlobLength(BlobHandle handle);
    public void GetBlobBytes(BlobHandle handle, Span<byte> buffer, out int bytesWritten);
    public int GetStringLength(StringHandle handle);
    public void GetStringBytes(StringHandle handle, Span<char> buffer, out int bytesWritten);
    public int GetStringLength(NamespaceDefinitionHandle handle);
    public void GetStringBytes(NamespaceDefinitionHandle handle, Span<char> buffer, out int bytesWritten);
    public int GetStringLength(DocumentNameBlobHandle handle);
    public void GetStringBytes(DocumentNameBlobHandle handle, Span<char> buffer, out int bytesWritten);
    public int GetUserStringLength(UserStringHandle handle);
    public void GetUserStringBytes(UserStringHandle handle, Span<char> buffer, out int bytesWritten);

    // Or what about we add the following APIs instead of the above?
    // The memory the span points to can be freed under our feet but
    // this is an existing problem with many SRM APIs that safely wrap pointers.
    // The library is expert-level either way.
    public ReadOnlySpan<byte> GetBlobSpan(BlobHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(StringHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(NamespaceDefinitionHandle handle);
    public ReadOnlySpan<byte> GetRawStringBytes(DocumentNameBlobHandle handle);
    public ReadOnlySpan<byte> GetRawUserStringBytes(UserStringHandle handle);
}

namespace System.Reflection.Metadata.Ecma335;

public sealed class MetadataBuilder
{
    // These APIs will not allocate if the blob/string already exists.
    public BlobHandle GetOrAddBlob(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddBlobUTF8(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddBlobUTF16(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddDocumentName(ReadOnlySpan<char> value);
    public StringHandle GetOrAddString(ReadOnlySpan<char> value);
    public UserStringHandle GetOrAddUserString(ReadOnlySpan<char> value);
}

API Usage

The proposed APIs correspond to existing APIs that work with strings and (immutable) byte arrays. Their usage will be similar.

Alternative Designs

Do nothing and either let users implement the span APIs on top of pointers if they are available, or accept the memory allocations if they aren't.

There are also more possible APIs to be spanified (such as methods in the System.Reflection.Metadata.Ecma335.***Encoders), but the proposed APIs are the most foundational; the rest can be implemented by user code on top of these with little effort.

Risks

No response

Author: teo-tsirpanis
Assignees: -
Labels:

api-suggestion, area-System.Reflection.Metadata

Milestone: -

@steveharter
Copy link
Member

I propose additional span overloads for methods that work with memory buffers

@teo-tsirpanis can you provider an example of the caller using buffers (not backed by a simple array) where these APIs are advantageous? Thanks

@teo-tsirpanis
Copy link
Contributor Author

teo-tsirpanis commented Apr 26, 2023

One use case would be #84580 (comment).

@teo-tsirpanis teo-tsirpanis added the untriaged New issue has not been triaged by the area owner label May 22, 2023
@buyaa-n
Copy link
Contributor

buyaa-n commented May 30, 2023

@teo-tsirpanis the API proposals for BlobBuilder, BlobContentId, BlobWriter and MetadataBuilder looks good to me, for others more info needed, I left some questions as a comment. In general, a usage scenarios for them would be helpful for review. Also, I would prefer to split the proposals for BlobReader and MetadataReader into another issue if that makes sense to you

public struct BlobReader
{
    public byte[] ReadBytes(int byteCount)
    public void ReadBytes(int byteCount, byte[] buffer, int bufferOffset)
+   public void ReadBytes(Span<byte> buffer);
    public string ReadUTF8(int byteCount)
+   public void ReadUTF8(int byteCount, Span<char> buffer); // Why this passes byteCount but ReadBytes(Span<byte> buffer) not?
+   public bool TryReadUTF8(int byteCount, Span<char> buffer, out int charsWritten); // A uage scenarios will be helpful
    public string ReadUTF16(int byteCount)
+   public void ReadUTF16(int byteCount, Span<char> buffer);
+   public bool TryReadUTF16(int byteCount, Span<char> buffer, out int charsWritten); // Usage scenarios will be helpful
}
public struct MetadataReader
{
+   public int GetBlobLength(BlobHandle handle); // Why this needed? I guess for determining span size, anyway a user scenario will be useful
    public byte[] GetBlobBytes(BlobHandle handle);
+   public void GetBlobBytes(BlobHandle handle, Span<byte> buffer, out int bytesWritten); // why do you need bytesWritten?
    public string GetString(StringHandle handle);
+   public int GetStringLength(StringHandle handle); // usage scenarios
+   public void GetStringBytes(StringHandle handle, Span<char> buffer, out int bytesWritten);
+   public int GetStringLength(NamespaceDefinitionHandle handle);
+   public void GetStringBytes(NamespaceDefinitionHandle handle, Span<char> buffer, out int bytesWritten);
+   public int GetStringLength(DocumentNameBlobHandle handle);
+   public void GetStringBytes(DocumentNameBlobHandle handle, Span<char> buffer, out int bytesWritten);
+   public int GetUserStringLength(UserStringHandle handle);
+   public void GetUserStringBytes(UserStringHandle handle, Span<char> buffer, out int bytesWritten);

    // Or what about we add the following APIs instead of the above?
    // The memory the span points to can be freed under our feet but
    // this is an existing problem with many SRM APIs that safely wrap pointers.
    // The library is expert-level either way.
+    public ReadOnlySpan<byte> GetBlobSpan(BlobHandle handle);
+    public ReadOnlySpan<byte> GetRawStringBytes(StringHandle handle);
+    public ReadOnlySpan<byte> GetRawStringBytes(NamespaceDefinitionHandle handle);
+    public ReadOnlySpan<byte> GetRawStringBytes(DocumentNameBlobHandle handle);
+    public ReadOnlySpan<byte> GetRawUserStringBytes(UserStringHandle handle);
}
public sealed class MetadataBuilder
{
    public BlobHandle GetOrAddBlob(byte[] value) 
    // These APIs will not allocate if the blob/string already exists.
+   public BlobHandle GetOrAddBlob(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddBlobUTF8(string value, bool allowUnpairedSurrogates = true);
+   public BlobHandle GetOrAddBlobUTF8(ReadOnlySpan<byte> value); // should it have allowUnpairedSurrogates parameter too?
    public System.Reflection.Metadata.BlobHandle GetOrAddBlobUTF16(string value);
+   public BlobHandle GetOrAddBlobUTF16(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddDocumentName(string value);
+   public BlobHandle GetOrAddDocumentName(ReadOnlySpan<char> value);
    public StringHandle GetOrAddString(string value);
+   public StringHandle GetOrAddString(ReadOnlySpan<char> value);
    public UserStringHandle GetOrAddUserString(string value);
+   public UserStringHandle GetOrAddUserString(ReadOnlySpan<char> value);
}

Adding future milestone for now, I don't think all of these APIs ready for 8.0, we could change milestone if BlobReader and MetadataReader proposals moved to an another issue/proposal

@buyaa-n buyaa-n added this to the Future milestone May 30, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label May 30, 2023
@fowl2
Copy link

fowl2 commented Oct 24, 2023

Missing the simplest one :)

public sealed class MetadataReader
{
    public unsafe byte* MetadataPointer { get; }
    public int MetadataLength { get; }
+   public ReadOnlySpan<byte> MetadataSpan { get; }
}

Not sure if this is that useful given the whole type is unsafe, but perhaps:

public unsafe struct BlobReader
{
+   public BlobReader(ReadOnlySpan<byte> buffer);
}

@PaulusParssinen
Copy link
Contributor

PaulusParssinen commented Jun 1, 2024

ILCompiler and its components could benefit from more (RO)Span<T> overloads in the S.R.M. In many places ILC has to go through various hoops to cache values that it reads from metadata because lack of modern Span APIs.

@teo-tsirpanis
Copy link
Contributor Author

@buyaa-n

public int GetBlobLength(BlobHandle handle); // Why this needed? I guess for determining span size, anyway a user scenario will be useful

My thought was that it would allow the users to get the buffer's size and prepare an appropriately sized buffer to pass to GetBlob.

public void ReadUTF8(int byteCount, Span<char> buffer); // Why this passes byteCount but ReadBytes(Span<byte> buffer) not?
public bool TryReadUTF8(int byteCount, Span<char> buffer, out int charsWritten); // A uage scenarios will be helpful

On second thought byteCount is not needed; it can be inferred from the span. I updated the proposal to fix this and other mistakes. I also am not sure if the BlobReader.TryRead*** methods are needed.

@AArnott
Copy link
Contributor

AArnott commented Jun 8, 2024

public void GetStringChars(NamespaceDefinitionHandle handle, Span<char> buffer, out int bytesWritten);

The out parameter should be named charsWritten instead. For this, and all other char-based spans with out length parameters.

@teo-tsirpanis
Copy link
Contributor Author

Updated, thanks.

@AArnott
Copy link
Contributor

AArnott commented Jun 8, 2024

IMO I prefer the single TryGetString method pattern I proposed in #103169 over two methods (GetStringLength and GetStringChars). Besides being one method instead of two, it means I can reuse a buffer for many get-string calls without fear of it throwing an exception just because the length is too short. With the two method pattern, I must make an extra GetStringLength call before trying to obtain the string just to avoid the exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection.Metadata
Projects
None yet
Development

No branches or pull requests

6 participants