Extend ReadOnlySpan<byte> optimization for static data to work with ASCII/UTF8 strings #48850

Open
benaadams opened this issue Feb 11, 2019 · 9 comments
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code

@benaadams
Member

Could this (#24621) work with byte strings? As writing them out is fairly impenetrable e.g.

static ReadOnlySpan<byte> ContinueBytes =>
    new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', (byte)'/', (byte)'1', (byte)'.', (byte)'1', (byte)' ', (byte)'1', (byte)'0', (byte)'0', (byte)' ', (byte)'C', (byte)'o', (byte)'n', (byte)'t', (byte)'i', (byte)'n', (byte)'u', (byte)'e', (byte)'\r', (byte)'\n', (byte)'\r', (byte)'\n' };

So it would be nice if it worked with UTF8 and/or ASCII encoding, so this worked instead (both preferably):

static ReadOnlySpan<byte> ContinueBytes =>
    Encoding.UTF8.GetBytes("HTTP/1.1 100 Continue\r\n\r\n");

Example: dotnet/aspnetcore#7422

/cc @VSadov @stephentoub @jkotas @KrzysztofCwalina @jaredpar @jcouv
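For readers comparing the two forms above, here is a minimal sketch that places them side by side. The class and property names (`StatusLines`, `ContinueConstant`, `ContinueEncoded`) are made up for illustration. The comments describe the behavior as this issue understands it: the constant-initializer form is the one the existing optimization targets, while the `Encoding.UTF8.GetBytes` form encodes at run time and returns a newly allocated array on each access.

```csharp
using System;
using System.Text;

static class StatusLines
{
    // An array of constants returned directly as ReadOnlySpan<byte>:
    // the shape the existing static-data optimization is aimed at.
    public static ReadOnlySpan<byte> ContinueConstant => new byte[]
    {
        (byte)'H', (byte)'T', (byte)'T', (byte)'P', (byte)'/', (byte)'1', (byte)'.', (byte)'1', (byte)' ',
        (byte)'1', (byte)'0', (byte)'0', (byte)' ',
        (byte)'C', (byte)'o', (byte)'n', (byte)'t', (byte)'i', (byte)'n', (byte)'u', (byte)'e',
        (byte)'\r', (byte)'\n', (byte)'\r', (byte)'\n'
    };

    // Readable, but the string is encoded at run time and
    // GetBytes returns a new array every time the property is read.
    public static ReadOnlySpan<byte> ContinueEncoded =>
        Encoding.UTF8.GetBytes("HTTP/1.1 100 Continue\r\n\r\n");
}

static class Program
{
    static void Main()
    {
        // Both properties should expose the same 25 bytes.
        Console.WriteLine(StatusLines.ContinueConstant.SequenceEqual(StatusLines.ContinueEncoded));
    }
}
```

The two properties carry identical content; the request in this issue is only about letting the second, readable spelling get the same treatment as the first.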

@vcsjones
Member

My personal preference would be for a string literal modifier such as b"abc" (à la Rust) that has type ROS<byte> and is limited to ASCII characters. Other prefixes could exist for other encodings, though I wonder how endianness would work.

@GrabYourPitchforks
Member

This would mesh generally with the proposals we've been shooting around internally re:

Utf8String theStr = utf8"Hello world!"; // or similar

And since the current proposal is to have a free conversion from Utf8String to ROS<byte> (just like how there's a free conversion from string to ROS<char>), this should generate the desired behavior in the end.

@benaadams
Member Author

I hear talk of something like this, which would be more terse and a bit better:

static ReadOnlySpan<byte> ContinueBytes => u8"HTTP/1.1 100 Continue\r\n\r\n";

This would work via an implicit Utf8String => ReadOnlySpan<byte> conversion (hopefully with Utf8String using the same load-from-static-data path as ReadOnlySpan<byte>?)

/cc @GrabYourPitchforks

@Thaina

Thaina commented Feb 12, 2019

Isn't string internally char and should be ReadOnlySpan<char> ?

@benaadams
Member Author

benaadams commented Feb 12, 2019

Not for 8-bit string data (i.e. ASCII and UTF8); those are sequences of bytes, so ReadOnlySpan<byte>

UTF16 strings sort of work https://github.com/dotnet/coreclr/issues/22511 with

static ReadOnlySpan<char> Hello => "Hello".AsSpan();

But if you want to do 8 bit you have to do:

// "HTTP/1.1 100 Continue\r\n\r\n"
static ReadOnlySpan<byte> ContinueBytes => new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', (byte)'/', (byte)'1', (byte)'.', (byte)'1', (byte)' ', (byte)'1', (byte)'0', (byte)'0', (byte)' ', (byte)'C', (byte)'o', (byte)'n', (byte)'t', (byte)'i', (byte)'n', (byte)'u', (byte)'e', (byte)'\r', (byte)'\n', (byte)'\r', (byte)'\n' };

@VSadov
Member

VSadov commented Feb 12, 2019

It feels like having to write new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', ... should be fixed first. Looks gross.

Once there is a utf8 literal (or a wellknown/intrinsic conversion from string), hooking up the optimization should be fairly easy.

@scalablecory

I think string literals are good special cases to have.

Long term I would love something like C++'s constexpr to generalize things like this.

@ufcpp
Contributor

ufcpp commented Jun 10, 2020

According to LDM Sept. 16, 2019, UTF-8 string literals are emitted as UTF-16, at least in the initial implementation.
So I decided to create a source generator for this purpose.

@CyrusNajmabadi CyrusNajmabadi transferred this issue from dotnet/csharplang Oct 22, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 22, 2020
@jaredpar jaredpar added Code Gen Quality Room for improvement in the quality of the compiler's generated code and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 26, 2020
@jaredpar jaredpar added this to the Compiler.Next milestone Oct 26, 2020
@jaredpar jaredpar modified the milestones: Compiler.Next, Backlog Sep 12, 2023
@alrz
Member

alrz commented Nov 2, 2023

I think UTF-8 string literals addressed this already?

What I was looking for was how to embed some binary data as part of code generation, similar to https://github.com/protocolbuffers/protobuf/blob/312986896dafdbf2475601be8fa0a2faefc40b2f/csharp/src/Google.Protobuf/WellKnownTypes/Wrappers.pb.cs#L25

Currently that can be done using ROS<byte> data = "\x000a\x000b"u8 etc., but this would be a lot longer than Base64 encoding.
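A minimal sketch of the `u8` approach mentioned above, assuming a compiler with C# 11 UTF-8 string literals. The `Blobs` class and its property names are hypothetical. One caveat worth noting: the literal is UTF-8 encoded, so an escape below U+0080 yields a single byte, while an escape such as `\x00ff` yields two bytes rather than the raw byte 0xFF. Arbitrary binary data with high bytes therefore does not round-trip this way.

```csharp
using System;

static class Blobs
{
    // Code points below U+0080 each encode to exactly one UTF-8 byte: 0x0A, 0x0B.
    public static ReadOnlySpan<byte> Low => "\x000a\x000b"u8;

    // U+00FF encodes to two UTF-8 bytes (0xC3, 0xBF), not the single byte 0xFF.
    public static ReadOnlySpan<byte> High => "\x00ff"u8;
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(Convert.ToHexString(Blobs.Low));
        Console.WriteLine(Convert.ToHexString(Blobs.High));
    }
}
```

For data that contains bytes at or above 0x80, the `new byte[] { ... }` initializer form is still needed.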

10 participants