Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add async SourceText APIs #61489

Closed
Neme12 opened this issue May 24, 2022 · 26 comments
Closed

Add async SourceText APIs #61489

Neme12 opened this issue May 24, 2022 · 26 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation Area-Compilers Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request
Milestone

Comments

@Neme12
Copy link
Contributor

Neme12 commented May 24, 2022

Background and Motivation

When reading from a file, I usually read the data asynchronously, using methods like File.ReadAllTextAsync or creating a FileStream with the FileOptions.Asynchronous option and using the async Stream methods. Using asynchronous IO is the recommended practice in several kinds of apps, especially in desktop apps, and UWP for example initially even removed all the non-async file APIs from its version of .NET because they felt so strongly about async always being the right approach.

Roslyn's SourceText provides From methods to read the text from a Stream or a TextReader (usually representing a file) and Write methods to write the text to a TextWriter, but there are no async equivalents to this that would use the async APIs of Stream, TextReader and TextWriter, which have been available for quite some time, including on .NET Framework. This seems like an omission.

If this API is accepted, I'd be interested in making a PR.

Proposed API

 public abstract class SourceText
 {
     // New methods:
+    public static ValueTask<SourceText> FromAsync(TextReader reader, int length, Encoding? encoding = null, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1, CancellationToken cancellationToken = default);
+    public static ValueTask<SourceText> FromAsync(Stream stream, Encoding? encoding = null, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1, bool throwIfBinaryDetected = false, bool canBeEmbedded = false, CancellationToken cancellationToken = default);
 
     // Existing methods (except for back-compat hidden ones):
     public static SourceText From(string text, Encoding? encoding = null, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1);
     public static SourceText From(TextReader reader, int length, Encoding? encoding = null, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1);
     public static SourceText From(Stream stream, Encoding? encoding = null, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1, bool throwIfBinaryDetected = false, bool canBeEmbedded = false);
     public static SourceText From(byte[] buffer, int length, Encoding? encoding = null, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1, bool throwIfBinaryDetected = false, bool canBeEmbedded = false);
 
     // New methods:
+    public ValueTask WriteAsync(TextWriter textWriter, CancellationToken cancellationToken = default);
+    public virtual ValueTask WriteAsync(TextWriter writer, TextSpan span, CancellationToken cancellationToken = default);
 
     // Existing methods:
     public void Write(TextWriter textWriter, CancellationToken cancellationToken = default);
     public virtual void Write(TextWriter writer, TextSpan span, CancellationToken cancellationToken = default);
 }

public sealed class EmbeddedText
{
     // New methods:
+    public static async ValueTask<EmbeddedText> FromStreamAsync(string filePath, Stream stream, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1, CancellationToken cancellationToken = default);

     // Existing methods:
     public static EmbeddedText FromSource(string filePath, SourceText text)
     public static EmbeddedText FromStream(string filePath, Stream stream, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1);
     public static EmbeddedText FromBytes(string filePath, ArraySegment<byte> bytes, SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1);
}

I didn't add async versions of the string and byte[] overloads, because in those cases, the data is already in memory.

Usage Examples

SourceText sourceText;

using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous))
    sourceText = await SourceText.FromAsync(stream);
SourceText sourceText;

using (var httpClient = new HttpClient())
    sourceText = await SourceText.FromAsync(await httpClient.GetStreamAsync(url));
using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read, 4096, FileOptions.Asynchronous))
using (var writer = new StreamWriter(stream))
    await sourceText.WriteAsync(writer);

Questions

Should the APIs return ValueTask instead of Task?

I think that is a question of whether these API calls will often complete synchronously. I imagine they will complete synchronously if users provide e.g. a Stream that only supports synchronous operations and implements the asynchronous APIs as returning synchronously, but I'm not sure if that will be common. FileStream itself, if it isn't created with the FileOptions.Asynchronous option to support truly asynchronous IO, will run the async operations on a background thread. The base Stream class also implements the async methods as calling the synchronous ones on a background thread, if the async APIs aren't implemented in the derived class. The only built-in types I know of that implement the async APIs as returning synchronously are StringReader and StringWriter, and if someone is explicitly using those, using the synchronous APIs is the correct approach. Thus the question would be whether FileStream can return synchronously even with FileOptions.Asynchronous, which I don't know.
EDIT: From what I understand, it can now be beneficial to use ValueTask even when the method returns asynchronously on .NET Core and it's recommended to always use ValueTask from now on (see dotnet/runtime#33804), so I changed the proposal to use ValueTask.

Another question: see #61489 (comment)

Alternative Designs

Alternatively, users would have to do synchronous IO, like today.

Risks

A small risk is that users could use these asynchronous APIs even when they pass in a Stream, TextReader or TextWriter that doesn't support truly asynchronous operations without realizing it, and the implementation of that type could either use a background thread or run synchronously. Asynchronous operations are also usually slower than synchronous ones.

@Neme12 Neme12 added Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request labels May 24, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels May 24, 2022
@CyrusNajmabadi
Copy link
Member

Do we have an example of where this is an issue today?

@jcouv jcouv added this to the Compiler.Next milestone Jun 16, 2022
@jcouv jcouv removed the untriaged Issues and PRs which have not yet been triaged by a lead label Jun 16, 2022
@TheAngryByrd
Copy link

I'm working on utilizing this in FsAutocomplete. Having this would be great as it would cut down on threadpool starvation.

@Neme12
Copy link
Contributor Author

Neme12 commented Jul 22, 2023

I added a proposal for EmbeddedText.FromStreamAsync as well.

@Neme12
Copy link
Contributor Author

Neme12 commented Jul 23, 2023

Another question is whether to add convenience APIs to read from a file directly by specifying a file path as opposed to a Stream, because to correctly read from a file asynchronously, you have to specify FileOptions.Asynchronous, e.g.:

await using (var fileStream = File.Open("file.cs", new FileStreamOptions
{
    Mode = FileMode.Open,
    Access = FileAccess.Read,
    Share = FileShare.Read,
    Options = FileOptions.Asynchronous,
}))
{
    sourceText = await SourceText.FromAsync(fileStream);
}

instead of a simple

await using (var fileStream = File.OpenRead("file.cs"))
    sourceText = await SourceText.FromAsync(fileStream);

and in my experience, virtually nobody does this correctly (and nobody I've ever met in practice). The vast majority of programmers don't even know that this option exists and would just use the latter code, assuming that it does asynchronous IO while in reality it just uses synchronous IO on a thread pool thread.

@CyrusNajmabadi
Copy link
Member

I feel like my question above wasn't addressed

@Neme12
Copy link
Contributor Author

Neme12 commented Jul 23, 2023

@CyrusNajmabadi I don't understand your question. Are you really asking why async IO is useful or, especially in certain kinds of apps, preferred? There are many resources online about that, apart from what was mentioned here.

@CyrusNajmabadi
Copy link
Member

Yes. I'm looking for real world examples where this would help. For context, in all our Roslyn testing we haven't seen this show up. So it's unclear how we'd validate this was even an improvement for real world scenarios.

Generally, perf fixes are driven by cases demonstrating a problem. We can then check any potential solutions to ensure they resolve things. Thanks! :-)

@Neme12
Copy link
Contributor Author

Neme12 commented Jul 23, 2023

@CyrusNajmabadi This isn't necessarily about perf (in fact I would assume async would be a lot slower). In a UI app, you'd want to do this to avoid blocking the UI thread and making the app unresponsive.

@CyrusNajmabadi
Copy link
Member

Another question is whether to add convenience APIs to read from a file directly by specifying a file path

If we did this, I would expect we would instead allow passing in filestreamoptions, doc'ing that we would always add 'async' to whatever was passed in.

@CyrusNajmabadi
Copy link
Member

In a UI app, you'd want to do this to avoid blocking the UI thread and making the app unresponsive.

Two of our three primary hosts are UI apps, but we haven't needed this. So this feels speculative to me.

@CyrusNajmabadi
Copy link
Member

Note: blocking the UI thread applies to everything Roslyn does. This include every semantic operation (as there can and do cause io). I'm wary about this. We've already designed things so that consumers are expected to just not run this stuff on the UI thread.

@Neme12
Copy link
Contributor Author

Neme12 commented Jul 23, 2023

We've already designed things so that consumers are expected to just not run this stuff on the UI thread.

Yes, but that only applies to Visual Studio and the language server. People can and do use Roslyn in their own apps (see the comment by @TheAngryByrd). But you're right that I don't have a specific example, I'm just used to always using async IO by default and it struck me that Roslyn doesn't support when I tried working with a SourceText.

@CyrusNajmabadi
Copy link
Member

My point is, having not been a problem for two massive ui-hosts, I'd prefer we not create an unneeded solution if the existing ones are sufficient. That also ensures all consumers are doing things the same way, which is good for not having unintended deviations start creeping in (for example, perf problems experienced only by users of this new API).

@Neme12
Copy link
Contributor Author

Neme12 commented Jul 25, 2023

Is there any API review process/team to decide whether to take this API or not?

@CyrusNajmabadi
Copy link
Member

Yes. Mark the API as 'ready to review' and we can take it. I'll walk it through teh process.

@Neme12
Copy link
Contributor Author

Neme12 commented Jul 25, 2023

@CyrusNajmabadi I can't label issues 😅

@CyrusNajmabadi
Copy link
Member

Oh! Good to know :) Would you like me to mark this as ready to take?

@Neme12
Copy link
Contributor Author

Neme12 commented Jul 26, 2023

@CyrusNajmabadi Yes please

@CyrusNajmabadi CyrusNajmabadi added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Jul 26, 2023
@333fred
Copy link
Member

333fred commented Aug 3, 2023

API Review

  • What's the use case for this?
    • Need concrete users for this, particularly if there are perf reasons for adding the api.
  • This could encourage bad behavior in source generators

Conclusion: Rejected until concrete usage scenarios arise.

@333fred 333fred closed this as not planned Won't fix, can't repro, duplicate, stale Aug 3, 2023
@333fred 333fred removed the api-ready-for-review API is ready for review, it is NOT ready for implementation label Aug 3, 2023
@333fred 333fred added the api-needs-work API needs work before it is approved, it is NOT ready for implementation label Aug 3, 2023
@TheAngryByrd
Copy link

Need concrete users for this, particularly if there are perf reasons for adding the api.

What would you like me for to provide to address this? I've already mentioned the threadpool starvation issue that can happen when reading synchronously in async methods.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Aug 3, 2023

@TheAngryByrd where are you observing this threadpool starvation? We have not encountered issues/traces/etc. ourselves in our own concurrent use of this API. As such, we lack an actual demonstration of an issue, as well as a real world example we can use to validate any potential solution.

For all we know, our async impl might actually perform worse. For perf changes we need real cases to validate improvements with.

Thanks!

@Neme12
Copy link
Contributor Author

Neme12 commented Aug 7, 2023

@TheAngryByrd Do you have any data on this? Maybe I could try implementing this in my own fork and you could check if it helps in your use case?

@TheAngryByrd
Copy link

@TheAngryByrd Do you have any data on this? Maybe I could try implementing this in my own fork and you could check if it helps in your use case?

I don’t on hand, but it would probably be possible to use a small aspnet core app with multiple routes. Load a giant text file like all of Shakespeare‘a work and watch dotnet-counters. Each route would either use the current SourceText read and one for whatever async api makes sense. Then hit each route with some load tool and see if the thread pool jumps up dramatically on the non async routes.

@Neme12
Copy link
Contributor Author

Neme12 commented Aug 7, 2023

@TheAngryByrd I thought you were saying you had this thread pool starvation problem in your own project? You were talking about FsAutoComplete.

@CyrusNajmabadi
Copy link
Member

We need this to be a real problem people are hitting in the real world.

ALso, just to be clear: the compiler doesn't even use these APIs when doing its work. Instead, it uses apis like EncodedStringText.Create. And it calls those apis concurrently in code like:

                RoslynParallel.For(
                    0,
                    sourceFiles.Length,
                    UICultureUtilities.WithCurrentUICulture<int>(i =>
                    {
                        //NOTE: order of trees is important!!
                        trees[i] = ParseFile(
                            parseOptions,
                            scriptParseOptions,
                            ref hadErrors,
                            sourceFiles[i],
                            diagnosticBag,
                            out normalizedFilePaths[i]);
                    }),
                    CancellationToken.None);

I'm very wary of us just adding apis haphazardly to address hypothetical problems. We don't have hte data indicating there actually is a concern here, and the solution doesn't seem like it would actually adequately work for how we actually produce the documents.

@TheAngryByrd
Copy link

@TheAngryByrd I thought you were saying you had this thread pool starvation problem in your own project? You were talking about FsAutoComplete.

I put in many work arounds to try to reduce stress on the threadpool, it'll take some work to undo them and get data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation Area-Compilers Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request
Projects
None yet
Development

No branches or pull requests

6 participants