.NET console logging uses a blocking collection which is not optimal for cloud-native applications #65264
Comments
Tagging subscribers to this area: @dotnet/area-extensions-logging
Issue Details
Console logger is not quick enough per request to log for real-world production applications. Refer to aspnet/Logging#515 (comment). Console logging happens on a separate thread, and once the message queue is full (1000 messages) the console logger can start blocking the application.
runtime/src/libraries/Microsoft.Extensions.Logging.Console/src/ConsoleLoggerProcessor.cs
Lines 14 to 16 in 78c6505
To improve this design, we could consider using a Channel rather than a blocking collection (refer to https://devblogs.microsoft.com/dotnet/an-introduction-to-system-threading-channels/). This would avoid behavior that either blocks the app or causes an out-of-memory exception. The upside of the current blocking collection approach is that the order of logs is respected, but since log messages can carry a timestamp, the channel approach may still be an overall improvement toward the goal of a console logger that is performant enough for real-world applications.
|
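For illustration only, here is a minimal sketch of the channel idea under discussion (not the logger's actual internals): a bounded Channel<T> with a single console-writing reader, configured to drop the incoming write instead of blocking when the queue is full.

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Sketch: a bounded queue that never blocks the producer; when the queue is
// full, the incoming message is dropped instead.
var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(1024)
{
    SingleReader = true,
    FullMode = BoundedChannelFullMode.DropWrite
});

// Consumer: one background task drains the queue and writes to the console.
Task consumer = Task.Run(async () =>
{
    await foreach (string message in channel.Reader.ReadAllAsync())
        Console.WriteLine(message);
});

// Producer side: TryWrite is synchronous; it returns false when the message was dropped.
bool accepted = channel.Writer.TryWrite("hello from the logger");

channel.Writer.Complete();
await consumer;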
BlockingCollection on my machine can accept 12 million items a sec. |
Right, it's the size of the queue that ends up being the problem for systems that are logging ALOT 😄 |
How does switching to a Channel help? Is it simply because it can be invoked async? Or because you're imagining it could provide back pressure and the system can drop lower priority events for a while? |
I don't think the memory is the problem here. The blocking collection is throttling the application because the consumer part isn't able to keep up with the producer if too much is being logged at the same time, and thus it blocks the logging itself.
The collection can accept as many items as your machine is able to consume at the same time. Here the
No back pressure here, anything that would cause blocking would have the same problem it has right now. |
I'm not clear on the fundamental problem / goal here. In any producer/consumer system, if the consumer can't keep up with the producer, you have a few escape valves:
If the current system is applying backpressure via BlockingCollection and the goal is just to enable that backpressure to be applied asynchronously, yeah, channels is likely a good answer. If the problem is that BlockingCollection has too much overhead and we still want blocking synchronously, channels might help, or we could fix BC, or we could devise a lighter-weight handoff mechanism. If the problem is that the current bound is too low, raise the bound. But if the problem is that we don't want to slow down the producer and we don't want to buffer as much as we currently are, well, the remaining options are make the consumer faster or drop data (or similarly make the data smaller if the overall size is the concern). Adding more consumers in the case of writing to the console isn't really an option, since there's only one console, unless we fake a console and redirect some output to another storage mechanism. So... what's the actual problem and goal here? |
The logging API is synchronous, so we will need to block when the queue limit is reached, that's the constraint. The problem is that systems that log A LOT (probably too much) hit this global 1024 queue limit, it starts blocking threads (probably thread pool threads), and that limit isn't configurable (people seem to have lots of memory, and 1024 in hindsight was maybe too low for systems that log a lot). I think the simple fix is to allow tweaking the limit. Whether we change to use a channel or something more manual I think is another issue. I don't think we want to drop log messages. There's no API to help producers determine which logs are droppable and which ones aren't, so blocking I think is the ideal solution. We should also speed up the consumer (if there's any room to do so). I'm not sure if there's low hanging fruit. |
@maryamariyan what was your reasoning for suggesting channels here? |
As a user, I wish to have control over the maximum queued messages. The 1024 limit is applicable to most users I guess, but still, users should be able to configure this based on their machine/server. No matter how we solve this problem, using Channels or anything else, at some point in the future this is going to lag because of the hardcoded cap of 1024. |
I think the current approach as-is should be fine and robust for low-load scenarios but could cause latency issues with high-load, multi-threaded requests. It seems that because of this, console logging would in many cases be disabled for these high-load, logging-heavy applications.
I'd imagine the console logger behaving as-is by default, and we could make it configurable for scenarios in which there is a high load of logging in cloud workloads, to allow the option to drop logs rather than block. Please correct me if I'm wrong but I assumed using
I guess increasing the queue size would be another option but it could run the risk of running out of memory. |
The problem is random log messages will be dropped. There's no way to opt in at the callsite. The other way I could maybe see log dropping working is if there was a filter or callback to determine if a log should be dropped or not. Other than that, I don't see how that's workable, but I'm open to hearing if that's acceptable. |
@maryamariyan do we know if the reported latency from the original issue is happening on Windows or Unix, or both? I peeked around the Unix implementation and I can definitely see a few places where there are as @davidfowl put it - "low hanging fruits" for optimization. Would it be beneficial if we try and see if performance of the consumer part can be significantly sped up before introducing any configuration options or even changing the implementation? |
I agree it would be a good next step to go through some perf analysis.
That makes sense. It seems unacceptable in general to have random log messages be dropped. @timmydo does the discussion so far properly cover what you had in mind as the original issue? |
I think the context on the original request would be that:
So, the feature ask would be to have a flag where |
To smooth out the load, the logger could write synchronously to a temp file and a task could read from the file and write to Console. |
Agree, I'd rather miss some logs than have a workload grinding to a halt because it logs way too much (I have yet to see this in reality). If stdout is too slow, any backend processing those logs would probably have issues keeping up as well. If prioritizing logs (e.g. based on level) is cheap and easy I will take it, but having a metric would suffice.
That would triple log IO and back pressure would still need to be handled (just a bit later). It would also require setting up a volume mount. We already have to set |
We circled back with @dotnet/area-extensions-logging and we believe that dropping messages is not something we want the built-in logger to do. It is possible for developers to write their own logger that drops messages if needed, so this doesn't seem like a blocking issue. @timmydo @ShreyasJejurkar would you configure the queue limit to higher than 1024 for your application if it was configurable? |
No, I wouldn't change the queue size. I would just want to drop messages. I'm not talking about making it the default behavior. It's too bad the .NET logging system can't be "web scale" out of the box. |
So you would be OK with dropping random logs (with no control of which ones) as an optional feature?
I'm not sure why you think that. Logging to the console isn't "web scale". |
Yes, I'd be fine dropping random logs (so long as it's because of a buffer being full) as an optional feature.
Sorry, that probably wasn't the right word to use. Many container workloads, like it or not, use console logging by default. Having a console logger that blocks under load is not desirable for many web applications. Those statements are at least something we could agree on? I think the part where we probably don't agree would be about how much the standard library should provide. Do I need to use a separate library to make the console output show timestamps? Do I need a separate library so that my app doesn't block when trying to log? |
I think it's reasonable to drop logs as long as it isn't the default. The queue size and behavior should be configurable (drop vs block). Applications that use the console and log to the point where the logging provider chokes can't perform well under load. It's a simple producer-consumer problem and the producer is writing too many logs (so much that you're willing to throw them away).
You should definitely look at using Serilog or another popular logging provider that is more feature-rich for more advanced features. PS: I'll do some research into whether other ecosystems have performance issues with console logging and whether they drop logs. |
In practice I've noticed logs are one of those things you need to constantly tune if you want the right balance. Often times someone adds a log statement, and they don't know how many times it will write. Currently we just disable console everywhere except on the dev machine due to performance issues, and it would be nice if that wasn't the case.
I've used Serilog before and didn't have any issues with it, but in general I have a lot of trust in the standard library when it comes to security and performance. When the standard library doesn't have the feature a dev is looking for, a developer working on a project may pick up something else which isn't as strongly vetted... Insert some story about critical infra and log4j-like CVEs... What I'm trying to say is that for something as foundational as logging, it's nice to have it built-in...
Thanks. Would love to hear what you find. I might take a look because I'm curious also. |
We disable the console logger as well in our performance runs 😄.
I hear you, but we can't build everything into the framework; that's no way to build an ecosystem (we've already seen the effects of this). However, this is a reasonable request and should be built into the console logger (and potentially any other logger that does this async work). |
If you aren't already, consider batching writes to the console. E.g., on a 5-second timer tick, write to the console everything from the last 5 seconds as a single call to the console API. In past projects I have seen improvements with this, at least on Windows where the console is pretty slow. |
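A rough sketch of that batching idea (the type and member names are illustrative, not part of the console logger): producers enqueue lines, and a background loop flushes everything accumulated during the interval as one write.

using System;
using System.Collections.Concurrent;
using System.Text;
using System.Threading.Tasks;

// Illustrative batching writer: many producers enqueue lines, one background
// loop flushes the accumulated text as a single console write per interval.
sealed class BatchingConsoleWriter : IDisposable
{
    private readonly BlockingCollection<string> _lines = new BlockingCollection<string>(boundedCapacity: 8192);
    private readonly Task _pump;

    public BatchingConsoleWriter(TimeSpan flushInterval)
    {
        _pump = Task.Run(async () =>
        {
            var sb = new StringBuilder();
            while (!_lines.IsCompleted)
            {
                while (_lines.TryTake(out var line))
                    sb.AppendLine(line);

                if (sb.Length > 0)
                {
                    Console.Out.Write(sb.ToString()); // one write per interval instead of one per line
                    sb.Clear();
                }

                await Task.Delay(flushInterval);
            }
        });
    }

    public void WriteLine(string line) => _lines.Add(line);

    public void Dispose()
    {
        _lines.CompleteAdding();
        _pump.Wait();
    }
}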
I think the cloud native scenario is mostly a linux one. We should measure both windows and linux performance of course but I'm not sure they are comparable at the moment. |
@davidfowl and I did a little profiling this evening. My earlier benchmarks were bottlenecked on how fast powershell could redirect the output to a file, and it was very sensitive to newlines in the output, which the ILogger format unintentionally had more of. Switching to cmd.exe piped to nul shows:
Console.WriteLine(text): 684MB/s
The high overhead for AddConsole comes from the coloring behavior, specifically the calls to kernelbase!GetConsoleScreenBufferInfo() within ConsolePal.Set_ForegroundColor and ConsolePal.Set_BackgroundColor. Setting DisableColors=true gets the same perf as AddSimpleConsole(). If we want to optimize in this area we'll need to be careful defining the setup (OS, container, shell, redirection mechanism, content being logged, etc) to make sure we are identifying the same bottlenecks. The problem in my powershell repro was a completely different issue than the coloring issue we found later. |
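For reference, the coloring overhead mentioned above can be avoided through existing options (DisableColors on the legacy console options, ColorBehavior on the simple-console formatter options); a small configuration sketch:

using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Console;

using var loggerFactory = LoggerFactory.Create(builder =>
{
    // Formatter-based console logger: turn coloring off via the formatter options.
    builder.AddSimpleConsole(o => o.ColorBehavior = LoggerColorBehavior.Disabled);

    // Legacy option referenced in the benchmark above:
    // builder.AddConsole(o => o.DisableColors = true);
});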
David, I agree entirely with the approach that you propose in this comment. Bump up the current default substantially, make it tunable, (consider whether it should be count or bytes), and then optionally allow dropped logs (ideally with some way to detect it happened). Definitely wouldn't want log dropping to be the default, personally. |
Not too long ago I was playing around with writing some re-usable base types that had the discussed option for block vs drop
Handling the current console logger behavior of 'block the producer when the boundary size is reached' was a pain to achieve using channels. Since there aren't any sync
Another drawback of using the existing channel types under the hood was that it required either
If you don't need to worry about the blocking scenario at all, then this scenario was easy to implement using a channel under the hood. The sync logging calls can use the And if no boundary is configured then I did manage to create an implementation using mostly out-of-the-box producer/consumer collection code by creating a wrapper type that:
TL;DR I think the custom/lighter-weight hand-off code is what will be needed under the hood, particularly if the behavior needs to keep configurable option support for blocking the producers. |
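As a sketch of the pain point described above (ChannelWriter<T> has no synchronous blocking write), the usual workaround is a sync-over-async hand-off, roughly:

using System.Threading.Channels;

internal static class ChannelBlockingWrite
{
    // Sketch only: emulating "block the producer when the queue is full" on top
    // of a bounded channel, since ChannelWriter<T> has no synchronous Write.
    public static void Write<T>(ChannelWriter<T> writer, T item)
    {
        if (writer.TryWrite(item))
            return; // fast path: the queue had room

        // Slow path: block the calling thread on the async write --
        // the sync-over-async hand-off the comment above calls painful.
        writer.WriteAsync(item).AsTask().GetAwaiter().GetResult();
    }
}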
Here's a summary of the latest developments from the investigation made for this issue:
This gist summarizes some trial-and-error experiments done to evaluate the console logger against Console.WriteLine in different scenarios. We couldn't prove that console logging is slower than Console.WriteLine. Instead, what we want to focus on for this issue is making sure we don't get a sudden drop in RPS as soon as we hit the max queue size threshold limit.
We could add two configuration improvements to the console logger: (1) allow dropping logs and (2) make the buffer size configurable.
Configuring buffer size
Given that the queue size today is limited for the blocking collection below:
The easiest way is to just expose the queue length to make it configurable. But instead, the main proposal here is to keep track of how many bytes the items in the queue are collectively carrying, to decide what the desired maximum buffer size for the blocking collection should be.
Dropping logs
Dropping logs would not be allowed by default. The way dropping of logs would work is, if enabled, then when the blocking collection is full the console logger throws the message away.
API Proposal
Since ConsoleLoggerProvider instantiates ConsoleLoggerProcessor, it makes sense to add these configurable options to ConsoleLoggerOptions, which is injected into ConsoleLoggerProvider:
runtime/src/libraries/Microsoft.Extensions.Logging.Console/src/ConsoleLoggerProvider.cs
Lines 41 to 62 in 42d1644
namespace Microsoft.Extensions.Logging.Console
{
/// <summary>
/// Options for a <see cref="ConsoleLogger"/>.
/// </summary>
public class ConsoleLoggerOptions
{
public bool DisableColors { get; set; }
public ConsoleLoggerFormat Format { get; set; }
public string? FormatterName { get; set; }
public bool IncludeScopes { get; set; }
public LogLevel LogToStandardErrorThreshold { get; set; } = LogLevel.None;
public string? TimestampFormat { get; set; }
public bool UseUtcTimestamp { get; set; }
+ /// <summary>
+ /// Gets or sets the desired console logger behavior when queue becomes full. Defaults to <c>Wait</c>.
+ /// </summary>
+ public ConsoleLoggerBufferFullMode BufferFullMode { get; set; }
+ /// <summary>
+ /// Gets or sets the maximum buffer size before the queue becomes full. Defaults to <c>1MB</c>.
+ /// </summary>
+ public int MaxBufferSizeInBytes { get; set; } = 1048576;
}
+ /// <summary>
+ /// Determines the console logger behaviour when queue becomes full.
+ /// </summary>
+ public enum ConsoleLoggerBufferFullMode
+ {
+ /// <summary>
+ /// Blocks the console thread once the queue limit is reached
+ /// </summary>
+ Wait,
+ /// <summary>
+ /// Drops new log messages when the queue is full
+ /// </summary>
+ DropNewest,
+ }
}
Alternative Design
Alternatively, MaxBufferSizeInBytes could be nullable and, when not specified, use an unbounded queue:
- public int MaxBufferSizeInBytes { get; set; } = 1048576;
+ public int? MaxBufferSizeInBytes { get; set; }
Questions with design
Do we want MaxBufferSizeInBytes to be reconfigurable on reload?
Usage samples
using var loggerFactory = LoggerFactory.Create(builder =>
{
builder
.AddConsole(o => o.QueueFullMode = ConsoleLoggerQueueFullMode.DropNewest)
.AddSimpleConsole(o => o.ColorBehavior = LoggerColorBehavior.Disabled);
}); |
|
I'm not a fan of making such a large change to the cache size based on the anecdotal evidence I am aware of so far. I'd propose we leave the cache size limit at default 400KB and let the community experiment with changing the limit for their individual projects using the new API. If we see strong feedback that raising the limit was beneficial for many projects then we'll be in a much better place to adjust the default. |
This seems reasonable in principle - does it make the implementation substantially harder? Small suggestion on the naming: we might want to use 'Buffer' instead of 'Queue' or 'QueueBuffer'. I.e.: However, if the review crew decides they like 'Queue' better, go for it; it's just a small preference on my part. |
Let's split it down the middle and do 1MB for the buffer, and let's add a counter so that customers can observe their buffer size and set values if they run into issues in their environment. |
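One way such a counter could be surfaced, shown here as a hedged sketch using EventCounters (the names, and the counter the team actually ships, may differ):

using System;
using System.Diagnostics.Tracing;

// Sketch: publish the current queue length as a polled EventCounter so it can be
// observed with dotnet-counters. All names here are illustrative.
internal sealed class ConsoleLogQueueEventSource : EventSource
{
    private readonly Func<double> _queueLength;
    private PollingCounter _queueLengthCounter;

    public ConsoleLogQueueEventSource(Func<double> queueLength)
        : base("Illustrative-Console-Log-Queue")
    {
        _queueLength = queueLength;
    }

    protected override void OnEventCommand(EventCommandEventArgs command)
    {
        if (command.Command == EventCommand.Enable)
        {
            // Created lazily when a listener enables the source.
            _queueLengthCounter ??= new PollingCounter("queue-length", this, _queueLength)
            {
                DisplayName = "Console log queue length"
            };
        }
    }
}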
Updated proposal.
Yes. It might be OK, I'll need to prototype the implementation that supports updating buffer size on reload. Link to current logic: |
namespace Microsoft.Extensions.Logging.Console
{
/// <summary>
/// Options for a <see cref="ConsoleLogger"/>.
/// </summary>
public partial class ConsoleLoggerOptions
{
/// <summary>
/// Gets or sets the desired console logger behavior when queue becomes full.
/// Defaults to <see cref="ConsoleLoggerQueueFullMode.Wait"/>.
/// </summary>
public ConsoleLoggerQueueFullMode QueueFullMode { get; set; }
/// <summary>
/// Gets or sets the maximum number of enqueued messages.
/// </summary>
public int MaxQueueLength { get; set; } = SomeDefault;
}
/// <summary>
/// Determines the console logger behaviour when queue becomes full.
/// </summary>
public enum ConsoleLoggerQueueFullMode
{
/// <summary>
/// Blocks the console thread once the queue limit is reached
/// </summary>
Wait,
/// <summary>
/// Drops new log messages when the queue is full
/// </summary>
DropNewest,
}
} |
We should write a line to stdout when messages get dropped. Edit: Clarified that I meant writing a line to stdout instead of sending yet another message through the logging system. It's better if this message doesn't look like a normal log so it stands out more. |
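A minimal sketch of that idea (hypothetical helper, not the shipped implementation): count the drops and have the consumer emit one distinctive line straight to stdout, bypassing the logging pipeline.

using System;
using System.Threading;

// Sketch: the consumer periodically reports drops with one distinctive line
// written straight to stdout, so it bypasses (and stands out from) normal logs.
internal static class DroppedMessageReporter
{
    private static long s_dropped; // hypothetical counter, incremented on each drop

    public static void RecordDrop() => Interlocked.Increment(ref s_dropped);

    public static void ReportIfAny()
    {
        long dropped = Interlocked.Exchange(ref s_dropped, 0);
        if (dropped > 0)
            Console.Out.WriteLine($"**Console logger queue overflow: {dropped} message(s) dropped.**");
    }
}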
Should this be renamed to
Lines 11 to 16 in 6ca8c9b
I think it would be confusing for Channels' |
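For context, the existing enum in System.Threading.Channels that this naming discussion is comparing against:

namespace System.Threading.Channels
{
    // Shipped enum, shown here for comparison. Note that DropNewest/DropOldest
    // remove an item already in the channel to make room, while DropWrite
    // drops the item currently being written.
    public enum BoundedChannelFullMode
    {
        Wait,
        DropNewest,
        DropOldest,
        DropWrite
    }
}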
Yeah, I thought about suggesting that. Then wondered how pervasive Channels are, and if they're the odd one out there. I'd support changing it to DropAdd/DropWrite/whatever. |
+1 for using DropWrite 👍 |
@timmydo this went in for the next preview. Hopefully you can test it out and help with feedback. |
With the current design of console logging, there is a dedicated thread that writes outputs to console and once message queue is full (with 1024 messages in the queue) the console logger starts blocking the application.
This gist summarizes some trial-and-error experiments done to evaluate the console logger against Console.WriteLine in different scenarios. We couldn't prove that console logging is slower than Console.WriteLine. Instead, what we want to focus on for this issue is making sure we don't get a sudden drop in RPS as soon as we hit the max queue size threshold limit.
We could add two configuration improvements to the console logger: (1) allow dropping logs and (2) make the buffer size configurable.
Configuring buffer size
Given that the queue size today is limited for the blocking collection below:
https://source.dot.net/#Microsoft.Extensions.Logging.Console/ConsoleLoggerProcessor.cs,15
The easiest way is to just expose the queue length to make it configurable.
But instead, the main proposal here is to keep track of how many bytes the items in the queue are collectively carrying, to decide what the desired maximum buffer size for the blocking collection should be.
Dropping logs
Dropping logs would not be allowed by default. The way dropping of logs would work is, if enabled, then when the blocking collection is full the console logger throws the message away.
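A minimal sketch of how that drop-on-full mode could sit on top of a bounded BlockingCollection (type and member names here are illustrative, not the actual ConsoleLoggerProcessor):

using System.Collections.Concurrent;
using System.Threading;

// Sketch: a bounded queue that either blocks the producer (current behavior)
// or drops the incoming item and counts the loss, depending on configuration.
internal sealed class DroppingQueue<T>
{
    private readonly BlockingCollection<T> _queue;
    private readonly bool _dropWhenFull;
    private long _dropped;

    public DroppingQueue(int maxQueueLength, bool dropWhenFull)
    {
        _queue = new BlockingCollection<T>(maxQueueLength);
        _dropWhenFull = dropWhenFull;
    }

    public long Dropped => Interlocked.Read(ref _dropped);

    public void Enqueue(T item)
    {
        if (!_dropWhenFull)
        {
            _queue.Add(item);            // current behavior: block the producer when full
            return;
        }

        if (!_queue.TryAdd(item))        // drop mode: never block, count the loss instead
            Interlocked.Increment(ref _dropped);
    }

    public T Dequeue() => _queue.Take(); // consumer thread drains and writes to the console
}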
API Proposal
Since ConsoleLoggerProvider instantiates ConsoleLoggerProcessor, it makes sense to add these configurable options to ConsoleLoggerOptions, which is injected into ConsoleLoggerProvider:
runtime/src/libraries/Microsoft.Extensions.Logging.Console/src/ConsoleLoggerProvider.cs
Lines 41 to 62 in 42d1644
Alternative Design
Alternatively, MaxQueueBufferSizeInBytes could be nullable and, when not specified, use an unbounded queue.
Questions with design
Do we want MaxQueueBufferSizeInBytes to be reconfigurable on reload?
Usage samples
Original Issue
Console logger is not quick enough per request to log for real-world production applications. Refer to aspnet/Logging#515 (comment)
With the current design, console logging happens on a separate thread, and once the message queue is full (1000 messages) the console logger can start blocking the application.
runtime/src/libraries/Microsoft.Extensions.Logging.Console/src/ConsoleLoggerProcessor.cs
Lines 14 to 16 in 78c6505
To improve this design, we could consider using a Channel rather than a blocking collection (refer to https://devblogs.microsoft.com/dotnet/an-introduction-to-system-threading-channels/). This would avoid behavior that either blocks the app (increased latency) or causes an out-of-memory exception.
The upside of the current blocking collection approach is that the order of logs is respected, but since log messages can carry a timestamp, the channel approach may still be an overall improvement toward the goal of a console logger that is performant enough for real-world applications.