You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Actual wait time for non-canceled token was '00:00:00' instead of '00:00:01', wait result: False, using protective wait. Please report this to Hangfire developers.
#2447
Closed
ManuelHaas opened this issue
Sep 26, 2024
· 2 comments
2024-09-26 02:26:04.697 +02:00 [ERR] Actual wait time for non-canceled token was '00:00:00' instead of '00:00:01', wait result: False, using protective wait. Please report this to Hangfire developers.
2024-09-26 02:31:07.708 +02:00 [ERR] Actual wait time for non-canceled token was '00:00:00' instead of '00:00:01', wait result: False, using protective wait. Please report this to Hangfire developers.
I am reporting it here because that is how it is written in the message.
.NET 8.0.8
Hangfire 1.8.14
The text was updated successfully, but these errors were encountered:
Thank you so much for reporting this! And looks like I need help in validating my logic below, because my results don't make any sense.
Several years ago, in the early days of .NET Core, I began to suspect there are problems with waiting on CancellationToken.WaitHandle and implemented a workaround with a custom WaitHandle object in the commit 140b92f. Some month ago I removed that logic, thinking there should be no problems with that property anymore, since .NET Core is now much more stable, but decided to keep some reinsurance logic that will help to diagnose this case without causing CPU to run at 100% due to constant retries.
The following piece of code performs waiting on a cancellation logic between some retry attempts, and calls the WaitHandle.WaitOne method with a timeout value on a CancellationToken to do so with the possibility to prevent delays during shutdown. All the other logic is to ensure that WaitOne is working fine.
varstopwatch=Stopwatch.StartNew();varwaitResult=cancellationToken.WaitHandle.WaitOne(timeout);// The main linestopwatch.Stop();vartimeoutThreshold=TimeSpan.FromMilliseconds(1000);varelapsedThreshold=TimeSpan.FromMilliseconds(500);varprotectionTime=TimeSpan.FromSeconds(1);if(!cancellationToken.IsCancellationRequested&&// Checking `cancellationToken` hasn't been canceledtimeout>=timeoutThreshold&&// Checking `timeout` more than 1 secondstopwatch.Elapsed<elapsedThreshold)// Checking less that 500ms elapsed{try{varlogger=LogProvider.GetLogger(typeof(CancellationTokenExtentions));logger.Error($"Actual wait time for non-canceled token was '{stopwatch.Elapsed}' instead of '{timeout}', wait result: {waitResult}, using protective wait. Please report this to Hangfire developers.");}finally{Thread.Sleep(protectionTime);}}
From the messages above we see that the result value of WaitOne is false, and that means that CancellationToken instance wasn't canceled, e.g. the corresponding WaitHandle instance hasn't been signaled (and IsCancellationRequested check confirms this).
WaitOne was called with at least 1-second timeout value, according to the log. It is a positive value, not TimeSpan.Zero, so according to MSDN and common sense, the calling thread should be blocked until the timeout occurs.
But the message says that stopwatch.Elapsed was less than 500ms.
So it's either my logic is wrong (probably), detection logic is wrong (probably) my understanding of WaitOne method's behavior is wrong (probably), or WaitHandle.WaitOne method is wrong that's totally improbable – this method has so many usages and expected to be battle tested everywhere.
I would appreciate any help in understanding what's wrong here and in more reports of this behavior.
In version 1.8.15 I will improve the precision to know the exact timings in milliseconds, since TimeSpan.ToString precision is not enough here.
As far as I can tell, the error only occurs when the server is under heavy load or the host system of the virtual machine is causing problems.
A little over two weeks ago, the host system of the virtual machine started to cause problems.
On the instance, the error message occurred multiple times every day for 12 days. Three days ago, I moved the instance to a new server and the error message has not occurred since.
Hi,
I got the following entries in my log:
I am reporting it here because that is how it is written in the message.
.NET 8.0.8
Hangfire 1.8.14
The text was updated successfully, but these errors were encountered: