-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/6.0-staging] Make WindowsServiceLifetime gracefully stop #85661
[release/6.0-staging] Make WindowsServiceLifetime gracefully stop #85661
Conversation
* Make WindowsServiceLifetime gracefully stop WindowsServiceLifetime was not waiting for ServiceBase to stop the service. As a result we would sometimes end the process before notifying service control manager that the service had stopped -- resulting in an error in the eventlog and sometimes a service restart. We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through public API. We must not call SetServiceStatus again once the service is marked as stopped. * Alternate approach to ensuring we only ever set STATE_STOPPED once. * Avoid calling ServiceBase.Stop on stopped service I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will not be present on .NETFramework. Workaround that by avoiding calling ServiceBase.Stop when the service has already been stopped by SCM. * Add tests for WindowsServiceLifetime These tests leverage RemoteExecutor to avoid creating a separate service assembly. * Respond to feedback and add more tests. This better integrates with the RemoteExecutor component as well, by hooking up the service process and fetching its handle. This gives us the correct logging and exitcode handling from RemoteExecutor. * Honor Cancellation in StopAsync * Fix bindingRedirects in RemoteExecutor * Use Async lambdas for service testing * Fix issue on Win7 where duplicate service descriptions are disallowed * Respond to feedback * Fix comment and add timeout
# Conflicts: # src/libraries/Microsoft.Extensions.Hosting.WindowsServices/tests/UseWindowsServiceTests.cs
# Conflicts: # src/libraries/Microsoft.Extensions.Hosting.WindowsServices/tests/UseWindowsServiceTests.cs
Tagging subscribers to this area: @dotnet/area-extensions-hosting Issue DetailsBackport of #83892 DescriptionWindows services implemented with Microsoft.Extensions.Hosting.WindowsServices would not gracefully stop - due to exit without reporting STOPPED status - resulting in an error in Service Control Manager. If the service had configured error handling this would also cause the service to restart rather than stay stopped. In some cases the service would crash with on stop as well - generating WER dumps. The clean exit without reporting STOPPED was caused because nothing held up the main routine from exiting before SCM was notified that the service was stopped. This race condition was always present, but has gotten worse over the past two releases as we made the runtime faster - effectively giving the background thread less time to complete and report it's stop. The AV was caused by a double call to stop the service -- once happening when handing the SCM Stop request, and again from hosting when shutting down the application lifetime. This double call was violating the constraint of SetServiceStatus:
We've avoided calling stop twice and also guarded against user's calling ServiceBase.Stop when the service is already stopping. Customer ImpactNot all shutdown logic is allowed to run - which could result in data loss. Customers receive error reports about their services. Services may restart when intended to stop. TestingFix has been made in main for a month and validated on nightly builds by customers. We added numerous automated test cases to cover this scenario and cover the overall lifetime sequence. We also tested the service stop logic manually and through system shutdown (note that this bug still exists which causes problems on shutdown RiskRisk is low, but not zero due to the size of the change and the addition of cross-thread synchronization and locking. We've tried to mitigate this through lots of automated and manual testing. We also kept the locking to a minimum and run no user code under the added lock. Probably the biggest risk here is that we didn't completely fix everything folks care about in this path. For example: #83093 Note the changes to ServiceBase are not strictly required - which is good because we cannot change that type on .NETFramework where this same package is supported.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also LGTM. OOB package authoring changes look correct.
Backport of #83892
Fixes #62579 on 6.0
Description
Windows services implemented with Microsoft.Extensions.Hosting.WindowsServices would not gracefully stop - due to exit without reporting STOPPED status - resulting in an error in Service Control Manager. If the service had configured error handling this would also cause the service to restart rather than stay stopped. In some cases the service would crash with on stop as well - generating WER dumps.
The clean exit without reporting STOPPED was caused because nothing held up the main routine from exiting before SCM was notified that the service was stopped. This race condition was always present, but has gotten worse over the past two releases as we made the runtime faster - effectively giving the background thread less time to complete and report it's stop.
The AV was caused by a double call to stop the service -- once happening when handing the SCM Stop request, and again from hosting when shutting down the application lifetime. This double call was violating the constraint of SetServiceStatus:
We've avoided calling stop twice and also guarded against user's calling ServiceBase.Stop when the service is already stopping.
Customer Impact
Not all shutdown logic is allowed to run - which could result in data loss. Customers receive error reports about their services. Services may restart when intended to stop.
Testing
Fix has been made in main for a month and validated on nightly builds by customers. We added numerous automated test cases to cover this scenario and cover the overall lifetime sequence. We also tested the service stop logic manually and through system shutdown (note that this bug still exists which causes problems on shutdown
Risk
Risk is low, but not zero due to the size of the change and the addition of cross-thread synchronization and locking. We've tried to mitigate this through lots of automated and manual testing. We also kept the locking to a minimum and run no user code under the added lock. Probably the biggest risk here is that we didn't completely fix everything folks care about in this path. For example: #83093
Note the changes to ServiceBase are not strictly required - which is good because we cannot change that type on .NETFramework where this same package is supported.