Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/6.0-staging] Make WindowsServiceLifetime gracefully stop #85661

Merged

Conversation

ericstj
Copy link
Member

@ericstj ericstj commented May 2, 2023

Backport of #83892
Fixes #62579 on 6.0

Description

Windows services implemented with Microsoft.Extensions.Hosting.WindowsServices would not gracefully stop - due to exit without reporting STOPPED status - resulting in an error in Service Control Manager. If the service had configured error handling this would also cause the service to restart rather than stay stopped. In some cases the service would crash with on stop as well - generating WER dumps.

The clean exit without reporting STOPPED was caused because nothing held up the main routine from exiting before SCM was notified that the service was stopped. This race condition was always present, but has gotten worse over the past two releases as we made the runtime faster - effectively giving the background thread less time to complete and report it's stop.

The AV was caused by a double call to stop the service -- once happening when handing the SCM Stop request, and again from hosting when shutting down the application lifetime. This double call was violating the constraint of SetServiceStatus:

The first call to the function in the SERVICE_STOPPED state closes the RPC context handle and any subsequent calls can cause the process to crash.

We've avoided calling stop twice and also guarded against user's calling ServiceBase.Stop when the service is already stopping.

Customer Impact

Not all shutdown logic is allowed to run - which could result in data loss. Customers receive error reports about their services. Services may restart when intended to stop.

Testing

Fix has been made in main for a month and validated on nightly builds by customers. We added numerous automated test cases to cover this scenario and cover the overall lifetime sequence. We also tested the service stop logic manually and through system shutdown (note that this bug still exists which causes problems on shutdown

Risk

Risk is low, but not zero due to the size of the change and the addition of cross-thread synchronization and locking. We've tried to mitigate this through lots of automated and manual testing. We also kept the locking to a minimum and run no user code under the added lock. Probably the biggest risk here is that we didn't completely fix everything folks care about in this path. For example: #83093

Note the changes to ServiceBase are not strictly required - which is good because we cannot change that type on .NETFramework where this same package is supported.

ericstj and others added 7 commits May 2, 2023 07:41
* Make WindowsServiceLifetime gracefully stop

WindowsServiceLifetime was not waiting for ServiceBase to stop the service.  As a result
we would sometimes end the process before notifying service control manager that the service
had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through
public API.  We must not call SetServiceStatus again once the service is marked as stopped.

* Alternate approach to ensuring we only ever set STATE_STOPPED once.

* Avoid calling ServiceBase.Stop on stopped service

I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will
not be present on .NETFramework.  Workaround that by avoiding calling
ServiceBase.Stop when the service has already been stopped by SCM.

* Add tests for WindowsServiceLifetime

These tests leverage RemoteExecutor to avoid creating a separate service
assembly.

* Respond to feedback and add more tests.

This better integrates with the RemoteExecutor component as well,
by hooking up the service process and fetching its handle.

This gives us the correct logging and exitcode handling from
RemoteExecutor.

* Honor Cancellation in StopAsync

* Fix bindingRedirects in RemoteExecutor

* Use Async lambdas for service testing

* Fix issue on Win7 where duplicate service descriptions are disallowed

* Respond to feedback

* Fix comment and add timeout
# Conflicts:
#	src/libraries/Microsoft.Extensions.Hosting.WindowsServices/tests/UseWindowsServiceTests.cs
# Conflicts:
#	src/libraries/Microsoft.Extensions.Hosting.WindowsServices/tests/UseWindowsServiceTests.cs
@ericstj ericstj requested review from carlossanlop, ViktorHofer, buyaa-n and a team May 2, 2023 17:12
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 2, 2023
@ghost ghost assigned ericstj May 2, 2023
@ericstj ericstj added Servicing-consider Issue for next servicing release review area-Extensions-Hosting and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 2, 2023
@ghost
Copy link

ghost commented May 2, 2023

Tagging subscribers to this area: @dotnet/area-extensions-hosting
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #83892
Fixes #62579 on 6.0

Description

Windows services implemented with Microsoft.Extensions.Hosting.WindowsServices would not gracefully stop - due to exit without reporting STOPPED status - resulting in an error in Service Control Manager. If the service had configured error handling this would also cause the service to restart rather than stay stopped. In some cases the service would crash with on stop as well - generating WER dumps.

The clean exit without reporting STOPPED was caused because nothing held up the main routine from exiting before SCM was notified that the service was stopped. This race condition was always present, but has gotten worse over the past two releases as we made the runtime faster - effectively giving the background thread less time to complete and report it's stop.

The AV was caused by a double call to stop the service -- once happening when handing the SCM Stop request, and again from hosting when shutting down the application lifetime. This double call was violating the constraint of SetServiceStatus:

The first call to the function in the SERVICE_STOPPED state closes the RPC context handle and any subsequent calls can cause the process to crash.

We've avoided calling stop twice and also guarded against user's calling ServiceBase.Stop when the service is already stopping.

Customer Impact

Not all shutdown logic is allowed to run - which could result in data loss. Customers receive error reports about their services. Services may restart when intended to stop.

Testing

Fix has been made in main for a month and validated on nightly builds by customers. We added numerous automated test cases to cover this scenario and cover the overall lifetime sequence. We also tested the service stop logic manually and through system shutdown (note that this bug still exists which causes problems on shutdown

Risk

Risk is low, but not zero due to the size of the change and the addition of cross-thread synchronization and locking. We've tried to mitigate this through lots of automated and manual testing. We also kept the locking to a minimum and run no user code under the added lock. Probably the biggest risk here is that we didn't completely fix everything folks care about in this path. For example: #83093

Note the changes to ServiceBase are not strictly required - which is good because we cannot change that type on .NETFramework where this same package is supported.

Author: ericstj
Assignees: ericstj
Labels:

Servicing-consider, area-Extensions-Hosting

Milestone: -

@rbhanda rbhanda added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels May 2, 2023
@rbhanda rbhanda added this to the 6.0.18 milestone May 2, 2023
Copy link
Member

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also LGTM. OOB package authoring changes look correct.

@ericstj ericstj merged commit d628263 into dotnet:release/6.0-staging May 4, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Extensions-Hosting Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants