Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support graceful shutdown on sigint/sigterm #385

Merged
merged 15 commits into from
Sep 12, 2022

Conversation

hongalex
Copy link
Collaborator

@hongalex hongalex commented Sep 7, 2022

What type of PR is this?
/kind feat

What this PR does / Why we need it:
Allows grpc server to gracefully shutdown when encountering interruption/termination signals.

Which issue(s) this PR fixes:

Closes #379

Special notes for your reviewer:

@hongalex hongalex requested a review from yuryu September 7, 2022 21:39
@hongalex hongalex requested a review from a user September 7, 2022 23:04
time.Sleep(2 * time.Second)
syscall.Kill(syscall.Getpid(), syscall.SIGINT)
}()
if err := Run(ctx, "tcp", serviceConfig); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's feasible to test the status change from SERVING to NOT_SERVING? Looks like GracefulStop doesn't return until all RPC calls are served, so timing wise it's a bit tricky.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but I added the test to TestOpenSaves_HealthCheck

}
healthcheck.SetServingStatus(serviceName, healthgrpc.HealthCheckResponse_NOT_SERVING)
s.GracefulStop()
log.Infoln("stopping open saves server gracefully")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log output should be between L72-73.
Log ("stopping") -> change status -> gracefulstop -> Log ("Stopped")

@@ -99,6 +112,31 @@ func TestOpenSaves_HealthCheck(t *testing.T) {
if want := healthgrpc.HealthCheckResponse_SERVING; got.Status != want {
t.Errorf("hc.Check got: %v, want: %v", got.Status, want)
}
cancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to cancel the context? Do we want to test the behavior of the server after cancel()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment here to address this. I also added tests after this to test NOT_SERVING.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

}

func TestOpenSaves_RunServer(t *testing.T) {
serviceConfig, err := getServiceConfig()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.Parallel()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to run these tests in parallel since they use the same port. The timeouts are short enough that there shouldn't be much delay here anyway. Same response for the remaining tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah that makes sense, thanks

}

func TestOpenSaves_HealthCheck(t *testing.T) {
serviceConfig, err := getServiceConfig()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.Parallel()?

}
ctx := context.Background()
t.Run("cancel_context", func(t *testing.T) {
ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.Parallel()?

}
})
t.Run("sigint", func(t *testing.T) {
go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.Parallel()?

select {
case s := <-sigs:
log.Infof("received signal: %v\n", s)
case <-ctx.Done():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear when you need to cancel the context. Currently it's only in the tests, but do you have any plan to support cancellation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm supporting this here since we pass in a ctx to Run. There's no immediate plan to support this. If you prefer, I can remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

Copy link
Member

@yuryu yuryu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -99,6 +112,31 @@ func TestOpenSaves_HealthCheck(t *testing.T) {
if want := healthgrpc.HealthCheckResponse_SERVING; got.Status != want {
t.Errorf("hc.Check got: %v, want: %v", got.Status, want)
}
cancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

}

func TestOpenSaves_RunServer(t *testing.T) {
serviceConfig, err := getServiceConfig()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah that makes sense, thanks

select {
case s := <-sigs:
log.Infof("received signal: %v\n", s)
case <-ctx.Done():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@hongalex hongalex enabled auto-merge (squash) September 12, 2022 17:50
@hongalex hongalex merged commit 3371ee9 into googleforgames:main Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for graceful server shutdown to address potential issues when pods are scaling down / terminating
2 participants