tetra: use the builtin gRPC retry backoff mechanism
gRPC A6 - gRPC Retry Design (a.k.a. built-in backoff retry)
https://github.com/grpc/proposal/blob/master/A6-client-retries.md was
implemented by grpc/grpc-go#2111 but was unusable for a long time
because maxAttempts was capped at a hardcoded 5
(grpc/grpc-go#4615); a recent PR, grpc/grpc-go#7229, fixed that.
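The sketch below shows why the hardcoded cap made the retry policy unusable. It assumes that grpc-go uses the smaller of two values: the service config's maxAttempts and a client-side limit. That limit defaults to 5 and, since grpc/grpc-go#7229, can be set with grpc.WithMaxCallAttempts. The function name effectiveMaxAttempts is hypothetical and is not grpc-go code.

```go
package main

import "fmt"

// defaultMaxCallAttempts models grpc-go's default client-side limit of 5
// attempts (assumption, per grpc/grpc-go#4615).
const defaultMaxCallAttempts = 5

// effectiveMaxAttempts is a hypothetical helper that models the clamping:
// the attempts requested by the retry policy can never exceed the client limit.
func effectiveMaxAttempts(policyMaxAttempts, clientLimit int) int {
	if policyMaxAttempts < clientLimit {
		return policyMaxAttempts
	}
	return clientLimit
}

func main() {
	retries := 10
	maxAttempts := retries + 1 // the first call counts as an attempt

	// Hardcoded limit: only 4 of the 10 requested retries can happen.
	fmt.Println(effectiveMaxAttempts(maxAttempts, defaultMaxCallAttempts)) // 5
	// Limit raised to match the policy (what WithMaxCallAttempts allows).
	fmt.Println(effectiveMaxAttempts(maxAttempts, retries+1)) // 11
}
```

This also explains why the diff sets both the service config and WithMaxCallAttempts to Retries+1.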

It's transparent to the user. To see it in action, make sure the gRPC
server is unreachable (for example, do not start tetragon) and run tetra
with: GRPC_GO_LOG_SEVERITY_LEVEL=warning <tetra cmd>

Note that logs don't always have time to be flushed before exit, so the
output might be a bit off, but the number of retries is respected (to
verify, you can debug or synchronously print in grpc/stream.go:shouldRetry
or :withRetry).

Also note that the actual backoff duration is random: it is chosen
between 0 and the backoff computed from the retry policy parameters:
https://github.com/grpc/grpc-go/blob/v1.65.0/stream.go#L702

Signed-off-by: Mahe Tardy <[email protected]>
mtardy committed Jul 29, 2024
1 parent d9cca23 commit f5a92b6
Showing 1 changed file with 46 additions and 1 deletion.
47 changes: 46 additions & 1 deletion cmd/tetra/common/client.go
@@ -15,6 +15,47 @@ import (
 	"google.golang.org/grpc/credentials/insecure"
 )

+// gRPC A6 - gRPC Retry Design (a.k.a. built-in backoff retry)
+// https://github.com/grpc/proposal/blob/master/A6-client-retries.md
+// was implemented by https://github.com/grpc/grpc-go/pull/2111 but was
+// unusable for a long time because maxAttempts was capped at a hardcoded 5
+// (https://github.com/grpc/grpc-go/issues/4615); a recent PR,
+// https://github.com/grpc/grpc-go/pull/7229, fixed that.
+//
+// It's transparent to the user. To see it in action, make sure the gRPC
+// server is unreachable (for example, do not start tetragon) and run tetra
+// with: GRPC_GO_LOG_SEVERITY_LEVEL=warning <tetra cmd>
+//
+// Note that logs don't always have time to be flushed before exit, so the
+// output might be a bit off, but the number of retries is respected (to
+// verify, you can debug or synchronously print in grpc/stream.go:shouldRetry
+// or :withRetry).
+//
+// Also note that the actual backoff duration is random: it is chosen between
+// 0 and the backoff computed from the retry policy parameters:
+// https://github.com/grpc/grpc-go/blob/v1.65.0/stream.go#L702
+func retryPolicy(retries int) string {
+	if retries < 0 {
+		// gRPC should ignore the invalid retry policy but will issue a warning.
+		return "{}"
+	}
+	// maxAttempt includes the first call
+	maxAttempt := retries + 1
+	// don't effectively limit the backoff: MaxBackoff is hardcoded to 1h
+	// only because a value > 0 must be provided
+	return fmt.Sprintf(`{
+	"methodConfig": [{
+		"name": [{"service": "tetragon.FineGuidanceSensors"}],
+		"retryPolicy": {
+			"MaxAttempts": %d,
+			"InitialBackoff": "1s",
+			"MaxBackoff": "3600s",
+			"BackoffMultiplier": 2,
+			"RetryableStatusCodes": [ "UNAVAILABLE" ]
+		}
+	}]}`, maxAttempt)
+}
+
 func CliRunErr(fn func(ctx context.Context, cli tetragon.FineGuidanceSensorsClient), fnErr func(err error)) {
 	c, err := NewClientWithDefaultContextAndAddress()
 	if err != nil {
@@ -60,7 +101,11 @@ func NewClient(ctx context.Context, address string, timeout time.Duration) (*Cli
 	c.Ctx, _ = signal.NotifyContext(timeoutContext, syscall.SIGINT, syscall.SIGTERM)

 	var err error
-	c.conn, err = grpc.NewClient(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
+	c.conn, err = grpc.NewClient(address,
+		grpc.WithTransportCredentials(insecure.NewCredentials()),
+		grpc.WithDefaultServiceConfig(retryPolicy(Retries)),
+		grpc.WithMaxCallAttempts(Retries+1), // maxAttempt includes the first call
+	)
 	if err != nil {
 		return nil, fmt.Errorf("failed to create gRPC client with address %s: %w", address, err)
 	}
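One way to sanity-check the service config that retryPolicy generates is to decode it with encoding/json into minimal structs. The structs here are hypothetical mirrors of the emitted fields, not grpc-go's internal types. This only checks that the JSON is well formed and carries the expected values. It does not reproduce grpc-go's own service config validation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// retryPolicy is copied from the diff above.
func retryPolicy(retries int) string {
	if retries < 0 {
		return "{}"
	}
	maxAttempt := retries + 1
	return fmt.Sprintf(`{
	"methodConfig": [{
		"name": [{"service": "tetragon.FineGuidanceSensors"}],
		"retryPolicy": {
			"MaxAttempts": %d,
			"InitialBackoff": "1s",
			"MaxBackoff": "3600s",
			"BackoffMultiplier": 2,
			"RetryableStatusCodes": [ "UNAVAILABLE" ]
		}
	}]}`, maxAttempt)
}

// Hypothetical, minimal mirrors of the fields emitted by retryPolicy.
type methodName struct {
	Service string `json:"service"`
}

type retryPolicyConfig struct {
	MaxAttempts          int      `json:"maxAttempts"`
	InitialBackoff       string   `json:"initialBackoff"`
	MaxBackoff           string   `json:"maxBackoff"`
	BackoffMultiplier    float64  `json:"backoffMultiplier"`
	RetryableStatusCodes []string `json:"retryableStatusCodes"`
}

type methodConfig struct {
	Name        []methodName       `json:"name"`
	RetryPolicy *retryPolicyConfig `json:"retryPolicy"`
}

type serviceConfig struct {
	MethodConfig []methodConfig `json:"methodConfig"`
}

// parseServiceConfig decodes the JSON; encoding/json also accepts
// case-insensitive key matches, so "MaxAttempts" fills the "maxAttempts" field.
func parseServiceConfig(s string) (serviceConfig, error) {
	var sc serviceConfig
	err := json.Unmarshal([]byte(s), &sc)
	return sc, err
}

func main() {
	sc, err := parseServiceConfig(retryPolicy(3))
	if err != nil {
		panic(err)
	}
	rp := sc.MethodConfig[0].RetryPolicy
	fmt.Println(rp.MaxAttempts, rp.InitialBackoff, rp.MaxBackoff, rp.RetryableStatusCodes)
	// prints: 4 1s 3600s [UNAVAILABLE]
}
```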