-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: log improvements #2365
chore: log improvements #2365
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2365 +/- ##
==========================================
+ Coverage 34.12% 34.19% +0.06%
==========================================
Files 198 199 +1
Lines 39445 39456 +11
==========================================
+ Hits 13462 13491 +29
+ Misses 25120 25099 -21
- Partials 863 866 +3
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
backoffWithMaxRetry := backoff.WithContext(backoff.WithMaxRetries(backoff.NewExponentialBackOff(), 3), ctx) | ||
err := backoff.RetryNotify(operation, backoffWithMaxRetry, func(err error, t time.Duration) { | ||
nc.Logger.Warnf("Failed to fetch config from API with error: %v, retrying after %v", err, t) | ||
}) | ||
if err != nil { | ||
nc.Logger.Errorf("Error sending request to the server: %v", err) | ||
if ctx.Err() == nil { | ||
nc.Logger.Errorf("Error sending request to the server: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this to avoid unrelated errors on context cancellation. Plus, it doesn't retry after ctx is cancelled.
g, ctx := errgroup.WithContext(ctx) | ||
g.Go(func() (err error) { | ||
if err = admin.StartServer(ctx); err != nil && ctx.Err() == nil { | ||
if !errors.Is(err, context.Canceled) { | ||
pkgLogger.Errorf("Error in Admin server routine: %v", err) | ||
} | ||
g.Go(func() error { | ||
if err := admin.StartServer(ctx); err != nil { | ||
return fmt.Errorf("admin server routine: %w", err) | ||
} | ||
return err | ||
return nil | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to check for ctx.Err()
, since the ctx belongs to the errgroup.
ctx
will be cancelled immediately when the first error returns- if ctx is canceled errgroup is not going to propagate any errors
if err != nil && ctx.Err() == nil { | ||
pkgLogger.Error(err) | ||
if err != nil { | ||
pkgLogger.Errorf("Terminal error: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ctx
here is the errgroup error, which is cancelled when an error is returned.
Essentially this condition err != nil && ctx.Err() == nil
would never be true and thus nothing will be logged.
I find it easier to print a single "Terminal Error" as late as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we need to chose between muting valuable errors and erroneously logging errors during normal shutdown
if err != nil && ctx.Err() == nil { | ||
pkgLogger.Error(err) | ||
if err != nil { | ||
pkgLogger.Errorf("Terminal error: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the only side effect is that we will log an error during graceful shudown, which is a bit misleading, e.g.
2022-08-30T16:31:25.711+0300 ERROR rudder-server/main.go:294 Terminal error: profiler server routine: debug server: http: Server closed
2022-08-30T16:31:25.711+0300 INFO rudder-server/main.go:297 Attempting to shutdown gracefully
2022-08-30T16:31:25.711+0300 INFO rudder-server/main.go:308 Graceful terminal after 2.625274209s, with 154 go-routines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I will make sure Server closed errors are not propagated if ctx is cancelled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed this issue by introducing a new function that takes care of http: Server closed
utils/httputil/server.go
Outdated
func GracefulListenAndServe(ctx context.Context, server *http.Server, shutdownTimeout ...time.Duration) error { | ||
return gracefulFunc(ctx, server, server.ListenAndServe, shutdownTimeout...) | ||
} | ||
|
||
func GracefulServe(ctx context.Context, server *http.Server, l net.Listener, shutdownTimeout ...time.Duration) error { | ||
fn := func() error { | ||
return server.Serve(l) | ||
} | ||
return gracefulFunc(ctx, server, fn, shutdownTimeout...) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I've introduced those two methods is:
- We are using graceful shutdown methods in many places
- Hard and repetitive to test shutdown functionality when it is not in isolation.
- It is easy to get the shutdown wrong; some of the implementations in our code were correct, and some were not (funny, I think I have introduced them all :D )
Example of incorrect implementation:
go func() {
<-ctx.Done()
_ = srv.Shutdown(context.Background())
}()
if err := srv.ListenAndServe(); err != nil {
return fmt.Errorf("debug server: %w", err)
}
return nil
When
Shutdown
is calledListenAndServe
returns immediately, always with an error.
Two problems:
- The shutdown is the one that blocks and the one we should be waiting for.
- An error is returned when the context gets cancelled.
Semi-correct example addresses 1st issue:
g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
<-ctx.Done()
return srv.Shutdown(context.Background())
})
g.Go(func() error {
return srv.ListenAndServe()
})
return g.Wait()
Here we wait for both go routines to return, but ListenAndServe
returns first with an error.
The new implementation fixes both issues along with providing tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, although I'm not sure we need the Graceful
part in the name. What do you think about just having it as ListenAndServe
since we don't want an ungraceful one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, make sense :)
@@ -153,7 +153,7 @@ func closeDBConnection(handle *sql.DB) { | |||
} | |||
|
|||
func killDanglingDBConnections(db *sql.DB) { | |||
rows, err := db.Query(`SELECT PID, QUERY_START, COALESCE(WAIT_EVENT_TYPE,''), COALESCE(WAIT_EVENT, ''), STATE, QUERY, PG_TERMINATE_BACKEND(PID) | |||
rows, err := db.Query(`SELECT PID, QUERY_START, COALESCE(WAIT_EVENT_TYPE,''), COALESCE(WAIT_EVENT, ''), COALESCE(STATE, ''), QUERY, PG_TERMINATE_BACKEND(PID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got an error in my test suite that STATE
was null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with some suggestions 👍
utils/httputil/server_test.go
Outdated
require.NoError(t, err) | ||
defer l.Close() | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] I think this way of enclosing brackets bothers DeepSource. Also not very useful in this case imho.
}) | ||
|
||
t.Run("timeout if connections are not closed", func(t *testing.T) { | ||
addr := fmt.Sprintf("127.0.0.1:%d", freeport.GetPort()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a testhelper to get free ports in tests. It makes sure that you don't get the same port twice in case you get multiple ones and bind them later. Not necessarily needed here but since we have one it might make sense to use it to ease refactorings etc...
https://github.com/rudderlabs/rudder-server/blob/master/testhelper/freeport.go#L14
if ctx.Err() == nil { | ||
nc.Logger.Errorf("Error sending request to the server: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change would be useful in the other backend configs as well.
utils/httputil/server.go
Outdated
case <-ctx.Done(): | ||
var ( | ||
shutdownCtx context.Context | ||
cancel context.CancelFunc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] You don't need cancel nor to invoke it if a timeout is not specified:
var shutdownCtx context.Context
if shutdownTimeout > 0 {
var cancel func()
shutdownCtx, cancel = context.WithTimeout(context.Background(), shutdownTimeout)
defer cancel()
} else {
shutdownTimeout = context.Background()
}
Description
A couple of minor improvements were introduced regarding logs.
Motivation: rudder-server was terminating without printing a helpful log. As well as some unhelpful want/err logs when shutting down.
Notion Ticket
https://www.notion.so/rudderstacks/Minor-logs-improvements-5e9fbfc8dfa3495aa38ffa6e707585e0
Security