graphql, node, rpc: improve HTTP write timeout handling #25457
Conversation
I implemented a different approach that we roughly discussed with @fjl: the rpc handler now runs the method in a goroutine. It waits for that goroutine to finish OR for the context deadline to expire, in which case it returns an error. The method keeps running in the background until it stops (a possible leak), so it's important that long-running methods respect the context deadline. I made sure this happens in the affected methods. The timeout is set in the outer layer. Note: the error type I added feels awkward alongside the official json-rpc errors.
node/rpcstack.go
Outdated
@@ -198,6 +198,9 @@ func (h *httpServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// if http-rpc is enabled, try to serve request
	rpc := h.httpHandler.Load().(*rpcHandler)
	if rpc != nil {
		ctx, cancel := context.WithTimeout(r.Context(), h.timeouts.ReadTimeout-(50*time.Millisecond))
Also this here. The 50 ms is an arbitrary number; lower values like 20 ms should also be fine. The important thing is that the timeout hits before the stdlib's http library kills the connection.
This seems to work well. With a test method like this:
diff --git a/internal/ethapi/api.go b/internal/ethapi/api.go
index 90322033b9..b90ac5048d 100644
--- a/internal/ethapi/api.go
+++ b/internal/ethapi/api.go
@@ -2035,3 +2035,10 @@ func toHexSlice(b [][]byte) []string {
}
return r
}
+
+func (api *DebugAPI) TimeMeOut(ctx context.Context, seconds uint64) (string, error) {
+ log.Info("TimeMeOut sleeping...", "seconds", seconds)
+ time.Sleep(time.Second * time.Duration(seconds))
+ log.Info("TimeMeOut waking up!")
+ return "Oll korrekt!", nil
+}
The method sleeps for the requested number of seconds, and the timeout triggers after the configured deadline.
What I don't like about this PR is that the RPC handler will keep running in case of timeout, because it now executes the RPC call on a background goroutine.
It would be nicer to keep this out and rely on the individual RPC handlers to exit promptly when the context is canceled.
rpc/handler.go
Outdated
@@ -334,8 +334,19 @@ func (h *handler) handleCall(cp *callProc, msg *jsonrpcMessage) *jsonrpcMessage {
		return msg.errorResponse(&invalidParamsError{err.Error()})
	}
	start := time.Now()
	answer := h.runMethod(cp.ctx, msg, callb, args)
	answerCh := make(chan *jsonrpcMessage)
This channel needs to be buffered, otherwise the handler goroutine will leak when there is a timeout.
I agree about this. If you have a DoS RPC request that you use to crash some remote node (or Infura), then this feature right here would make it even easier to cause DoS. You could just sequentially fire and forget, and get a multithreaded impact.
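The buffered-channel point from the review comment can be demonstrated in isolation. This is a standalone sketch (not PR code): with an unbuffered channel, the handler goroutine blocks forever on its send once the receiver has timed out; a buffer of one lets the late send complete so the goroutine can exit:

```go
package main

import (
	"fmt"
	"time"
)

// callWithTimeout simulates a handler whose answer channel may or may not be
// buffered. It reports whether the handler goroutine managed to finish.
func callWithTimeout(buffered bool) bool {
	size := 0
	if buffered {
		size = 1
	}
	answerCh := make(chan string, size)
	done := make(chan struct{})
	go func() {
		time.Sleep(20 * time.Millisecond) // slow method
		answerCh <- "answer"              // blocks forever if unbuffered and nobody reads
		close(done)
	}()
	select {
	case <-answerCh:
	case <-time.After(5 * time.Millisecond): // timeout wins
	}
	// Did the handler goroutine finish, or is it leaked on the send?
	select {
	case <-done:
		return true
	case <-time.After(100 * time.Millisecond):
		return false // leaked: still blocked sending on answerCh
	}
}

func main() {
	fmt.Println(callWithTimeout(false), callWithTimeout(true))
}
```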
TODO Triage discussion: worth doing this or is it dead in the water, due to the inherent limitations of this approach?
Alternative approach: when handling the RPC call, launch a timer using time.AfterFunc:
responseSlot := make(chan struct{}, 1)
responseSlot <- struct{}{}
responded := make(chan struct{})
timeoutResponseTimer := time.AfterFunc(timeout, func() {
select {
case <-responseSlot:
// The timeout occurred and the method has not responded yet.
// send the timeout error response
close(responded)
case <-responded:
// The method responded.
}
})
response := runMethod()
timeoutResponseTimer.Stop()
select {
case <-responseSlot:
// send the response
close(responded)
case <-responded:
// timeout error response was already sent
}
This reverts commit 2970cb0c957fa31c6bb3687663652b477495c7bf.
Co-authored-by: Martin Holst Swende <[email protected]>
Actually, it can also be done with fewer channels using sync.Once:
var respondOnce sync.Once
timeoutResponseTimer := time.AfterFunc(timeout, func() {
respondOnce.Do(func () {
// send timeout error
})
})
response := runMethod()
timeoutResponseTimer.Stop()
respondOnce.Do(func () {
// send response
})
This reverts commit 3feba50.
I have decided to remove the changes in eth/filters and eth/tracers from the PR. We can submit them in a separate change.
RPC method handler changes resubmitted in #26320
Here we add special handling for sending an error response when the write timeout of the HTTP server is just about to expire. This is surprisingly difficult to get right, since it must be ensured that all output is fully flushed in time, which needs support from multiple levels of the RPC handler stack: the timeout response can't use chunked transfer-encoding because there is no way to write the final terminating chunk. net/http writes it when the topmost handler returns, but the timeout will already be over by the time that happens. We decided to disable chunked encoding by setting content-length explicitly. Gzip compression must also be disabled for timeout responses, because we don't know the true content-length before compressing all output, i.e. compression would reintroduce chunked transfer-encoding.
A work-around for golang/go#47229. Trigger the timeout a bit before the stdlib closes the connection, so users see a proper error message. Note that I still kept the WriteTimeout of the http.Server.
I'm not sure what happens if the method handler and the timeout path both write at the same time.
Fixes #21430