[Research] [BUG] Extension requests show "false" timeout for first few requests to extension due to unknown bug #8652
Comments
Personally, I've been seeing a lot of (apparently random) timeout issues like this that I believe are associated with Netty Channels and mixing REST streams with Transport streams. Digging into the logs, the commonality seems to be that the Transport listener doesn't process its
I have encountered this issue too.
Update: Sought input from @reta to understand why CompletableFuture is causing the issue. @reta was unable to reproduce this issue; he had a fresh setup with the security plugin installed and
Update: I was able to reproduce it with a fresh setup on a Windows machine (without the security plugin installed).
Here are the logs for timeouts:
False timeouts
3.0: (OpenSearch
What we figured out with @DarshitChanpura:
The conclusion seems to be that
@dbwiddis @saratvemulapalli ^ You may be interested in @reta's findings here. Thank you @reta!
From our research, we discovered that join() blocks a Netty
As a follow-up item from this research:
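To make the blocking behavior described above concrete, here is a minimal sketch in plain Java. It does not use Netty or any OpenSearch classes: a single-threaded `ExecutorService` stands in for one event-loop thread, and the class name `BlockingJoinDemo`, the method `sendAndJoin`, and the response text are hypothetical names chosen for illustration. The only assumption it encodes is that the response handler may be queued on the same thread that is currently blocked in `join()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; not part of OpenSearch or the SDK.
class BlockingJoinDemo {

    /**
     * Simulates a request handler that blocks on join() while waiting for a response.
     * If respondOnSameThread is true, the "response handler" is queued on the very
     * thread that is blocked, so it cannot run before the timeout fires.
     */
    static String sendAndJoin(boolean respondOnSameThread, long timeoutMillis) {
        ExecutorService requestThread = Executors.newSingleThreadExecutor();
        ExecutorService responseThread =
            respondOnSameThread ? requestThread : Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.supplyAsync(() -> {
                CompletableFuture<String> inProgressFuture = new CompletableFuture<>();
                // Stand-in for the response handler calling inProgressFuture.complete(...)
                responseThread.execute(() -> inProgressFuture.complete("Hello from extension"));
                try {
                    // Blocks the current (request) thread until completion or timeout
                    return inProgressFuture.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
                } catch (CompletionException e) {
                    if (e.getCause() instanceof TimeoutException) {
                        return "No response from extension to request.";
                    }
                    throw e;
                }
            }, requestThread).join();
        } finally {
            requestThread.shutdownNow();
            responseThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Response handled on a different thread: join() returns the response.
        System.out.println(sendAndJoin(false, 1000L));
        // Response queued behind the blocked thread: join() times out.
        System.out.println(sendAndJoin(true, 300L));
    }
}
```

In this sketch the first call prints the response and the second prints the timeout message, even though the response was "sent" in both cases. Whether the real failure follows exactly this thread layout is not established here; the sketch only shows that a blocked thread cannot also run the handler that would unblock it.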
Here is the debugger flow chart for the happy path and the unhappy path:
Happy path:
flowchart TB
subgraph Origin
User
end
subgraph RestSendToExtensionAction
User --> |1| B[transportService.sendRequest]
B --> C[inProgressFuture.join]
end
subgraph SDK
C -->|handleRestExecuteOnExtensionRequest| D[Request Handled]
D -->|Response Sent| E[Response Sent to OS Cluster]
end
subgraph RestSendToExtensionAction
E --> F[restExecuteOnExtensionResponseHandler.handleResponse]
F --> G[inProgressFuture.complete]
end
G --> User
Unhappy path:
flowchart TB
subgraph Origin
User
end
subgraph RestSendToExtensionAction
User --> |1| B[transportService.sendRequest]
B --> C[inProgressFuture.join]
end
subgraph SDK
B -->|handleRestExecuteOnExtensionRequest| D[Request Handled]
D -->|Response Sent| E[Response Sent to OS Cluster]
end
subgraph RestSendToExtensionAction
C --> H[TimeoutException after 10 sec]
E --> F[restExecuteOnExtensionResponseHandler.handleResponse]
F --> G[inProgressFuture.complete]
G --> X[response dies somewhere]
end
H --> |No Response received from Extension| User
We can see that inProgressFuture behaves differently in the two cases, which is likely the root cause of the issue. I was able to verify that inProgressFuture is handled on same
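For comparison with the unhappy path, below is a sketch of one general way to avoid blocking: attach a callback to the future instead of calling `join()` on the handling thread. This is an illustration only, not necessarily the approach discussed in #9435. As before, a single-threaded executor stands in for an event-loop thread, and `NonBlockingDemo`, `sendWithoutJoin`, and the `delivered` future (a stand-in for writing to the REST channel) are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of OpenSearch or the SDK.
class NonBlockingDemo {

    /**
     * Request handler and response handler share one thread, but the request
     * handler registers a callback and returns instead of blocking on join().
     */
    static String sendWithoutJoin(long timeoutMillis) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // Stand-in for "send the response back to the user"
        CompletableFuture<String> delivered = new CompletableFuture<>();
        try {
            eventLoop.execute(() -> {
                CompletableFuture<String> inProgressFuture = new CompletableFuture<>();
                // Response handler is queued on the same thread, as in the blocking case
                eventLoop.execute(() -> inProgressFuture.complete("Hello from extension"));
                inProgressFuture
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                    .whenComplete((response, error) -> {
                        if (error == null) {
                            delivered.complete(response);
                        } else {
                            delivered.complete("No response from extension to request.");
                        }
                    });
                // The handler returns here, so the thread is free to run the response handler
            });
            // Only the caller of this demo waits; the simulated event loop never blocks
            return delivered.join();
        } finally {
            eventLoop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(sendWithoutJoin(1000L));
    }
}
```

With the same single-thread layout that produced the timeout when blocking, this version delivers the response, because the thread returns to its queue before the response handler needs it.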
Closing this to move all further discussion about the approach here: #9435
Describe the bug
The latest changes introduced via #7957 cause requests to falsely "time out" even though the request was sent to and processed by the extension.
Update: These changes didn't cause the issue. It seems to have existed before this PR was merged.
To Reproduce
Steps to reproduce the behavior:
1. main and start the helloWorld extension from main on SDK
2. curl -XGET http://localhost:9200/_extensions/_hello-world/hello should produce: No response from extension to request.%
The default timeout for extension requests is 10 seconds.
Expected behavior
The request should not time out.