Record more detailed HTTP stats #99852

ywangd · 2023-09-25T07:24:13Z

This PR adds more detailed HTTP stats, broken down by HTTP route.

Resolves: #95739

elasticsearchmachine · 2023-09-25T07:24:37Z

Hi @ywangd, I've created a changelog YAML for you.

ywangd · 2023-09-25T07:29:31Z

@DaveCTurner This PR is marked as draft because I am seeking high-level feedback on the approach. The issue (#95739) asks to:

"add to the node/cluster stats some histograms of response times and request/response sizes broken down by the HTTP endpoint"

This PR currently supports the request and response sizes. I'd appreciate confirmation that the proposed changes make sense. I'd also like to verify whether we are only interested in "response" times, i.e. how long it takes for the response to be ready after the HTTP request is dispatched. Thanks!

idegtiarenko · 2023-09-25T08:06:05Z

server/src/main/java/org/elasticsearch/http/HttpRouteStats.java

+    long[] requestSizeHistogram,
+    long responseCount,
+    long totalResponseSize,
+    long[] responseSizeHistogram


I do not think this is mentioned in the original issue, but is it possible or beneficial to track response status counts as well?

It is relatively easy to track the number of responses per status. But we are probably more interested in their recent trends than in the overall stats since the last restart, because we want to know whether the node is "currently" experiencing problems. This means we need to compute moving averages instead of the overall average, which is what we are doing here for request/response sizes. I have not yet found an existing example of computing moving averages for stats collection. It's definitely doable. But I also wonder whether it starts getting into the territory of APM and should be handled externally. This might be why we haven't done it? I'll dig a bit more. For the purpose of this PR, I think it's better to keep them separate.
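
For readers unfamiliar with the "moving average" idea raised above, here is a minimal sketch of an exponentially weighted moving average over a 0/1 error signal. It is not part of this PR; the class name `ErrorRateEwma`, the choice of "5xx counts as an error", and the smoothing factor are all assumptions made for illustration.

```java
/** Hypothetical sketch: exponentially weighted moving average of a 0/1 error signal. */
final class ErrorRateEwma {
    private final double alpha; // weight of the newest observation, in (0, 1]
    private double value;
    private boolean initialized;

    ErrorRateEwma(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Records one response; treating 5xx as an error is an assumption of this sketch. */
    synchronized void record(int statusCode) {
        final double sample = statusCode >= 500 ? 1.0 : 0.0;
        if (initialized == false) {
            value = sample;
            initialized = true;
        } else {
            // Older observations decay geometrically, so recent ones dominate.
            value = alpha * sample + (1.0 - alpha) * value;
        }
    }

    synchronized double get() {
        return value;
    }
}
```

Unlike a since-restart average, the value here drifts back towards zero once errors stop, which is the "current trend" property discussed in the comment.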

idegtiarenko · 2023-09-25T08:10:43Z

server/src/test/java/org/elasticsearch/rest/action/info/RestClusterInfoActionTests.java

@@ -89,7 +89,8 @@ public void testHttpResponseMapper() {
                    .map(HttpStats::clientStats)
                    .map(Collection::stream)
                    .reduce(Stream.of(), Stream::concat)
-                    .toList()
+                    .toList(),
+                Map.of()


Not related to the change itself, just surprised to see .map(Collection::stream).reduce(Stream.of(), Stream::concat). Is there any benefit over flatMap(Collection::stream)?

I don't think so. Sometimes a test can be deliberately written to avoid using the same pattern as the production code. But this does not seem to be the case either.
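
A small sketch of the two forms being compared, assuming a plain list of string lists rather than the `HttpStats` types used in the test. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch comparing the two ways of flattening nested collections. */
final class FlattenExample {
    /** The form found in the test: map each list to a stream, then concatenate pairwise. */
    static List<String> withReduce(List<List<String>> lists) {
        return lists.stream().map(Collection::stream).reduce(Stream.of(), Stream::concat).toList();
    }

    /** The suggested alternative: flatten in a single step. */
    static List<String> withFlatMap(List<List<String>> lists) {
        return lists.stream().flatMap(Collection::stream).toList();
    }
}
```

Both produce the same elements in the same order. The `Stream.concat` Javadoc additionally warns that repeated concatenation builds deeply nested streams, so `flatMap` is usually the safer default.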

server/src/main/java/org/elasticsearch/rest/ChunkedRestResponseBody.java

ywangd · 2023-09-26T07:05:11Z

This PR is mostly ready. I'd like to get approval for the overall approach before proceeding with some further polish, including (1) more tests and (2) potentially merging the two classes HttpRouteStats and TransportActionStats. Thanks!

DaveCTurner

I like it. Left a few comments & suggestions.

server/src/main/java/org/elasticsearch/http/HttpRouteStats.java

DaveCTurner · 2023-09-26T07:33:20Z

server/src/main/java/org/elasticsearch/rest/ChunkedRestResponseBody.java

+                var result = chunkIterator.hasNext() == false;
+                if (result && sizeConsumer != null) {
+                    sizeConsumer.accept(size);
+                    sizeConsumer = null;
+                }


I'd rather we added some logic to DefaultRestChannel to grab the size when the response is closed - for instance we may not get all the way to isDone() before the client closes the channel.

Thanks for the prompt. I have now moved this logic entirely into RestController and I think it looks better than before. I considered DefaultRestChannel as well, but I think doing it within RestController is better because it has easier access to the different pieces of information. Please refer to 5c1fdb7
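
The concern above is that a client may close the channel before the body is fully sent, so the size should be reported on close rather than on completion. A minimal sketch of that idea follows. It does not use the real `ChunkedRestResponseBody` interface; `SizeReportingBody` and its methods are hypothetical stand-ins.

```java
import java.util.Iterator;
import java.util.function.LongConsumer;

/** Hypothetical sketch: counts the bytes handed out and reports the total once, on close. */
final class SizeReportingBody implements AutoCloseable {
    private final Iterator<byte[]> chunks;
    private final LongConsumer sizeConsumer;
    private long encodedLength;
    private boolean closed;

    SizeReportingBody(Iterator<byte[]> chunks, LongConsumer sizeConsumer) {
        this.chunks = chunks;
        this.sizeConsumer = sizeConsumer;
    }

    boolean isDone() {
        return chunks.hasNext() == false;
    }

    byte[] nextChunk() {
        final byte[] chunk = chunks.next();
        encodedLength += chunk.length; // accumulate as we go, not only at the end
        return chunk;
    }

    /** Reports whatever was sent so far, even if the body was never finished. */
    @Override
    public void close() {
        if (closed == false) {
            closed = true;
            sizeConsumer.accept(encodedLength);
        }
    }
}
```

With this shape, a body abandoned after the first chunk still contributes its partial size to the stats, and a double close does not report twice.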

DaveCTurner · 2023-09-26T07:34:47Z

server/src/main/java/org/elasticsearch/rest/RestController.java

            this.delegate = delegate;
            this.circuitBreakerService = circuitBreakerService;
            this.contentLength = contentLength;
+            this.methodHandlers = methodHandlers;
+            this.startTime = System.currentTimeMillis();


This clock is not monotonic, we should be using System::nanoTime (or, better, ThreadPool#rawRelativeTimeInMillis)

ThreadPool is not available in RestController, so I chose to use System::nanoTime. In fact, I copied what ThreadPool#rawRelativeTimeInMillis does. This method could potentially be static and reused here, but it seems some tests rely on it being an instance method so that it can be overridden. So I decided to just copy it for now since the duplication is minimal.
Please refer to 4a4cd4a
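
To illustrate the point about monotonic clocks: `System.currentTimeMillis()` follows the wall clock and can jump when the system time is adjusted, whereas `System.nanoTime()` is only meaningful for measuring elapsed time. The sketch below is an assumption about the general shape of such a helper, not a copy of the code in 4a4cd4a; the class name is hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a millisecond-resolution relative clock built on System.nanoTime. */
final class MonotonicClock {
    /** Returns a relative timestamp; only differences between two calls are meaningful. */
    static long relativeTimeInMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    }

    /** Elapsed milliseconds since a value previously returned by relativeTimeInMillis(). */
    static long elapsedMillis(long startMillis) {
        return relativeTimeInMillis() - startMillis;
    }
}
```

A response-time measurement would take `relativeTimeInMillis()` when the request is dispatched and call `elapsedMillis(start)` when the response is ready.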

server/src/test/java/org/elasticsearch/http/HttpStatsTests.java

elasticsearchmachine · 2023-09-27T04:06:38Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner

Looks good, I left a few nits, the only real problem remaining is that we don't handle responses larger than Integer.MAX_VALUE properly yet.

DaveCTurner · 2023-09-27T09:42:55Z

server/src/main/java/org/elasticsearch/http/HttpRouteStatsTracker.java

+public class HttpRouteStatsTracker {
+
+    /*
+     * default http.max_content_length is 100 MB so that the last histogram bucket is > 64MB (2^26)


That's the maximum request size but for responses we can return much more (maybe even GiBs) - suggest adding another 4 or 5 buckets at least.

Good call. I added 4 more buckets so that the last bucket is for > 1.0GB.
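
The bucket discussion above can be sketched as a histogram whose bounds are powers of two, with a final open-ended bucket. This is an illustration only: the class name, the constructor parameters, and the lower bound used in the usage note are assumptions, and the sketch is not thread-safe, unlike what a real stats tracker would need.

```java
/** Hypothetical sketch: histogram with power-of-two bucket bounds and an open-ended last bucket. */
final class PowerOfTwoHistogram {
    private final int firstExponent;
    private final long[] buckets;

    /**
     * Bucket 0 covers [0, 2^firstExponent), bucket i covers
     * [2^(firstExponent + i - 1), 2^(firstExponent + i)), and the last bucket
     * covers everything from 2^lastExponent upwards.
     */
    PowerOfTwoHistogram(int firstExponent, int lastExponent) {
        this.firstExponent = firstExponent;
        this.buckets = new long[lastExponent - firstExponent + 2];
    }

    static int bucketIndex(long size, int firstExponent, int bucketCount) {
        if (size < (1L << firstExponent)) {
            return 0;
        }
        final int exponent = 63 - Long.numberOfLeadingZeros(size); // floor(log2(size))
        return Math.min(exponent - firstExponent + 1, bucketCount - 1);
    }

    void record(long size) {
        buckets[bucketIndex(size, firstExponent, buckets.length)]++;
    }

    long[] snapshot() {
        return buckets.clone();
    }
}
```

For example, `new PowerOfTwoHistogram(10, 30)` would place every response of 1 GiB (2^30 bytes) or more in the last bucket, matching the "> 1.0GB" upper bucket described above. The lower bound of 2^10 is chosen arbitrarily here.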

DaveCTurner · 2023-09-27T09:45:31Z

server/src/main/java/org/elasticsearch/rest/RestController.java

+        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(handlers.allNodeValues(), Spliterator.ORDERED), false)
+            .filter(mh -> mh.getStats().requestCount() > 0 || mh.getStats().responseCount() > 0)
+            .collect(Maps.toUnmodifiableSortedMap(MethodHandlers::getPath, MethodHandlers::getStats));


Doing this with streams seems fairly awkward, I feel a regular loop to build a TreeMap would be simpler.

I changed it to use a while loop with the iterator to build a TreeMap.
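
A sketch of the loop-based alternative, assuming a simplified `Route` record in place of `MethodHandlers` and its stats. All names in the block are hypothetical.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch: build a sorted map of active routes with a plain loop. */
final class RouteStatsCollector {
    /** Simplified stand-in for a per-route handler and its counters. */
    record Route(String path, long requestCount, long responseCount) {}

    static SortedMap<String, Route> collect(Iterator<Route> routes) {
        final SortedMap<String, Route> result = new TreeMap<>();
        while (routes.hasNext()) {
            final Route route = routes.next();
            // Skip routes that have never been used, as the stream filter did.
            if (route.requestCount() > 0 || route.responseCount() > 0) {
                result.put(route.path(), route);
            }
        }
        return Collections.unmodifiableSortedMap(result);
    }
}
```

The loop consumes the iterator directly, so the `StreamSupport`/`Spliterators` adapter is no longer needed.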

DaveCTurner · 2023-09-27T09:52:23Z

server/src/test/java/org/elasticsearch/rest/action/info/RestClusterInfoActionTests.java

@@ -89,7 +91,12 @@ public void testHttpResponseMapper() {
                    .map(HttpStats::clientStats)
                    .map(Collection::stream)
                    .reduce(Stream.of(), Stream::concat)
-                    .toList()
+                    .toList(),
+                nodeStats.stream().map(NodeStats::getHttp).map(HttpStats::httpRouteStats).reduce(Map.of(), (l, r) -> {


I know this is just test code, but still I think we can simplify the two-maps-and-a-reduce by moving the mapped functions inside the lambda called within the reduce. Also this looks to do an awful lot of copying of maps to convert them between mutable and immutable versions, do we really need that?

I think moving the two maps inside the reduce is more clunky because the reduce needs to change type, and Java asks you to supply both an accumulator and a combiner, which do pretty much the same thing for this use case. I ended up moving it outside and building it separately with a for loop (see here).
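
A sketch of merging per-node maps with a for loop and `Map.merge`, which avoids both the type-changing `reduce` and the repeated copying between mutable and immutable maps. The `Stats` record and its `merge` method are hypothetical simplifications of `HttpRouteStats`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: combine per-node route stats into one map with a plain loop. */
final class RouteStatsMerger {
    /** Simplified stand-in for the per-route stats. */
    record Stats(long requestCount, long totalRequestSize) {
        static Stats merge(Stats a, Stats b) {
            return new Stats(a.requestCount() + b.requestCount(), a.totalRequestSize() + b.totalRequestSize());
        }
    }

    static Map<String, Stats> mergeAll(List<Map<String, Stats>> perNode) {
        final Map<String, Stats> merged = new HashMap<>();
        for (Map<String, Stats> nodeStats : perNode) {
            // Map.merge inserts the value if the key is absent, otherwise combines the two values.
            nodeStats.forEach((route, stats) -> merged.merge(route, stats, Stats::merge));
        }
        return merged;
    }
}
```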

DaveCTurner · 2023-09-27T09:58:46Z

server/src/main/java/org/elasticsearch/rest/RestController.java

+
+        private final ChunkedRestResponseBody delegate;
+        private final Consumer<Integer> encodedLengthConsumer;
+        private int encodedLength = 0;


This could end up exceeding Integer.MAX_VALUE

This is a good point. It is now handled with Math.addExact. The value is also capped to Integer.MAX_VALUE in case of overflow. I think it is OK to not track the exact number since (I think?) it is rare and we don't really need the exact number for the histogram. The total response size will be off, but I feel that's OK.
Please see the code changes here.

Hm I think that's going to be trappy, we might well want to add another bucket to the end of the histogram later. Why not just use long here?

I didn't think this case required accurate handling, but upgrading to long is also fine. I updated it in 49b95f8
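
The two options discussed in this thread can be contrasted in a short sketch. `cappedAdd` mirrors the described `Math.addExact` approach with a cap at `Integer.MAX_VALUE`, and the `long` field mirrors the final choice. The class and method names are hypothetical and not taken from 49b95f8.

```java
/** Hypothetical sketch contrasting a capped int counter with a long counter. */
final class EncodedLengthCounter {
    private long encodedLength = 0;

    /** Accumulating into a long keeps the exact total well beyond Integer.MAX_VALUE. */
    void addChunk(int chunkLength) {
        encodedLength += chunkLength;
    }

    long get() {
        return encodedLength;
    }

    /** The earlier alternative: detect int overflow and cap, which loses the real total. */
    static int cappedAdd(int current, int delta) {
        try {
            return Math.addExact(current, delta);
        } catch (ArithmeticException e) {
            return Integer.MAX_VALUE;
        }
    }
}
```

The capped version makes every response above 2 GiB look identical, which is why adding a larger histogram bucket later would be "trappy" with an int counter.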

server/src/main/java/org/elasticsearch/rest/RestController.java

ywangd · 2023-10-03T07:02:17Z

@elasticmachine update branch

ywangd · 2023-10-03T07:48:18Z

Failure is unrelated and tracked at #96805

ywangd · 2023-10-04T23:14:39Z

Bump for reviews. Thanks!

Btw, the 8.11 branch just got created. I don't think we need this for 8.11 anyway, so 8.12 is just fine.

ywangd · 2023-10-11T11:05:22Z

@DaveCTurner Merge conflict for TransportVersions is now fixed.

DaveCTurner

LGTM

idegtiarenko · 2023-10-12T11:26:36Z

server/src/main/java/org/elasticsearch/http/HttpRouteStats.java

+    @Override
+    public boolean equals(Object o) {
+        if (this == o) return true;
+        if (o == null || getClass() != o.getClass()) return false;
+        HttpRouteStats that = (HttpRouteStats) o;
+        return requestCount == that.requestCount
+            && totalRequestSize == that.totalRequestSize
+            && responseCount == that.responseCount
+            && totalResponseSize == that.totalResponseSize
+            && Arrays.equals(requestSizeHistogram, that.requestSizeHistogram)
+            && Arrays.equals(responseSizeHistogram, that.responseSizeHistogram)
+            && Arrays.equals(responseTimeHistogram, that.responseTimeHistogram);
+    }
+
+    @Override
+    public int hashCode() {
+        int result = Objects.hash(requestCount, totalRequestSize, responseCount, totalResponseSize);
+        result = 31 * result + Arrays.hashCode(requestSizeHistogram);
+        result = 31 * result + Arrays.hashCode(responseSizeHistogram);
+        result = 31 * result + Arrays.hashCode(responseTimeHistogram);
+        return result;
+    }


This is a record, so do we need to override equals and hashCode? Or do the defaults not handle arrays?
If so, would it be beneficial to wrap the array in a Histogram object?
This could also simplify merging them and serializing them in toXContent.

Yeah, it's because the default does not handle arrays. I can look into wrapping it in a separate Histogram class in a follow-up.
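
A sketch of why the overrides are needed and how the suggested wrapper would help. A record's implicit `equals` compares reference-typed components with `Objects.equals`, which for arrays means identity. The `Plain`, `Histogram`, and `Wrapped` records below are hypothetical and much smaller than `HttpRouteStats`.

```java
import java.util.Arrays;

/** Hypothetical sketch: record equality with array components. */
final class RecordArrayEquality {
    /** Implicit equals/hashCode: the array component is compared by reference. */
    record Plain(long count, long[] histogram) {}

    /** Wrapper that gives the array value semantics in one place. */
    record Histogram(long[] buckets) {
        @Override
        public boolean equals(Object o) {
            return o instanceof Histogram that && Arrays.equals(buckets, that.buckets);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(buckets);
        }
    }

    /** The enclosing record can now rely on the implicit equals/hashCode. */
    record Wrapped(long count, Histogram histogram) {}
}
```

With the wrapper, the enclosing stats record would no longer need its own hand-written `equals` and `hashCode`, however many histograms it holds.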

idegtiarenko · 2023-10-12T11:29:49Z

server/src/main/java/org/elasticsearch/http/HttpServerTransport.java

@@ -52,5 +54,8 @@ interface Dispatcher {
         */
        void dispatchBadRequest(RestChannel channel, ThreadContext threadContext, Throwable cause);

+        default Map<String, HttpRouteStats> getStats() {


NIT: I think it is worth renaming this to routeStats or perRouteStats to be consistent with the stats method. That one could also have a reasonable default now.

idegtiarenko

Left a couple of suggestions around the time the PR was auto-merged.
They could be applied in a follow-up if considered worthwhile.

ywangd · 2023-10-12T12:21:10Z

@idegtiarenko Sorry this got merged before your comments. I will address your points in a follow-up. Thanks!

In elastic#99852 we introduced a layer of wrapping around the `RestResponse` to capture the length of a chunked-encoded response, but this wrapping does not preserve the headers of the original response. This commit fixes the bug.

In elastic#99852 we introduced a layer of wrapping around the `RestResponse` to capture the length of a chunked-encoded response, but this wrapping does not preserve the headers of the original response. This commit fixes the bug. Backport of elastic#104808 to 8.12

* Fix lost headers with chunked responses In #99852 we introduced a layer of wrapping around the `RestResponse` to capture the length of a chunked-encoded response, but this wrapping does not preserve the headers of the original response. This commit fixes the bug. Backport of #104808 to 8.12 * Fix compile

WIP

c22fe4a

ywangd added >enhancement :Distributed Coordination/Network Http and internode communication implementations v8.11.0 labels Sep 25, 2023

ywangd requested review from idegtiarenko and DaveCTurner September 25, 2023 07:24

Update docs/changelog/99852.yaml

56d857a

idegtiarenko reviewed Sep 25, 2023

ywangd commented Sep 25, 2023

ywangd added 5 commits September 26, 2023 10:56

Merge remote-tracking branch 'origin/main' into es-95739

2a4f24a

fix test

32a9e3a

add response time

e669041

add merge and info stats support

2873701

tests tweak

4504e52

DaveCTurner reviewed Sep 26, 2023

ywangd added 5 commits September 27, 2023 12:50

wrap chunked body and report size on close

5c1fdb7

use nanoTime

4a4cd4a

add tests for httpRouteStats

0d207b0

add tests for httpRouteStats

860341e

Merge remote-tracking branch 'origin/main' into es-95739

971f047

ywangd marked this pull request as ready for review September 27, 2023 04:06

ywangd requested a review from DaveCTurner September 27, 2023 04:06

elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 27, 2023

ywangd requested a review from idegtiarenko September 27, 2023 04:06

DaveCTurner reviewed Sep 27, 2023

Merge remote-tracking branch 'origin/main' into es-95739

159ff77

ywangd mentioned this pull request Oct 3, 2023

[CI] LuceneCountOperatorTests testSimple failing #100175

Closed

Merge branch 'main' into es-95739

f04e5af

Merge remote-tracking branch 'origin/main' into es-95739

3a62b0d

mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023

Merge remote-tracking branch 'origin/main' into es-95739

55fc8d3

DaveCTurner approved these changes Oct 12, 2023

ywangd and others added 3 commits October 12, 2023 21:18

Update qa/smoke-test-http/src/javaRestTest/java/org/elasticsearch/htt…

345623a

…p/HttpSmokeTestCase.java Co-authored-by: David Turner <[email protected]>

fix import

a4355c4

Merge remote-tracking branch 'origin/main' into es-95739

bfb00c8

ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 12, 2023

tweak

bd4ef2c

idegtiarenko reviewed Oct 12, 2023

elasticsearchmachine merged commit 6ed4ad5 into elastic:main Oct 12, 2023

ywangd deleted the es-95739 branch October 12, 2023 11:27

idegtiarenko reviewed Oct 12, 2023

DaveCTurner mentioned this pull request Jan 26, 2024

Fix lost headers with chunked responses #104808

Merged

DaveCTurner mentioned this pull request Jan 29, 2024

Fix lost headers with chunked responses #104845

Merged

pquentin mentioned this pull request Sep 12, 2024

Add HTTP routes to cluster/nodes stats elastic/elasticsearch-specification#2886

Merged
