SOLR-17582 Stream CLUSTERSTATUS API response #2916

mlbiscoc · 2024-12-18T16:35:25Z

https://issues.apache.org/jira/browse/SOLR-17582

Description

CLUSTERSTATUS aggregates and builds the whole response, including all collection properties, into a NamedList before finally passing it to the response writers. With many collections (for example, thousands), this can use a lot of memory.

Solution

Instead of returning a NamedList, return a MapWriter. The MapWriter pulls an Iterator from the documentStream and wraps it in a SolrCollectionProperiesIterator, which implements Iterator and overrides next() to create and write each collection's response just in time, instead of building the whole thing in memory.
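
A minimal sketch of the idea, not the code in this PR: `EntrySink` is a hypothetical stand-in for Solr's `MapWriter.EntryWriter`, and `buildProps` and the collection names are made up for illustration. It only shows that each collection's properties are built when the writer reaches that collection, rather than all up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

class StreamingSketch {
  /** Hypothetical stand-in for Solr's MapWriter.EntryWriter. */
  interface EntrySink {
    void put(String key, Object value);
  }

  /** Records the order of events so the laziness is observable. */
  static final List<String> events = new ArrayList<>();

  /** Stand-in for the per-collection work of building the properties map. */
  static Map<String, Object> buildProps(String collection) {
    events.add("build " + collection);
    return Map.of("configName", collection + "_conf");
  }

  /** Writes one entry per collection as it is produced, never holding them all. */
  static void writeCollections(Stream<String> collections, EntrySink sink) {
    collections
        .map(name -> Map.entry(name, buildProps(name)))
        .forEach(entry -> sink.put(entry.getKey(), entry.getValue()));
  }

  /** Runs the sketch with a sink that only logs what it was asked to write. */
  static List<String> run() {
    events.clear();
    writeCollections(Stream.of("c1", "c2"), (key, value) -> events.add("write " + key));
    return events;
  }
}
```

In this sketch the build and write steps interleave per collection, which is the property the MapWriter approach relies on.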

Tests

Update the affected tests that expected a NamedList so that they expect a Map, since the response is now produced by a MapWriter.

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.
I have added documentation for the Reference Guide

dsmiley

Thanks for contributing!

dsmiley · 2024-12-18T18:56:14Z

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

+      if (shard != null) {
+        String[] paramShards = shard.split(",");
+        requestedShards.addAll(Arrays.asList(paramShards));
+      }


ideally this is done up front; not per iteration

Moved it out so it's not per iteration.
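
For illustration only, a small sketch of parsing the parameter once before the loop. `parseShards` and `describe` are hypothetical helper names, not methods in ClusterStatus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ShardParamSketch {
  /** Parses a comma-separated shard parameter; null means no filter was given. */
  static Set<String> parseShards(String shard) {
    Set<String> requested = new LinkedHashSet<>();
    if (shard != null) {
      requested.addAll(Arrays.asList(shard.split(",")));
    }
    return requested;
  }

  /** Parses the parameter once, then reuses the same set for every collection. */
  static List<String> describe(List<String> collections, String shardParam) {
    Set<String> requestedShards = parseShards(shardParam); // done up front
    List<String> out = new ArrayList<>();
    for (String collection : collections) {
      // Per-collection work only reads the already parsed set.
      out.add(collection + " -> " + requestedShards);
    }
    return out;
  }
}
```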

dsmiley · 2024-12-18T18:56:52Z

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

+      byte[] bytes = Utils.toJSON(clusterStateCollection);
+      @SuppressWarnings("unchecked")
+      Map<String, Object> docCollection = (Map<String, Object>) Utils.fromJSON(bytes);


why round-trip this?

I looked at this and now understand why it was done this way. The code wants to write this out to the TextWriter as a normal POJO, but that doesn't seem to be what the DocCollection class is. So instead it serializes to a JSON byte[] and then parses that back into a Map, which was the "easiest" way.

I was looking to avoid this back and forth, and there were a few options. I tried using an ObjectMapper to convert to Map.class, but it fails on Instant because of a missing Jackson dependency that we would need to introduce.
Java 8 date/time type java.time.Instant not supported by default Issue

Another way is to introduce some kind of toMap method so that the TextWriter can write this as a generic Map.

Another option, which actually looks like the way we should go: I found that the DocCollection class extends ZkNodeProps, which implements MapWriter. DocCollection already overrides writeMap, so we could just return it to the TextWriter! Unfortunately, the ClusterStatus class does a bunch of JSON post-processing in postProcessCollectionJSON(), such as adding Health to the Map, so the output would be missing some things.

I am thinking we should refactor DocCollection so we can just return it, but the changes were much more drastic. It may be worth it, maybe in a different JIRA? The scope here keeps creeping, with me already adding an improvement to NamedList. I would be happy to pick up that refactor if you agree. It would clean up much more code and avoid this JSON processing on every collection iteration.

dsmiley · 2024-12-18T19:00:36Z

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

@@ -368,4 +341,77 @@ public static Map<String, Object> postProcessCollectionJSON(Map<String, Object>
    collection.put("health", Health.combine(healthStates).toString());
    return collection;
  }
+
+  private class SolrCollectionProperiesIterator implements Iterator<NamedList<Object>> {


I was hoping we wouldn't need a custom Iterator. We do need a method that takes in a DocCollection (and some other context that is the same across collections) and returns a NamedList. With such a method, we can call it via streamOfDocCollection.map(collState -> theMethod(collState, routeKey, liveNodes, etc.)).iterator()

Ah yeah don't know why I didn't do this first. Changed it appropriately.
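
A rough sketch of that suggestion using plain strings in place of DocCollection. `collectionProps` is a hypothetical method and its fields are made up. The point is only that `Stream.map(...).iterator()` already gives a lazy Iterator, so no Iterator subclass is needed.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

class MapToIteratorSketch {
  /** Hypothetical per-collection method; the context arguments are shared by all calls. */
  static Map<String, Object> collectionProps(String name, Set<String> liveNodes) {
    return Map.of("name", name, "liveNodeCount", liveNodes.size());
  }

  /** Lazily maps each collection name; elements are computed as the iterator advances. */
  static Iterator<Map<String, Object>> propsIterator(
      Stream<String> names, Set<String> liveNodes) {
    return names.map(name -> collectionProps(name, liveNodes)).iterator();
  }
}
```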

solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java

mlbiscoc · 2024-12-19T19:56:03Z

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

+          Iterator<Map<String, Object>> it =
+              collectionStream
+                  .map(
+                      (collectionState) ->
+                          collectionPropsResponse(
+                              collectionState,
+                              collectionVsAliases,
+                              routeKey,
+                              liveNodes,
+                              requestedShards))
+                  .iterator();
+          while (it.hasNext()) {
+            Map<String, Object> props = it.next();
+            props.forEach(
+                (key, value) -> {
+                  try {
+                    ew.put(key, value);
+                  } catch (IOException e) {
+                    throw new RuntimeException(e);
+                  }
+                });


I am thinking, if we can change up DocCollection in another PR, we can delete all those post processing functions and condense the response to this.

MapWriter collectionPropsWriter =
    ew -> {
      Iterator<DocCollection> it = collectionStream.iterator();
      while (it.hasNext()) {
        DocCollection collState = it.next();
        ew.put(collState.getName(), collState);
      }
    };

I should mention Solr has two similar APIs, ClusterStatus & ColStatus. See org.apache.solr.handler.admin.ColStatus#getColStatus. It'd be awesome if there was a single method that takes a DocCollection (and some other params as needed) to produce a NamedList. At least ColStatus's code doesn't have the sad JSON round-trip. Maybe its own PR or not, as you wish.

Gotcha. I'll take a look at this in a separate PR/Jira

dsmiley

Leave the round-trip serialization -- I didn't realize it was that way before.
Leave findRecursive be... not as easy as you thought and probably deserves its own issue.

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

solr/solrj/src/java/org/apache/solr/common/util/NamedList.java

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

dsmiley · 2024-12-20T19:44:21Z

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

+    Set<String> shards = new HashSet<>(requestedShards);
    String name = clusterStateCollection.getName();

    if (routeKey != null) {
      DocRouter router = clusterStateCollection.getRouter();
      Collection<Slice> slices = router.getSearchSlices(routeKey, null, clusterStateCollection);
      for (Slice slice : slices) {
-        requestedShards.add(slice.getName());
+        shards.add(slice.getName());
      }


Something seems wrong to me here. If requestedShards is specified, we should use that. If routeKey is specified, we should use that (to compute the shards). Or neither (99% of users won't do either). But we shouldn't do both, if for no other reason than that it's confusing.

We could throw a SolrException(BAD_REQUEST) higher up the stack if passed both shard and _route_. I can update the ref-docs to say you can only do one or the other. But this could break people who are doing both.

I also changed that logic into a single line and forEach

I recommend that but understand if you leave out of scope.
(I really don't think this is going to break anyone's current usage, BTW)

I'll add it in when I come back for the refactor.
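
A sketch of what that check could look like once it is added. `validate` is a hypothetical name, and IllegalArgumentException stands in for the SolrException(BAD_REQUEST) mentioned above so that the example compiles without Solr on the classpath.

```java
class ShardRouteValidationSketch {
  /** Rejects requests that specify both a shard list and a route key. */
  static void validate(String shard, String routeKey) {
    if (shard != null && routeKey != null) {
      // Stand-in for SolrException with BAD_REQUEST in real handler code.
      throw new IllegalArgumentException("Specify either 'shard' or '_route_', not both");
    }
  }

  /** Convenience wrapper so callers and tests can observe the outcome as a boolean. */
  static boolean isRejected(String shard, String routeKey) {
    try {
      validate(shard, routeKey);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }
}
```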

dsmiley · 2024-12-20T19:48:27Z

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

+    Map<String, Object> shards = (Map<String, Object>) collectionProps.get("shards");
+    for (Object nextShard : shards.values()) {
+      Map<String, Object> shardMap = (Map<String, Object>) nextShard;
+      Map<String, Object> replicas = (Map<String, Object>) shardMap.get("replicas");
+      for (Object nextReplica : replicas.values()) {
+        Map<String, Object> replicaMap = (Map<String, Object>) nextReplica;


I almost want to cry just glancing at this.
This is the poster-child for why Java introduced "var". And there may be other approaches to improve it but that's the simplest.
(Yeah you didn't write it; I know)

I'll change those Maps to var, but it's another reason I think I'll come back to this. I could try to improve on this, and I think it's possible to remove a bunch of code here, like buildResponseForCollection and postProcessCollectionJSON.

totally understood. In some old Solr code like this, there's always "and one more thing" we could/should do but ultimately snowballs the scope out of control. I leave it to you to do as you wish. Thank you for your contribution here; I didn't mean to get more out of you than you bargained for :-)
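
For reference, a sketch of the same nested traversal with `var` (Java 10+). The class, the method name, and the "state" key used to produce a result are assumptions for illustration; only the shards/replicas nesting comes from the hunk above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class VarSketch {
  /** Collects replica states from a nested shards -> replicas -> properties map. */
  @SuppressWarnings("unchecked")
  static List<String> replicaStates(Map<String, Object> collectionProps) {
    var states = new ArrayList<String>();
    var shards = (Map<String, Object>) collectionProps.get("shards");
    for (var shard : shards.values()) {
      var shardMap = (Map<String, Object>) shard;
      var replicas = (Map<String, Object>) shardMap.get("replicas");
      for (var replica : replicas.values()) {
        var replicaMap = (Map<String, Object>) replica;
        states.add((String) replicaMap.get("state"));
      }
    }
    return states;
  }
}
```

The casts are still there, but the type no longer has to be spelled out on both sides of every assignment.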

dsmiley · 2024-12-20T19:51:41Z

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

+                          liveNodes,
+                          requestedShards));
+                } catch (IOException e) {
+                  throw new RuntimeException(e);


Why does buildResponseForCollection throw IOException? That's suspicious. If you must, catch in there and throw a suitable exception like SolrException (which extends RuntimeException and is generally preferred within Solr over RE).

buildResponseForCollection doesn't throw the IOException; the EntryWriter does. But it looks like there is actually a putNoEx method that catches it for us and throws a generic RuntimeException. I could throw a SolrException there with better logging if you think it's better.

putNoEx then
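
A sketch of the pattern being discussed, not Solr's implementation: `CheckedSink` is a hypothetical stand-in for the EntryWriter, and wrapping in UncheckedIOException is a choice made for this example (the comment above says Solr's putNoEx throws a generic RuntimeException).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class PutNoExSketch {
  /** Hypothetical stand-in for an entry writer whose put() declares IOException. */
  interface CheckedSink {
    void put(String key, Object value) throws IOException;

    /** Same as put(), but rethrows IOException unchecked so lambdas stay clean. */
    default void putNoEx(String key, Object value) {
      try {
        put(key, value);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Copies props through the sink without any try/catch at the call site. */
  static Map<String, Object> copy(Map<String, Object> props) {
    Map<String, Object> out = new LinkedHashMap<>();
    CheckedSink sink = out::put;
    props.forEach(sink::putNoEx);
    return out;
  }
}
```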

dsmiley · 2024-12-20T19:52:08Z

solr/core/src/java/org/apache/solr/handler/admin/ClusterStatus.java

+          collectionStream.forEach(
+              (collectionState) -> {


friggin beautiful now

putNoEx removes the try/catch, so we can get it even cleaner!

dsmiley

Uh oh; a back-compat problem is dawning on me. Any SolrJ consumer of this API (especially using the default "javabin" format) is going to break here. The change to BaseHttpClusterStateProvider should have made this glaringly obvious but I overlooked its implication earlier (face-palm).

I wouldn't mind if we do this only for V2 (which doesn't yet have back-compatibility concerns) but V2 only has one collection status list returning API (/cluster) and I've debated against the very existence of /cluster for V2. There's not yet a proposal for an alternative. Furthermore V2 raises the bigger question of coalescing on a single DocCollection serialization method.

If we forge ahead anyway (not waiting for V2), we'll have to work around this compatibility. I had a couple ideas but whittled down to one:

Detect the response format in this handler. If it's Javabin, we write to a NamedList (or better, SimpleOrderedMap), defeating the point of this PR for that format. If JSON/XML is used, this PR will accomplish its goal. We need to write a little test for JSON that at least partially sanity checks the result. Such a test should pass without this PR.
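
A very rough sketch of that idea, under stated assumptions: the format is detected by comparing a `wt` value to "javabin", a plain Map stands in for NamedList/SimpleOrderedMap, and an Iterator stands in for the streaming MapWriter. How the handler would really detect the format is not settled in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

class FormatSwitchSketch {
  /** Assumption: javabin clients need the fully materialized, back-compatible shape. */
  static boolean needsEagerResponse(String wt) {
    return "javabin".equals(wt);
  }

  /** Hypothetical per-collection payload. */
  static Map<String, Object> props(String name) {
    return Map.of("name", name);
  }

  /** Returns a materialized Map for javabin, or a lazy Iterator for other formats. */
  static Object buildCollections(String wt, Stream<String> names) {
    if (needsEagerResponse(wt)) {
      Map<String, Object> all = new LinkedHashMap<>();
      names.forEach(name -> all.put(name, props(name))); // everything held in memory
      return all;
    }
    return names.map(FormatSwitchSketch::props).iterator(); // produced on demand
  }
}
```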

mlbiscoc added 4 commits December 17, 2024 09:53

Move collections properties into an iterable to stream

f64e3d3

Change name to collectionPropsIt

bbbc62d

Use MapWriter

6ed4a95

Fix tests where it used namedList

3627bda

github-actions bot added client:solrj tests cat:cloud cat:cli labels Dec 18, 2024

Merge branch 'main' into SOLR-17582-clusterstatus-stream

c5a485f

dsmiley reviewed Dec 18, 2024

Address PR comments

8b8d37d

mlbiscoc commented Dec 19, 2024

dsmiley reviewed Dec 19, 2024

Next round of PR review

457da45

dsmiley reviewed Dec 20, 2024

Cleanup some code in ClusterStatus

f6e77cd

dsmiley reviewed Dec 20, 2024
