Retrieve topic/partition storage information using Admin describeLogDirs
#42
Conversation
Signed-off-by: Michael Edgar <[email protected]>
Force-pushed from 57ebc58 to 5f5ae01
Signed-off-by: Michael Edgar <[email protected]>
Signed-off-by: Michael Edgar <[email protected]>
api/src/main/java/com/github/eyefloaters/console/api/model/Node.java
api/src/main/java/com/github/eyefloaters/console/api/model/Node.java
api/src/main/java/com/github/eyefloaters/console/api/model/PartitionInfo.java
api/src/main/java/com/github/eyefloaters/console/api/model/ReplicaInfo.java
```java
private final String host;
private final int port;
private final String rack;
private Either<ReplicaInfo, Error> log;
```
A node hosts many replicas, so why is there just a single one here? TBH it's not clear to me why it's better to model the relationship between nodes and replicas on the Node, rather than on the Partition.
This goes back to my earlier comment about the `Node` model. Here, a `Node` represents the placement of a partition replica. Thus, the `log` is the storage used by that partition's replica on the node. As you're pointing out, this is not clear with the way it's modeled currently - or at least the way the models are named.
I did wonder about how many node instances there were floating about for a given node id! If you wanted to stick with this way of doing it then we should change the name `Node` → `ReplicaNode` or something.
Suppose we do something like this:

```java
// replaces `Node` in the context of describing a replica
class Replica {
    int nodeId;
    String nodeRack;
    boolean leader;
    boolean inSync;
    // localStorage? Depends on if/how we can get tiered storage info
    Either<ReplicaStorage, Error> storage;
}

// replaces `ReplicaInfo` (too generic?)
record ReplicaStorage(
    long localSize,
    long offsetLag,
    boolean future
) {
}
```
The partition model could then simply have a `List` of `Replica`s, and get rid of `leader` and `isr` as separate fields.
I would keep `int leaderId` in the partition model, then you could use `List<Replica> replicas`, in the same order as returned from the Admin client (thus preserving the semantic of the first replica being the preferred leader). Of course you can encapsulate the lookup of the leader (on the partition model) via its id in a method, so the caller doesn't have to know there isn't a `Replica leader` field.

I'd also wonder about inlining the fields of `ReplicaStorage` into `Replica` with a separate `Error storageError`. Doing it as you suggest means the structure of the objects reflects how the information was retrieved (i.e. it's only because we use a separate Admin method for getting the storage info that we have a separate class, and an indirection through an `Either`), which feels pretty invasive. It's one thing to do that in your Java code, but when it leaks out into the JSON structure exposed by the API it starts to feel like we're exposing internal details of the implementation. At that point it might be cleaner if all the fields were nullable and each class (`Replica` etc.) had a single `Error` which was non-null whenever there was any problem. But that comes with its own downsides for writing the code in a functional way. WDYT?
I do think there's a benefit to placing things in the `Either`, especially when it comes to a set of fields that may be inaccessible due to permissions or some combination of ACLs preventing the retrieval. Granted it does expose some info about the backend, but ultimately it does come down to the API operations that Kafka exposes, regardless of any server's implementation.

If we later add some additional `remoteStorage` for Tiered Storage information, it compartmentalizes that storage info, or the reason it is not present, neatly beneath a single field in the model.
As mentioned in the other thread, a new object `PartitionReplica` was added to carry the replica `nodeId` and `nodeRack` information, plus a boolean derived from the `isr` array. This embeds an `Either<ReplicaLocalStorage, Error>` for node-local storage of the log. I'll leave this thread open in case the discussion around how/whether to use `Either` like this should continue.
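Roughly, the shape being described (a sketch only; field names approximate the discussion, and `Either` and `Error` are the project's own types rather than standard library classes):

```java
// Sketch of the new model objects described above; not the exact implementation.
record ReplicaLocalStorage(long size, long offsetLag, boolean future) { }

class PartitionReplica {
    int nodeId;
    String nodeRack;
    boolean inSync;  // derived from membership in the partition's ISR array
    // Either the node-local log storage details, or the error explaining
    // why they could not be retrieved (e.g. a describeLogDirs failure).
    Either<ReplicaLocalStorage, Error> localStorage;
}
```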
api/src/main/java/com/github/eyefloaters/console/api/model/PartitionInfo.java
api/src/main/java/com/github/eyefloaters/console/api/model/PartitionInfo.java
api/src/main/java/com/github/eyefloaters/console/api/model/Topic.java
api/src/main/java/com/github/eyefloaters/console/api/service/TopicService.java
```java
.toList();
.map(partition -> {
    var key = new TopicPartition(topic.getName(), partition.getPartition());
    List<Integer> value = partition.getReplicas().stream().map(Node::getId).toList();
```
Should partition have a `getReplicaNodeIds()` to encapsulate this?

I have a UX question about the screenshots... the 'Replicas' and 'In-sync Replicas' columns are counts, not node ids, right? I wonder if the 'In-sync Replicas' being a count just puts cognitive load on the user. They're basically having to compare it to the 'Replicas' column to understand what it means. It's not such a problem if RF=3 uniformly, but that's not always the case.

I don't know enough about the overall UX design to really understand how problematic this might be. For example, how easy is it to see if there's an ongoing reassignment that's affecting a particular partition? Or do we have plans to highlight the count of in-sync replicas if it is unexpectedly less than the replica count?
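The `getReplicaNodeIds()` encapsulation suggested at the top of this comment could look roughly like this (a sketch against the getters visible in the diff above; the method itself is hypothetical):

```java
// Hypothetical helper on the partition model, wrapping the mapping done inline above.
public List<Integer> getReplicaNodeIds() {
    return getReplicas().stream()
            .map(Node::getId)
            .toList();
}
```

The call site above would then shrink to `List<Integer> value = partition.getReplicaNodeIds();`.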
I've seen other UIs display the node IDs for replicas/ISRs. I'm wondering if we simply display the list of node IDs for the replicas, somehow indicate the leader (first in the list?), and then differentiate in-sync and not in-sync for the others. For the scenario you describe with a reassignment, perhaps there are additional …
From Kafka's PoV I did wonder whether, rather than "in sync replicas", we showed a list of "out of sync replicas" (i.e. just …
Yes, …
I loved reading all the comments, there is a lot of knowledge there. But it left me wondering if we need to expose a formally correct view of how Kafka works through our APIs at all. What if we decide beforehand what exactly it is that we want to show on the console, and have the API expose just that? It means aggregating data, simplifying some concepts, putting our own spices into the "Kafka console recipe". WDYT?
I'm looking through the model and thinking about that too. Ideally we can keep the model as DRY as possible, and we'll need a way to communicate errors at a granular level + request additional data (which is supported currently with the …

@tombentley, @riccardo-forina: I played around a bit with the replica display based on your latest comment and here's one possible view. Node 1 was stopped prior to publishing a batch of records. This shows both the out-of-sync replicas being populated as well as the leader Id being different from the preferred leader (first in list) for several partitions.

Response JSON:

```json
{
"data": {
"id": "xdgBIiIiREyDGmmFrJisSA",
"type": "topics",
"attributes": {
"name": "t2",
"internal": false,
"partitions": [
{
"partition": 0,
"replicas": [
{
"nodeId": 0,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
},
{
"nodeId": 1,
"inSync": false,
"localStorage": {
"meta": {
"type": "error"
},
"title": "Unable to fetch replica log metadata",
"detail": "Timed out waiting for a node assignment. Call: describeLogDirs"
}
},
{
"nodeId": 2,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
}
],
"leaderId": 0,
"offsets": {
"earliest": {
"offset": 0,
"leaderEpoch": 1
},
"latest": {
"offset": 0,
"leaderEpoch": 1
},
"maxTimestamp": {
"offset": -1
}
},
"recordCount": 0,
"leaderLocalStorage": 0
},
{
"partition": 1,
"replicas": [
{
"nodeId": 2,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
},
{
"nodeId": 0,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
},
{
"nodeId": 1,
"inSync": false,
"localStorage": {
"meta": {
"type": "error"
},
"title": "Unable to fetch replica log metadata",
"detail": "Timed out waiting for a node assignment. Call: describeLogDirs"
}
}
],
"leaderId": 2,
"offsets": {
"earliest": {
"offset": 0,
"leaderEpoch": 1
},
"latest": {
"offset": 0,
"leaderEpoch": 1
},
"maxTimestamp": {
"offset": -1
}
},
"recordCount": 0,
"leaderLocalStorage": 0
},
{
"partition": 2,
"replicas": [
{
"nodeId": 1,
"inSync": false,
"localStorage": {
"meta": {
"type": "error"
},
"title": "Unable to fetch replica log metadata",
"detail": "Timed out waiting for a node assignment. Call: describeLogDirs"
}
},
{
"nodeId": 2,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
},
{
"nodeId": 0,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
}
],
"leaderId": 2,
"offsets": {
"earliest": {
"offset": 0,
"leaderEpoch": 1
},
"latest": {
"offset": 0,
"leaderEpoch": 1
},
"maxTimestamp": {
"offset": -1
}
},
"recordCount": 0,
"leaderLocalStorage": 0
},
{
"partition": 3,
"replicas": [
{
"nodeId": 0,
"inSync": true,
"localStorage": {
"size": 360,
"offsetLag": 0,
"future": false
}
},
{
"nodeId": 2,
"inSync": true,
"localStorage": {
"size": 360,
"offsetLag": 0,
"future": false
}
},
{
"nodeId": 1,
"inSync": false,
"localStorage": {
"meta": {
"type": "error"
},
"title": "Unable to fetch replica log metadata",
"detail": "Timed out waiting for a node assignment. Call: describeLogDirs"
}
}
],
"leaderId": 0,
"offsets": {
"earliest": {
"offset": 0,
"leaderEpoch": 1
},
"latest": {
"offset": 32,
"leaderEpoch": 1
},
"maxTimestamp": {
"offset": 0,
"timestamp": "2023-09-22T16:31:51.871Z",
"leaderEpoch": 1
}
},
"recordCount": 32,
"leaderLocalStorage": 360
},
{
"partition": 4,
"replicas": [
{
"nodeId": 2,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
},
{
"nodeId": 1,
"inSync": false,
"localStorage": {
"meta": {
"type": "error"
},
"title": "Unable to fetch replica log metadata",
"detail": "Timed out waiting for a node assignment. Call: describeLogDirs"
}
},
{
"nodeId": 0,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
}
],
"leaderId": 2,
"offsets": {
"earliest": {
"offset": 0,
"leaderEpoch": 1
},
"latest": {
"offset": 0,
"leaderEpoch": 1
},
"maxTimestamp": {
"offset": -1
}
},
"recordCount": 0,
"leaderLocalStorage": 0
},
{
"partition": 5,
"replicas": [
{
"nodeId": 1,
"inSync": false,
"localStorage": {
"meta": {
"type": "error"
},
"title": "Unable to fetch replica log metadata",
"detail": "Timed out waiting for a node assignment. Call: describeLogDirs"
}
},
{
"nodeId": 0,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
},
{
"nodeId": 2,
"inSync": true,
"localStorage": {
"size": 0,
"offsetLag": 0,
"future": false
}
}
],
"leaderId": 0,
"offsets": {
"earliest": {
"offset": 0,
"leaderEpoch": 1
},
"latest": {
"offset": 0,
"leaderEpoch": 1
},
"maxTimestamp": {
"offset": -1
}
},
"recordCount": 0,
"leaderLocalStorage": 0
}
],
"authorizedOperations": [
"DELETE",
"ALTER_CONFIGS",
"WRITE",
"CREATE",
"DESCRIBE",
"DESCRIBE_CONFIGS",
"ALTER",
"READ"
],
"recordCount": 32,
"leaderLocalStorageTotal": 360
}
}
}
```
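For reference, the two `localStorage` shapes above are the two sides of the `Either<ReplicaLocalStorage, Error>` discussed earlier: the storage record on success, or a JSON:API-style error object when `describeLogDirs` fails for that node. A rough sketch of that mapping (a hypothetical helper; the actual serialization in the codebase may be done differently, e.g. via Jackson):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: renders a replica's local storage the way it appears in the
// response above - either the data fields, or an error object with meta/title/detail.
class LocalStorageJsonSketch {

    record ReplicaLocalStorage(long size, long offsetLag, boolean future) { }

    static Map<String, Object> toJson(ReplicaLocalStorage storage, Throwable error) {
        Map<String, Object> json = new LinkedHashMap<>();
        if (error == null) {
            // Success: the replica's log dir info was retrieved.
            json.put("size", storage.size());
            json.put("offsetLag", storage.offsetLag());
            json.put("future", storage.future());
        } else {
            // Failure: surface a JSON:API-style error object instead.
            json.put("meta", Map.of("type", "error"));
            json.put("title", "Unable to fetch replica log metadata");
            json.put("detail", error.getMessage());
        }
        return json;
    }
}
```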
Signed-off-by: Michael Edgar <[email protected]>
Force-pushed from f4c4575 to afdfbb7
@tombentley, @riccardo-forina please take another look and let me know if the latest addresses your concerns (or raises new ones 😃)
👍

About the revised UI screenshot, I think that's better but still not perfect.
api/src/main/java/com/github/eyefloaters/console/api/model/PartitionInfo.java
api/src/main/java/com/github/eyefloaters/console/api/model/PartitionInfo.java
api/src/main/java/com/github/eyefloaters/console/api/model/ReplicaLocalStorage.java
api/src/main/java/com/github/eyefloaters/console/api/model/ReplicaLocalStorage.java
api/src/main/java/com/github/eyefloaters/console/api/model/ReplicaLocalStorage.java
api/src/main/java/com/github/eyefloaters/console/api/model/Topic.java
```java
.thenApply(nothing -> promise.complete(result))
.exceptionally(promise::completeExceptionally);
```
What are we getting out of these two calls? The final `thenCompose()` already returns a `CompletableFuture`, so these two calls just seem to be completing `promise` with the same outcome as the one returned by `thenCompose()`.
The `thenCompose` calls return `CompletableFuture<Void>`, whereas the method here returns `CompletableFuture<Map<...>>`. I find it more readable for the meaningful calls to "align" and then finish up separately. Each one is enriching the `result` map and returning nothing, followed by the completion calls.

Aside - `listOffsets` and `describeLogDirs` could probably be called in parallel at some point.

current:

```java
CompletableFuture.allOf(pendingDescribes)
    .thenCompose(nothing -> listOffsets(adminClient, result, offsetSpec))
    .thenCompose(nothing -> describeLogDirs(adminClient, result))
    .thenApply(nothing -> promise.complete(result))
    .exceptionally(promise::completeExceptionally);
```

alternate (returning the chain itself rather than completing a separate `promise`):

```java
CompletableFuture.allOf(pendingDescribes)
    .thenCompose(nothing -> listOffsets(adminClient, result, offsetSpec))
    .thenCompose(nothing -> describeLogDirs(adminClient, result))
    .thenApply(nothing -> result);
```
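The parallel variant mentioned in the aside might look something like this (a sketch, assuming both helpers return a `CompletableFuture<Void>` and enrich `result` as a side effect, as in the current code):

```java
// Sketch only: run the offset and log-dir lookups concurrently, then yield the map.
CompletableFuture.allOf(pendingDescribes)
    .thenCompose(nothing -> CompletableFuture.allOf(
            listOffsets(adminClient, result, offsetSpec),
            describeLogDirs(adminClient, result)))
    .thenApply(nothing -> result);
```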
api/src/main/java/com/github/eyefloaters/console/api/service/TopicService.java
api/src/main/java/com/github/eyefloaters/console/api/service/TopicService.java
Co-authored-by: Tom Bentley <[email protected]>
Signed-off-by: Michael Edgar <[email protected]>
I agree it needs improvement. Let's use this version as the starting point for discussion and iterate from there. There will no doubt be other things to clean up in the presentation as well. Unless you have any objection, I'd like to merge this one and start doing some more serious exploration around the metrics piece.
Fixed in the latest, it appears at …
Kudos, SonarCloud Quality Gate passed!
Thanks!
Bumps [org.skyscreamer:jsonassert](https://github.com/skyscreamer/JSONassert) from 1.5.0 to 1.5.1.
- [Release notes](https://github.com/skyscreamer/JSONassert/releases)
- [Changelog](https://github.com/skyscreamer/JSONassert/blob/master/CHANGELOG.md)
- [Commits](skyscreamer/JSONassert@jsonassert-1.5.0...jsonassert-1.5.1)

updated-dependencies:
- dependency-name: org.skyscreamer:jsonassert
  dependency-type: direct:development
  update-type: version-update:semver-patch

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Topic summary storage: (screenshot)
Partition storage: (screenshot)