KAFKA-14462; [18/N] Add GroupCoordinatorService #13812

dajac · 2023-06-05T14:27:37Z

This patch introduces the GroupCoordinatorService. This is the new implementation of the group coordinator based on the coordinator runtime introduced in #13795.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

jolshan · 2023-06-05T21:53:22Z

@dajac Can you share the names of the new files, since this PR also contains all the other changes? Is it just GroupCoordinatorService.java (+ tests)?

dajac · 2023-06-06T06:54:28Z

@jolshan The new files are:

GroupCoordinatorService.java + tests
GroupCoordinatorConfig.java + tests
ReplicatedGroupCoordinator.java + tests

jolshan · 2023-06-08T00:30:20Z

...coordinator/src/main/java/org/apache/kafka/coordinator/group/ReplicatedGroupCoordinator.java

+ * 2) The replay methods which apply records to the hard state. Those are used in the request
+ *    handling as well as during the initial loading of the records from the partitions.
+ */
+public class ReplicatedGroupCoordinator implements Coordinator<Record> {


Is this called replicated because we replicate the state? (In other words, this is the implementation to get the hard state we already have for the current group coordinator)

Right. This is where the state is stored. I don't really like the name but I could not come up with a better one. I am open to suggestions here.

I guess it depends on what the other implementations of coordinator will be 😅

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorService.java

divijvaidya · 2023-06-08T15:47:39Z

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorConfig.java

+ */
+public class GroupCoordinatorConfig {
+    public static class Builder {
+        private int numThreads = 1;


Can we please move these to DEFAULT constants and use ConfigDef?

Please see RemoteLogManagerConfig for inspiration.

Interesting pattern used in RemoteLogManagerConfig. Because the properties are defined in the ConfigDef in that class and not in the ConfigDef used in KafkaConfig, none of the remote storage properties appear in the documentation, which is generated from the ConfigDef in KafkaConfig. I suppose that we missed this... We use a similar pattern for RaftConfig, but in that case we defined the properties in two places. I am not a fan of this because it is error-prone. It would be better if we could somehow add a ConfigDef to another ConfigDef to make this automatic and transparent. I will play a bit with this...

In this case, all the properties of the group coordinator and their default values are already defined in KafkaConfig and I just wanted to have a container to move them around in the java module.

However, I agree with the constants part of your comment.

I have played around a bit with this idea. It creates a pretty big diff, so I have decided to tackle it separately from this patch. I filed https://issues.apache.org/jira/browse/KAFKA-15089 for this purpose.

For this patch, I have reduced GroupCoordinatorConfig to a simple POJO for now.

I was also wondering about how config defs worked with documentation. Thanks for filing this JIRA.
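To illustrate the "simple POJO with DEFAULT constants" outcome of this thread, here is a minimal sketch. Only the `numThreads` field and its default of 1 come from the quoted diff; the class name, the constant name, the factory method, and the range check are hypothetical and not taken from the merged code.

```java
// Hypothetical sketch of a plain config holder; not the merged GroupCoordinatorConfig.
final class GroupCoordinatorConfigSketch {
    /** Default kept as a named constant instead of an inline literal (name is an assumption). */
    static final int DEFAULT_NUM_THREADS = 1;

    /** Number of coordinator threads; the only property visible in the quoted diff. */
    final int numThreads;

    GroupCoordinatorConfigSketch(int numThreads) {
        // The range check is an addition for illustration only.
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be at least 1, got " + numThreads);
        }
        this.numThreads = numThreads;
    }

    /** Convenience factory that applies the default value. */
    static GroupCoordinatorConfigSketch withDefaults() {
        return new GroupCoordinatorConfigSketch(DEFAULT_NUM_THREADS);
    }
}
```

A holder like this carries values that are already defined and documented elsewhere (here, in KafkaConfig), which is why it does not need its own ConfigDef.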

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorService.java

jolshan · 2023-06-14T22:18:18Z

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorService.java

+    }
+
+    /**
+     * @return TODO


nit: todo here

Oops. Fixed.

jolshan · 2023-06-14T22:42:03Z

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorService.java

+            topicPartitionFor(request.groupId()),
+            coordinator -> coordinator.consumerGroupHeartbeat(context, request)
+        ).exceptionally(exception -> {
+            if (exception instanceof UnknownTopicOrPartitionException) {


have we always converted these errors as such? I see in GroupMetadataManager things are slightly different.

Yes. We have similar logic in the scala code here. The main difference is that we have to do it at a different place now.

I saw this code, but it seems like we handle it differently, right?

For example, NotEnoughReplicas is now mapped to NOT_COORDINATOR, whereas it used to be COORDINATOR_NOT_AVAILABLE.

That seems to be a mistake. I will check this tomorrow.
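The conversion being discussed can be sketched as follows. This is a self-contained illustration, not the code from the patch: the exception classes and the error enum are hypothetical stand-ins for Kafka's own types, and the mapping is an assumption except for the NotEnoughReplicas case, which the thread says used to map to COORDINATOR_NOT_AVAILABLE.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical sketch: log-layer failures translated into coordinator-level errors.
final class CoordinatorErrorMappingSketch {
    enum CoordinatorError { NONE, COORDINATOR_NOT_AVAILABLE, NOT_COORDINATOR, UNKNOWN_SERVER_ERROR }

    // Stand-ins for Kafka's exception types, declared here so the sketch compiles alone.
    static class UnknownTopicOrPartitionException extends RuntimeException { }
    static class NotEnoughReplicasException extends RuntimeException { }
    static class NotLeaderOrFollowerException extends RuntimeException { }

    static CoordinatorError toCoordinatorError(Throwable t) {
        // Dependent futures wrap the failure in a CompletionException, so unwrap it first.
        Throwable cause = (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
        if (cause instanceof UnknownTopicOrPartitionException
                || cause instanceof NotEnoughReplicasException) {
            // Assumed mapping: the partition cannot be written to right now.
            return CoordinatorError.COORDINATOR_NOT_AVAILABLE;
        }
        if (cause instanceof NotLeaderOrFollowerException) {
            // Assumed mapping: this broker no longer hosts the coordinator.
            return CoordinatorError.NOT_COORDINATOR;
        }
        return CoordinatorError.UNKNOWN_SERVER_ERROR;
    }

    /** Mirrors the shape of the quoted `.exceptionally(exception -> ...)` handler. */
    static CompletableFuture<CoordinatorError> withErrorMapping(CompletableFuture<CoordinatorError> write) {
        return write.exceptionally(CoordinatorErrorMappingSketch::toCoordinatorError);
    }
}
```

Keeping the mapping in one function makes a mismatch like the one spotted in this thread easier to catch with a single table-style test.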

jolshan · 2023-06-14T22:48:51Z

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorService.java

+        Iterable<TopicPartition> partitions,
+        TransactionResult transactionResult
+    ) {
+        throwIfNotActive();


do we plan to add more to these?

What do you mean? Different ones?

More to the methods

Right now we just throw if not active. Just curious if this is all we plan to do here.

Yeah, we have to implement all of these methods. We have JIRAs for all of them...

got it. thanks for clarifying. maybe we don't need a todo or anything if we at least have the jira.
Some of the other methods had "not yet implemented" in the body so I wasn't sure about the ones that didn't.
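For readers following this thread, the guard pattern under discussion looks roughly like the sketch below. The names `ActiveGuardSketch`, `pendingOperation`, and the nested exception class are hypothetical; only `throwIfNotActive()` and the "not yet implemented" placeholder are mentioned in the review, and the exception type thrown here is an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a service whose operations are rejected until it is started.
final class ActiveGuardSketch {
    // Stand-in for a Kafka coordinator exception; declared locally so the sketch compiles alone.
    static class CoordinatorNotAvailableException extends RuntimeException {
        CoordinatorNotAvailableException(String message) { super(message); }
    }

    private final AtomicBoolean isActive = new AtomicBoolean(false);

    void startup() { isActive.set(true); }

    void shutdown() { isActive.set(false); }

    private void throwIfNotActive() {
        if (!isActive.get()) {
            throw new CoordinatorNotAvailableException("The coordinator is not active.");
        }
    }

    /** A method whose real body is tracked in a follow-up ticket. */
    void pendingOperation() {
        throwIfNotActive();
        // Explicit placeholder so an active service fails loudly rather than silently doing nothing.
        throw new UnsupportedOperationException("not yet implemented");
    }
}
```

Placing the guard first means callers see a consistent "not active" error regardless of how far the method body has been implemented.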

jolshan · 2023-06-14T23:55:58Z

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorService.java

+     * The number of partitions of the __consumer_offsets topics. This is provided
+     * when the component is started.
+     */
+    private volatile int numPartitions = -1;


Just to confirm: this is usually just OffsetsTopicPartitionsProp, but we set it to 1 in tests, etc.?

The __consumer_offsets topic is created based on OffsetsTopicPartitionsProp. However, numPartitions is read from the metadata cache when the coordinator is started. If the topic was manually created, it could have a different number of partitions.

It should only be done manually via the tests, right?

numPartitions is always set in startup.

Sorry, I mean the topic should only be manually created via tests. Usually it would be created automatically when we use consumers.
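The startup behaviour described in this thread can be sketched as below. The `startup(IntSupplier)` shape follows the `service.startup(() -> 10)` call quoted from the test, and the `-1` sentinel follows the quoted field; the class name, the accessor, and the `IllegalStateException` are hypothetical.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch: the partition count is read once at startup rather than from config.
final class PartitionCountSketch {
    /** -1 means "not started yet", matching the sentinel in the quoted field. */
    private volatile int numPartitions = -1;

    /**
     * @param partitionCountSupplier looks up the actual partition count of the offsets
     *        topic (for example from a metadata cache), which may differ from the
     *        configured value if the topic was created manually.
     */
    void startup(IntSupplier partitionCountSupplier) {
        numPartitions = partitionCountSupplier.getAsInt();
    }

    int numPartitions() {
        int current = numPartitions; // single volatile read
        if (current < 0) {
            throw new IllegalStateException("startup() has not been called");
        }
        return current;
    }
}
```

Passing a supplier instead of a number lets the caller defer the lookup until the topic metadata is available.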

jolshan · 2023-06-15T00:24:00Z

...oordinator/src/test/java/org/apache/kafka/coordinator/group/GroupCoordinatorServiceTest.java

+
+        service.startup(() -> 10);
+
+        assertTrue(service.partitionFor("foo") >= 0);


why don't we just use the hashing algorithm to get the actual partition?

That makes sense. Let me change this.
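A stricter assertion along the lines suggested here could compute the expected partition directly. The sketch below assumes the service maps a group ID to a partition by taking the non-negative `String.hashCode()` modulo the partition count; that formula and the class name are assumptions for illustration, not confirmed by this thread.

```java
// Hypothetical sketch of a deterministic group-to-partition mapping.
final class PartitionForSketch {
    static int partitionFor(String groupId, int numPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so handle that case explicitly.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % numPartitions;
    }

    public static void main(String[] args) {
        // "foo".hashCode() is 101574, so with 10 partitions the result is 4.
        System.out.println(partitionFor("foo", 10)); // prints 4
    }
}
```

Under that assumption, the test could assert equality with the computed value instead of only checking that the result is non-negative.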

jolshan · 2023-06-15T00:26:26Z

...dinator/src/test/java/org/apache/kafka/coordinator/group/ReplicatedGroupCoordinatorTest.java

+            request
+        )).thenReturn(result);
+
+        assertEquals(result, coordinator.consumerGroupHeartbeat(context, request));


Is this just a test that we don't throw errors?

Yeah, it just validates that a successful response is just returned to the caller.

jolshan · 2023-06-15T00:30:11Z

...oordinator/src/test/java/org/apache/kafka/coordinator/group/GroupCoordinatorServiceTest.java

+    }
+
+    @Test
+    public void testOnElection() {


did we want to include a test that confirms we throw errors when not active?

I added an assertion for this in this test. I did the same for testOnResignation.

jolshan · 2023-06-21T16:51:03Z

Lots of connect failures when trying to shut down the brokers, let's see if this new run is cleaner.

jolshan

let's make sure the tests look ok before merging.

dajac · 2023-06-22T07:05:02Z

Failed tests are not related:

Build / JDK 17 and Scala 2.13 / testOffsetTranslationBehindReplicationFlow() – org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest
1m 50s
Build / JDK 8 and Scala 2.12 / testBumpTransactionalEpoch(String).quorum=kraft – kafka.api.TransactionsTest
1m 14s
Build / JDK 11 and Scala 2.13 / testBalancePartitionLeaders() – org.apache.kafka.controller.QuorumControllerTest

dajac added the KIP-848 The Next Generation of the Consumer Rebalance Protocol label Jun 5, 2023

jeffkbkim mentioned this pull request Jun 5, 2023

KAFKA-14462; [14/N] Add PartitionWriter #13675

Merged

3 tasks

dajac force-pushed the KAFKA-14462-18 branch from 49bec93 to 03d1208 Compare June 6, 2023 06:51

dajac force-pushed the KAFKA-14462-18 branch 2 times, most recently from 01f4cf5 to 8a1968e Compare June 6, 2023 14:28

jolshan reviewed Jun 8, 2023

divijvaidya reviewed Jun 8, 2023

dajac added 2 commits June 14, 2023 09:44

KAFKA-14462; [18/N] Add GroupCoordinatorService

a6d5e1d

address minor comments

4c919be

dajac force-pushed the KAFKA-14462-18 branch from 8a1968e to 4c919be Compare June 14, 2023 13:08

dajac marked this pull request as ready for review June 14, 2023 13:13

jolshan reviewed Jun 14, 2023

jolshan reviewed Jun 15, 2023

address minor comments

c0d3ea3

jeffkbkim mentioned this pull request Jun 16, 2023

KAFKA-14500; [5/N] Implement JoinGroup protocol in new GroupCoordinator #13870

Merged

3 tasks

dajac added 2 commits June 19, 2023 08:44

Merge remote-tracking branch 'upstream/trunk' into KAFKA-14462-18

50451c9

fixes

706da47

jolshan approved these changes Jun 21, 2023

dajac merged commit a81486e into apache:trunk Jun 22, 2023

dajac deleted the KAFKA-14462-18 branch June 22, 2023 07:06

