Add support for topic/partition in KeyRecordGrouper #167

Contributor

@stephen-harris stephen-harris commented Jun 21, 2023

This adds support for including topic/partition in the file template when using KeyRecordGrouper.

The wider context is a connector that ingests multiple compacted topics, where records from the same topic with the same key should be written to the same location.
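
As a rough illustration of the behaviour described above (not the library's actual KeyRecordGrouper code), the sketch below renders a file name from a template containing {{topic}}, {{partition}} and {{key}}, and keeps the latest record per rendered name. The `Rec` class, the `filenameFor` and `group` helpers, the null-key handling and the keep-latest policy are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopicPartitionKeyGrouping {

    // Hypothetical stand-in for a sink record; NOT Kafka Connect's SinkRecord.
    static final class Rec {
        final String topic;
        final int partition;
        final String key;
        final String value;

        Rec(final String topic, final int partition, final String key, final String value) {
            this.topic = topic;
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    // Renders a file name by plain string substitution
    // (assumed placeholder syntax: {{name}}).
    static String filenameFor(final String template, final Rec rec) {
        // Assumption: a missing key is rendered as the literal "null".
        final String key = rec.key == null ? "null" : rec.key;
        return template
            .replace("{{topic}}", rec.topic)
            .replace("{{partition}}", Integer.toString(rec.partition))
            .replace("{{key}}", key);
    }

    // Groups records by rendered file name; a later record replaces an earlier
    // one (assumed policy, chosen because the topics in question are compacted).
    static Map<String, Rec> group(final String template, final List<Rec> records) {
        final Map<String, Rec> latestByFile = new LinkedHashMap<>();
        for (final Rec rec : records) {
            latestByFile.put(filenameFor(template, rec), rec);
        }
        return latestByFile;
    }

    public static void main(final String[] args) {
        final List<Rec> records = Arrays.asList(
            new Rec("orders", 0, "k1", "v1"),
            new Rec("users", 0, "k1", "v2"),
            new Rec("orders", 0, "k1", "v3"));
        // Prints:
        //   orders-0-k1 -> v3
        //   users-0-k1 -> v2
        group("{{topic}}-{{partition}}-{{key}}", records)
            .forEach((file, rec) -> System.out.println(file + " -> " + rec.value));
    }
}
```

With a template of only {{key}}, the `orders` and `users` records in this example would both render to `k1` and collide; including {{topic}} keeps them apart.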

Fix #178

@stephen-harris stephen-harris requested review from a team as code owners June 21, 2023 11:57
Contributor

@jeqo jeqo left a comment

@stephen-harris thanks for this PR!

I agree this is a good addition to the connectors that depend on this library.
Though, instead of modifying the existing KeyRecordGrouper -- and as there is a TopicPartitionRecordGrouper -- what do you think about having this requirement covered by a new record grouper (e.g. TopicPartitionAndKeyRecordGrouper)?

@stephen-harris
Contributor Author

Hi @jeqo Sorry for the delay in getting back to you, I've been on annual leave.

I'm happy to go with that approach if you prefer. I think it'll require some changes to the connector though (i.e. the new grouper will still have a 1 file limitation), but I'm happy to make the necessary changes there when this library is updated.

@stephen-harris
Contributor Author

I've refactored this and added further tests for the RecordGrouperFactory (my original PR actually omitted some necessary changes here).

Regarding the RecordGrouper, I'm happy with either approach. If you opt for the new TopicPartitionAndKeyRecordGrouper, I'll make the necessary changes in the S3 connector when I open a PR for that.

Contributor

@jeqo jeqo left a comment

Thanks @stephen-harris, looking good! Left some comments

@stephen-harris
Contributor Author

Thanks @jeqo, I've made the requested changes.

One thing I did notice (and I think this can be resolved by a separate PR) is that validateKeyFilenameTemplate, which was added here, isn't executed (nor is it invoked by the mentioned S3 & GCS connectors).

I'm not sure if we'd want to invoke that method in AivenCommonConfig::validate or whether it should be invoked in S3SinkConfig::validate. Happy to create a separate PR to resolve that one way or the other.

@jeqo
Contributor

jeqo commented Aug 8, 2023

@stephen-harris thanks! Good catch! Agree, it looks like it is either dead code or we are missing a call to it from the storage connector configs. Could you create an issue first to discuss this further?

Could you also rebase this PR to get the latest codeql workflow executed?

jeqo previously approved these changes Aug 8, 2023
@stephen-harris
Contributor Author

@jeqo Done :)

@jeqo
Contributor

jeqo commented Aug 14, 2023

@stephen-harris thanks! But it seems the git history got a bit messed up. Could you take a look?

From my side, it seems that if you move one step back from the last merge commit, it should be fine:

[image]

So, reset to the previous commit and force push. I tested it here and looks better: master...jeqo:commons-for-apache-kafka-connect:feat/support-topic-partition-in-key-record-group

@stephen-harris stephen-harris force-pushed the feat/support-topic-partition-in-key-record-group branch from e328c10 to f3b4356 Compare August 21, 2023 09:18
@stephen-harris
Contributor Author

stephen-harris commented Aug 21, 2023

@jeqo Yup, I accidentally merged in the remote branch rather than force-pushing the rebased local branch. Fixed now.

@jeqo jeqo enabled auto-merge (squash) August 21, 2023 10:19
void keyOnly() {
    final Template filenameTemplate = Template.of("{{key}}");
    final String grType = RecordGrouperFactory.resolveRecordGrouperType(filenameTemplate);
    assertEquals(RecordGrouperFactory.KEY_RECORD, grType);
Contributor

Should we use AssertJ everywhere?

Contributor

Hi @stephen-harris, sorry for hijacking the workflow but could you please rebase onto main and fix the assertions?

Contributor

@jeqo jeqo Aug 21, 2023

+1, thanks @AnatolyPopov, missed this in my review.
@stephen-harris If you rebase your PR, this will be validated by checkstyle based on #192

Contributor Author

@AnatolyPopov @jeqo Rebased and updated to use AssertJ

@jeqo jeqo disabled auto-merge August 21, 2023 11:56
@stephen-harris stephen-harris force-pushed the feat/support-topic-partition-in-key-record-group branch from f3b4356 to bbcff43 Compare August 21, 2023 15:11
Contributor

@jeqo jeqo left a comment

LGTM, thanks @stephen-harris

@jeqo jeqo merged commit e759da3 into Aiven-Open:main Aug 21, 2023
8 checks passed
@stephen-harris
Contributor Author

Thanks @jeqo, are you able to publish the update? I'll then get a PR together for the S3 connector.

@jeqo
Contributor

jeqo commented Aug 22, 2023

@stephen-harris sure! will do a release later this week. Will ping you here once available

@jeqo
Contributor

jeqo commented Aug 25, 2023

@stephen-harris see https://github.com/Aiven-Open/commons-for-apache-kafka-connect/releases/tag/v0.11.0 -- looking forward to more contributions! Many thanks!

Successfully merging this pull request may close these issues.

Support {{topic}} variable with {{key}} in file template pattern
3 participants