Feature: add support for ingesting from rabbitmq super streams #14137
@jamiechapmanbrn, let us know if you need any help from the Druid dev community in moving this PR forward.
I've marked this as ready for review. We're currently testing it in our cluster, and it seems to be working fine. I've added documentation and extended the tests significantly. I'm not 100% sure whether the coverage is enough to pass CI yet, but I'm running out of steam to work on the tests. They are relatively tricky for me, since I don't work much with Java and I'm not very familiar with the supervisor structures/classes.
CodeQL found more than 10 potential problems in the proposed changes.
Have you done any scale testing on this yet? I'm wondering what scale you have run it at so far.
Currently, our org is running it at a relatively low scale, on the order of a few events per minute. What would you consider reasonable for scale testing?
@jamiechapmanbrn, I'll complete my review for this PR soon.
Okay, I rebased on the latest master and updated the pom.xml file to reference the latest version. I'll look into any test failures once the results are available.
CodeQL found more than 20 potential problems in the proposed changes.
Thank you @jamiechapmanbrn. Overall approach LGTM!
Could you please take a pass over the RabbitStreamRecordSupplier and RabbitStreamSupervisor classes to clean up minor things such as unneeded exceptions, unused arguments in newly added methods, and see if any public methods exist that can be made private?
Resolved (outdated) review threads:
...ng-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisIndexTaskTuningConfig.java (2 threads)
...service/src/main/java/org/apache/druid/indexing/rabbitstream/RabbitStreamRecordSupplier.java (4 threads)
LGTM!
+1 after the conflicts and build failures are fixed.
That's great to hear! I've rebased once again and added more test coverage. It should pass now. I'll take another look once CI has done its magic.
I think I've fixed the pipeline issues.
@jamiechapmanbrn Thank you!
That's great! Thanks for all the help!
This PR aims to add support for ingesting logs from a RabbitMQ super stream, as a peer to Kafka or Kinesis. Since super streams support scalable exactly-once delivery, this provides a good bridge between RabbitMQ and Druid.
Currently, this is a WIP for a few reasons:
- There is no support yet for username/password and some other features. I welcome suggestions here, as I'm not sure how those work yet and could use some guidance on how that plumbing fits together.
- The algorithm is pretty poor here: it simply creates a new Rabbit consumer for each poll. It should be doing something similar to Kinesis, fetching in the background and occasionally backing off if it gets too far ahead.
- Testing is extremely bare-bones, primarily because the design of the record supplier could still change, and a change in its design would change the test layout too.
- Documentation on how exactly to use it is missing. I've been testing it with the supervisor spec below; the primary configuration mechanism is the 'uri', which configures how it connects to RabbitMQ both for reading metadata and for getting messages.
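A minimal sketch of the Kinesis-style alternative described in the list above, assuming a generic source of record batches: a background thread fetches continuously into a bounded buffer and backs off while the buffer is full, so `poll` only drains what is already buffered and no consumer is created per poll. `BackgroundPrefetcher` and the `fetchBatch` supplier are hypothetical names standing in for a real RabbitMQ stream read; this is not Druid's `KinesisRecordSupplier` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical prefetcher: a background thread keeps fetching into a bounded
// buffer and backs off when the buffer is full, instead of dropping records.
class BackgroundPrefetcher<T> implements AutoCloseable
{
  private final BlockingQueue<T> buffer;
  private final Supplier<List<T>> fetchBatch; // stand-in for a real stream read
  private final long backoffMillis;
  private final Thread fetcher;
  private volatile boolean running = true;

  BackgroundPrefetcher(Supplier<List<T>> fetchBatch, int capacity, long backoffMillis)
  {
    this.buffer = new ArrayBlockingQueue<>(capacity);
    this.fetchBatch = fetchBatch;
    this.backoffMillis = backoffMillis;
    this.fetcher = new Thread(this::fetchLoop, "prefetcher");
    this.fetcher.setDaemon(true);
    this.fetcher.start();
  }

  private void fetchLoop()
  {
    try {
      while (running) {
        List<T> batch = fetchBatch.get();
        if (batch.isEmpty()) {
          Thread.sleep(backoffMillis); // nothing new upstream; wait before retrying
          continue;
        }
        for (T record : batch) {
          // Too far ahead of the reader: retry the offer after each backoff period.
          while (running && !buffer.offer(record, backoffMillis, TimeUnit.MILLISECONDS)) {
            // buffer still full; keep waiting
          }
        }
      }
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // close() interrupts the fetcher; just exit
    }
  }

  // Returns whatever is buffered, waiting up to timeoutMillis for the first record.
  List<T> poll(long timeoutMillis) throws InterruptedException
  {
    List<T> out = new ArrayList<>();
    T first = buffer.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    if (first != null) {
      out.add(first);
      buffer.drainTo(out);
    }
    return out;
  }

  @Override
  public void close() throws InterruptedException
  {
    running = false;
    fetcher.interrupt();
    fetcher.join();
  }
}
```

The bounded buffer is the back-pressure mechanism: the fetcher can run at most `capacity` records ahead of the reader, and a single fetcher thread keeps records in order.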
The meat of the change is in RabbitStreamRecordSupplier, which contains the components that interface with RabbitMQ.
It uses two interfaces: the low-level 'Client' interface, which reads the partitions of a super stream so that Druid can distribute work, and the 'Consumer' interface, the conventional interface that creates a thread which listens for messages and calls back into a handler that puts the messages in a queue.
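To illustrate how reading the partition list lets Druid distribute work, here is a hypothetical sketch that spreads super stream partitions across `taskCount` task groups round-robin. It assumes the partition names have already been fetched through the metadata lookup (RabbitMQ names super stream partitions like `invoices-0`, `invoices-1` by default). `PartitionAssigner` is an illustrative helper only; Druid's supervisor may map partitions to task groups differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: spread super stream partitions across task groups.
class PartitionAssigner
{
  static Map<Integer, List<String>> assign(List<String> partitions, int taskCount)
  {
    if (taskCount <= 0) {
      throw new IllegalArgumentException("taskCount must be positive");
    }
    // Sort first so the mapping is deterministic regardless of lookup order.
    List<String> sorted = new ArrayList<>(partitions);
    sorted.sort(null); // natural (lexicographic) order
    Map<Integer, List<String>> groups = new TreeMap<>();
    for (int i = 0; i < sorted.size(); i++) {
      // Round-robin: the i-th partition goes to task group i mod taskCount.
      groups.computeIfAbsent(i % taskCount, k -> new ArrayList<>()).add(sorted.get(i));
    }
    return groups;
  }
}
```

With three partitions and `taskCount = 2`, group 0 reads `invoices-0` and `invoices-2` while group 1 reads `invoices-1`.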
The current implementation simply manages a map of ConsumerBuilders as partitions are assigned. When it is polled for messages, it creates consumers and waits for the timeout. Once the timeout has elapsed, it closes all the consumers, then collects any messages from the queue and returns them. This should ensure no messages are lost, but it probably isn't very efficient, since it may reconnect more often than necessary.
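The poll cycle described above can be sketched with plain JDK classes. `FakeConsumer` is a hypothetical stand-in for a RabbitMQ stream consumer (a thread that invokes a handler for each message), `PollingSupplier` and `Rec` are illustrative names, and tracking each partition's next offset inside the supplier is an assumption of this sketch; the PR's RabbitStreamRecordSupplier may handle offsets differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// One message as handed back to Druid: partition (stream) name, offset, payload.
final class Rec
{
  final String partition;
  final long offset;
  final String payload;

  Rec(String partition, long offset, String payload)
  {
    this.partition = partition;
    this.offset = offset;
    this.payload = payload;
  }
}

// Hypothetical stand-in for a stream consumer: a thread that reads an in-memory
// partition from startOffset and invokes a callback (enqueue) per message.
class FakeConsumer implements AutoCloseable
{
  private volatile boolean open = true;
  private final Thread thread;

  FakeConsumer(String partition, List<String> log, long startOffset, BlockingQueue<Rec> sink)
  {
    thread = new Thread(() -> {
      for (long off = startOffset; open && off < log.size(); off++) {
        sink.add(new Rec(partition, off, log.get((int) off))); // the "message handler"
      }
    });
    thread.start();
  }

  @Override
  public void close() throws InterruptedException
  {
    open = false;
    thread.join(); // wait for in-flight callbacks so nothing is enqueued after close
  }
}

class PollingSupplier
{
  private final Map<String, List<String>> partitions; // simulated super stream partitions
  private final Map<String, Long> nextOffsets = new TreeMap<>(); // assigned partitions

  PollingSupplier(Map<String, List<String>> partitions)
  {
    this.partitions = partitions;
  }

  void assign(String partition, long offset)
  {
    nextOffsets.put(partition, offset);
  }

  List<Rec> poll(long timeoutMillis) throws InterruptedException
  {
    BlockingQueue<Rec> queue = new LinkedBlockingQueue<>();
    List<FakeConsumer> consumers = new ArrayList<>();
    // 1. Create one consumer per assigned partition, starting at its next offset.
    for (Map.Entry<String, Long> e : nextOffsets.entrySet()) {
      consumers.add(new FakeConsumer(e.getKey(), partitions.get(e.getKey()), e.getValue(), queue));
    }
    // 2. Let the consumers deliver messages until the timeout elapses.
    Thread.sleep(timeoutMillis);
    // 3. Close every consumer first, so no callback can race the drain below.
    for (FakeConsumer c : consumers) {
      c.close();
    }
    // 4. Drain the queue and advance each partition past the last record seen.
    List<Rec> out = new ArrayList<>();
    queue.drainTo(out);
    for (Rec r : out) {
      nextOffsets.merge(r.partition, r.offset + 1, Long::max);
    }
    return out;
  }
}
```

Because each partition is read by a single thread, the records received for it form a contiguous run from the start offset, so advancing to the last received offset plus one cannot skip a message; the cost is a fresh set of consumers on every poll.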
Release note
New: You can now ingest logs from RabbitMQ via super streams
This feature adds the ability to read directly from RabbitMQ using the new super streams feature. Because super streams allow exactly-once delivery with full support for partitioning, they are compatible with Druid's modern ingestion algorithm, without the downsides of the prior RabbitMQ firehose.
Note that this uses the RabbitMQ streams feature, and not a conventional exchange. You will need to make sure that your messages are in a super stream before consumption. For more information, see https://www.rabbitmq.com/streams.html.
To configure it, create a new index task using the 'rabbit' type, and configure its stream and URI to connect to the RabbitMQ host.
Key changed/added classes in this PR
The following classes have been added to druid in this PR:
RabbitStreamSamplerSpec
RabbitStreamRecordSupplier
RabbitStreamIndexTaskTuningConfig
RabbitStreamIndexTaskModule
RabbitStreamIndexTaskIOConfig
RabbitStreamIndexTaskClientFactory
RabbitStreamIndexTask
RabbitStreamDataSourceMetadata
IncrementalPublishingRabbitStreamIndexTaskRunner
RabbitSupervisorTuningConfig
RabbitStreamSupervisorSpec
RabbitStreamSupervisorReportPayload
RabbitStreamSupervisorIOConfig
RabbitStreamSupervisorIngestionSpec
RabbitStreamSupervisor
This PR has: