Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove index_realtime and index_realtime_appenderator tasks #16602

Merged
merged 27 commits into from
Jun 25, 2024

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Jun 14, 2024

Description

#15717 + fixed up conflicts + ITUnionQueryTest using the new integration test framework and kafka

Release Notes

Druid index_realtime and index_realtime_appenderator tasks have been removed and can no longer be used to ingest data. Docs for index_realtime were completely removed in Druid 0.22 in #11134, and prior to that we have been advising that these tasks were obsolete for much longer. index_realtime_appenderator tasks appear to have never been documented. index_realtime was used to support Tranquility, which docs have been advising as unsupported and untested since 0.9.2.

gianm and others added 18 commits January 18, 2024 03:04
index_realtime tasks were removed from the documentation in apache#13107. Even
at that time, they weren't really documented per se— just mentioned. They
existed solely to support Tranquility, which is an obsolete ingestion
method that predates migration of Druid to ASF and is no longer being
maintained. Tranquility docs were also de-linked from the sidebars and
the other doc pages in apache#11134. Only a stub remains, so people with
links to the page can see that it's no longer recommended.

index_realtime_appenderator tasks existed in the code base, but were
never documented, nor as far as I am aware were they used for any purpose.

This patch removes both task types completely, as well as removes all
supporting code that was otherwise unused. It also updates the stub
doc for Tranquility to be firmer that it is not compatible. (Previously,
the stub doc said it wasn't recommended, and pointed out that it is
built against an ancient 0.9.2 version of Druid.)
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviving this, @clintropolis ! The changes look clean to me, we can proceed with merging this PR.

I have not reviewed the new IT very closely, but that can always be revisited later.

labels:
druid-int-test: "true"
ports:
- 9092:9092
- 9093:9093
- 9094:9094
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the port change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was having trouble getting the docker kafka with zookeeper working so i switched to using no zookeeper, which runs a controller on 9093, see https://github.com/bitnami/containers/blob/main/bitnami/kafka/README.md#using-a-docker-compose-file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the clarification! Maybe we can include this info in the IT readme or somewhere in a follow up PR.

Comment on lines +164 to +168
emitter = new ServiceEmitter(
"test",
"test",
new NoopEmitter()
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
emitter = new ServiceEmitter(
"test",
"test",
new NoopEmitter()
);
emitter = new StubServiceEmitter();

@clintropolis clintropolis merged commit 37a50e6 into apache:master Jun 25, 2024
89 checks passed
@clintropolis clintropolis deleted the realtime-rampage branch June 25, 2024 03:13
clintropolis added a commit to clintropolis/druid that referenced this pull request Jun 25, 2024
changes:
* `FireHydrant` is now `PartialSegment`. This name much more clearly describes what this class does, and with all the other fireman terminology removed it didn't even fit a theme anymore.
* `Sink` is now `AppendableSegment`. This name also much more clearly describes what this class does, and is composed of `PartialSegments` per the previous `FireHydrant` rename.
* Additionally, `SinkQuerySegmentWalker` -> `AppendableSegmentQuerySegmentWalker`, and `SinkQueryRunner` -> `AppendableSegmentQueryRunner`
* Remove `Firehose`, `IngestSegmentFirehose` was only used by Hadoop indexing `DruidRecordReader`, moved to internal class of `DruidRecordReader` as `SegmentReader`
* Remove `FirehoseFactory` and remaining implementations, after apache#16602 they were no longer used
clintropolis added a commit to clintropolis/druid that referenced this pull request Jun 25, 2024
changes:
* `FireHydrant` is now `PartialSegment`. This name much more clearly describes what this class does, and with all the other fireman terminology removed it didn't even fit a theme anymore.
* `Sink` is now `AppendableSegment`. This name also much more clearly describes what this class does, and is composed of `PartialSegments` per the previous `FireHydrant` rename.
* Additionally, `SinkQuerySegmentWalker` -> `AppendableSegmentQuerySegmentWalker`, and `SinkQueryRunner` -> `AppendableSegmentQueryRunner`
* Remove `Firehose`, `IngestSegmentFirehose` was only used by Hadoop indexing `DruidRecordReader`, moved to internal class of `DruidRecordReader` as `SegmentReader`
* Remove `FirehoseFactory` and remaining implementations, after apache#16602 they were no longer used
* Moved things from `org.apache.druid.segment.realtime.sink` and `org.apache.druid.segment.realtime.firehose` up to `org.apache.druid.segment.realtime`.
clintropolis added a commit to clintropolis/druid that referenced this pull request Jun 25, 2024
changes:
* `FireHydrant` is now `PartialSegment`. This name much more clearly describes what this class does, and with all the other fireman terminology removed it didn't even fit a theme anymore.
* `Sink` is now `AppendableSegment`. This name also much more clearly describes what this class does, and is composed of `PartialSegments` per the previous `FireHydrant` rename.
* Additionally, `SinkQuerySegmentWalker` -> `AppendableSegmentQuerySegmentWalker`, and `SinkQueryRunner` -> `AppendableSegmentQueryRunner`
* Remove `Firehose`, `IngestSegmentFirehose` was only used by Hadoop indexing `DruidRecordReader`, moved to internal class of `DruidRecordReader` as `SegmentReader`
* Remove `FirehoseFactory` and remaining implementations, after apache#16602 they were no longer used
* Moved things from `org.apache.druid.segment.realtime.sink` and `org.apache.druid.segment.realtime.firehose` up to `org.apache.druid.segment.realtime`.
@clintropolis clintropolis mentioned this pull request Jun 25, 2024
9 tasks
clintropolis added a commit to clintropolis/druid that referenced this pull request Jul 19, 2024
changes:
* removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after apache#16602
* Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader`
* Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes
* Moved anything remaining in a 'firehose' package somewhere else
* Clean up docs on firehose stuff
clintropolis added a commit that referenced this pull request Jul 19, 2024
changes:
* removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602
* Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader`
* Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes
* Moved anything remaining in a 'firehose' package somewhere else
* Clean up docs on firehose stuff
sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request Aug 6, 2024
changes:
* removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after apache#16602
* Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader`
* Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes
* Moved anything remaining in a 'firehose' package somewhere else
* Clean up docs on firehose stuff
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants