Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(connector): Add topic to mqtt additional columns #19017

Merged

Conversation

Boudewijn26
Copy link
Contributor

@Boudewijn26 Boudewijn26 commented Oct 18, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

  • It's now possible to include the topic on which MQTT messages have been received by using INCLUDE topic in your SELECT

@Boudewijn26 Boudewijn26 force-pushed the feature/mqtt-topic-column branch 2 times, most recently from a85a7b8 to a0aed41 Compare October 21, 2024 07:08
@Boudewijn26 Boudewijn26 force-pushed the feature/mqtt-topic-column branch from a0aed41 to 31ab0a5 Compare October 21, 2024 07:12
@graphite-app graphite-app bot requested a review from a team October 21, 2024 07:54
@@ -87,6 +87,10 @@ pub static COMPATIBLE_ADDITIONAL_COLUMNS: LazyLock<HashMap<&'static str, HashSet
"collection_name",
]),
),
(
MQTT_CONNECTOR,
HashSet::from(["topic", "offset", "partition"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, topic and partition are the same thing in the MQTT connector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I followed the pattern established by the file connectors, which have "file" the same as the "partition" and read the split_id as the file

https://github.com/risingwavelabs/risingwave/blob/main/src/connector/src/parser/mod.rs#L484-L491

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use partition instead. For other connectors, split_id is the partition number, no need for another additional column.

return Ok(A::output_for(
self.row_meta
.as_ref()
.map(|ele| ScalarRefImpl::Utf8(ele.split_id)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if subscribing to foo.*? I think the topic info should come from each message, you can take the kafka source impl as example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +123 to +133
loop {
if self.connected.load(std::sync::atomic::Ordering::Relaxed) {
break;
}

if start.elapsed().as_secs() > 10 {
bail!("Failed to connect to mqtt broker");
}

tokio::time::sleep(std::time::Duration::from_millis(500)).await;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding the comments #18961 (comment), I still have some concern about the wait time. Would you be able to move the logic in the tokio thread into this func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand exactly what you're suggesting, but I'd say it's one of:

  1. Move the logic inside the tokio thread to the list_splits, spawn the thread in new
  2. During the call to list_splits spawn a thread and then stop that thread when the topics have come in
  3. During the call to list_splits spawn a new thread to listen to the incoming mqtt topics
  4. During the call to list_splits spawn a new thread if one isn't already running to listen to the incoming mqtt topics

Option 3 would spawn a new thread on each enumerator tick, which doesn't seem desirable to me
Option 4 is functionally identical to what we have just with added state to track the existence of the thread; I don't really see what would be gained
Option 2 would risk missing topics that come in after list_splits returns and come in before the next tick
Option 1 would require some inter thread communication in order to poll the mqtt eventloop inside the spawned thread whilst having the logic in the list_splits. That seems to make it more complicated than desired

Please indicate if I've misunderstood

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 2 would risk missing topics that come in after list_splits returns and come in before the next tick

Oh, I got it. Mqtt does not have persisted metadata so RisingWave needs a thread running in the background. The fix makes sense.

Copy link
Contributor

@tabVersion tabVersion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reuse partition for topic in MQTT connector. The concept is hard to align with other connectors.

LGTM, thanks for your contribution.

Boudewijn26 added a commit to foundation-zero/risingwave-docs that referenced this pull request Oct 22, 2024
@Boudewijn26
Copy link
Contributor Author

@tabVersion With ae9f988 (#19017) I've modified this so that it uses the partition instead of a separate topic column.

Meanwhile foundation-zero/risingwave-docs@8d2b58f is ready for a pull request once this is merged

Copy link
Member

@yufansong yufansong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tabVersion
Copy link
Contributor

Let's merge

@tabVersion tabVersion added this pull request to the merge queue Oct 29, 2024
Merged via the queue into risingwavelabs:main with commit 661939a Oct 29, 2024
34 of 35 checks passed
@Boudewijn26 Boudewijn26 deleted the feature/mqtt-topic-column branch October 31, 2024 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants