[improve][function] Support Record<?> as Function output type #16041

Merged: 2 commits, Jun 17, 2022


@cbornet (Contributor) commented Jun 13, 2022

Motivation

Currently, when a user wants to dynamically set the output topic or the message properties, or change the output schema in a Function, the only option is to write a Function that returns Void, use the Context, and manually create a message with Context::newOutputMessage. The TypedMessageBuilderPublish Function in pulsar-functions-api-examples shows how to do that.
This approach is not intuitive; it would be better to return a structure like Record that carries this information.
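To make the motivation concrete, here is a dependency-free sketch of the proposed style. It does not use the real Pulsar API: `SimpleRecord` is a hypothetical stand-in for `org.apache.pulsar.functions.api.Record`, `OutputRecord` and `RoutingFunction` are illustrative names, the topic names are made up, and `java.util.function.Function` is used only so the sketch compiles on its own. The point is that the destination topic and properties travel in the returned value instead of being a side effect performed through the Context.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for org.apache.pulsar.functions.api.Record,
// reduced to the accessors a sink needs to route an output message.
interface SimpleRecord<T> {
    T getValue();

    default Optional<String> getDestinationTopic() {
        return Optional.empty();
    }

    default Map<String, String> getProperties() {
        return Collections.emptyMap();
    }
}

// Hypothetical immutable record returned by the function.
final class OutputRecord<T> implements SimpleRecord<T> {
    private final T value;
    private final String destinationTopic;
    private final Map<String, String> properties;

    OutputRecord(T value, String destinationTopic, Map<String, String> properties) {
        this.value = value;
        this.destinationTopic = destinationTopic;
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    @Override
    public T getValue() {
        return value;
    }

    @Override
    public Optional<String> getDestinationTopic() {
        return Optional.ofNullable(destinationTopic);
    }

    @Override
    public Map<String, String> getProperties() {
        return properties;
    }
}

// The routing decision is part of the return value instead of a side
// effect performed through the context in a Void-returning function.
final class RoutingFunction implements Function<String, SimpleRecord<String>> {
    @Override
    public SimpleRecord<String> apply(String input) {
        boolean isError = input.startsWith("ERROR");
        Map<String, String> props = new HashMap<>();
        props.put("severity", isError ? "error" : "info");
        String topic = isError ? "errors-topic" : "events-topic";
        return new OutputRecord<>(input.toLowerCase(Locale.ROOT), topic, props);
    }
}
```

A sink receiving this value can read the topic and properties from the record itself, which is what this PR enables with the real Record type.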

Modifications

This PR adds support for returning Record in a Function.

  • In JavaInstanceRunnable::sendOutputMessage, we check the type of the output object, and if it is a Record, we create a TargetSinkRecord instead of a SinkRecord. The Record methods of TargetSinkRecord, which the Sink calls when creating the output message, return the information from the output record.
  • When registering the Function, getFunctionTypes checks whether the output type is a Record and, if so, uses the wrapped type as the Sink type.
  • A utility method newOutputRecordBuilder is added to Context that returns a FunctionRecord builder initialized with the info from the source record. The builder methods can then be used to override these values as needed.
  • A RecordFunction is added to the Function examples to demonstrate this feature.
  • An integration test is added to PulsarFunctionsJavaTest.
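The getFunctionTypes change (unwrapping `Record<T>` to `T` for the Sink) can be illustrated with plain Java reflection. This is a sketch under assumptions, not Pulsar's code: `SimpleRecord` stands in for Pulsar's `Record`, the function classes and `FunctionTypes.sinkType` are hypothetical, and the sketch only inspects interfaces the class implements directly; the real implementation may resolve generics differently.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

// Hypothetical stand-in for org.apache.pulsar.functions.api.Record.
interface SimpleRecord<T> {
    T getValue();
}

// A function with a plain output type.
final class LengthFunction implements Function<String, Integer> {
    @Override
    public Integer apply(String s) {
        return s.length();
    }
}

// The same function, but returning its output wrapped in a record.
final class LengthRecordFunction implements Function<String, SimpleRecord<Integer>> {
    @Override
    public SimpleRecord<Integer> apply(String s) {
        return () -> s.length();
    }
}

final class FunctionTypes {
    // Returns the type the sink should be configured for: the declared
    // output type, or T when the declared output type is SimpleRecord<T>.
    static Type sinkType(Class<?> functionClass) {
        for (Type iface : functionClass.getGenericInterfaces()) {
            if (!(iface instanceof ParameterizedType)) {
                continue;
            }
            ParameterizedType fn = (ParameterizedType) iface;
            if (fn.getRawType() != Function.class) {
                continue;
            }
            Type output = fn.getActualTypeArguments()[1];
            if (output instanceof ParameterizedType
                    && ((ParameterizedType) output).getRawType() == SimpleRecord.class) {
                // Unwrap SimpleRecord<T> to T.
                return ((ParameterizedType) output).getActualTypeArguments()[0];
            }
            return output;
        }
        throw new IllegalArgumentException(
                functionClass.getName() + " does not directly implement Function");
    }
}
```

Both example functions end up with `Integer` as the sink type, which is the behavior the bullet describes: the Record wrapper affects routing, not the schema of the value.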

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

Run PulsarFunctionsJavaTest::testRecordFunctionTest

Does this pull request potentially affect one of the following parts:

yes

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)

  • The public API: (yes / no)
    Yes.

    • adds Record<> as a possible return type for Functions
    • adds a newOutputRecordBuilder method to Context
  • The schema: (yes / no / don't know)

  • The default values of configurations: (yes / no)

  • The wire protocol: (yes / no)

  • The rest endpoints: (yes / no)

  • The admin cli options: (yes / no)

  • Anything that affects deployment: (yes / no / don't know)

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@dave2wave dave2wave requested review from eolivelli and lhotari June 13, 2022 19:41
@Anonymitaet added the doc-required label Jun 14, 2022
@lhotari (Member) left a comment

LGTM. Is this change ABI compatible, so that existing user-created Pulsar Functions implementations can be run without recompiling when upgrading to a version that includes this change?

@eolivelli (Contributor) left a comment

Overall LGTM

I left some feedback

currentRecord.getPartitionIndex().ifPresent(builder::partitionIndex);
currentRecord.getRecordSequence().ifPresent(builder::recordSequence);

// TODO: add message
Contributor commented:

If you want to do this, please create a ticket and link it; otherwise, please remove this "TODO".
TODOs are usually a code smell, especially in a big open-source project like Pulsar.

We could explain why the "message" is not available here.

Contributor Author commented:

Sorry, I forgot to remove that one.
I totally agree that TODOs must not hit the main branch 😄

@@ -166,7 +166,7 @@ public void testSlidingCountWindowTest() throws Exception {
@Test(groups = {"java_function", "function"})
public void testMergeFunctionTest() throws Exception {
testMergeFunction();
}
}
Contributor commented:

nit: remove space

Contributor Author commented:

This is to align with the rest of the function, which uses 4 spaces.
But this whole class is misaligned...

Contributor Author commented:

I realigned this whole block, but I guess the full class should be realigned.
Maybe it is better to do that in another PR?

Contributor commented:

Yes, this PR should focus on the feature and not on code cleanup.

private final Integer partitionIndex;
private final Long recordSequence;

public static <T> FunctionRecord.FunctionRecordBuilder<T> from(Context context) {
Contributor commented:

we need unit test coverage for this method (ensure that every field is properly handled)

Contributor Author commented:

done

@@ -161,4 +162,11 @@ public interface Context extends BaseContext {
* @throws PulsarClientException
*/
<X> ConsumerBuilder<X> newConsumerBuilder(Schema<X> schema) throws PulsarClientException;

/**
* Create a FunctionRecordBuilder.
Contributor commented:

please explain a little bit how this is supposed to be used.

Contributor Author commented:

done

@cbornet (Contributor Author) commented Jun 14, 2022

> Is this change ABI compatible so that existing user created Pulsar Functions implementations can be run without recompiling when upgrading to a version that includes this change?

@lhotari I think it is, since we only add a method to the Context API.
SinkRecord is sometimes used outside of its package in tests (it probably shouldn't be), but I think extracting some methods to a superclass is OK for binary compatibility. Or is it not?
Do you see other things that could break?

@cbornet cbornet requested a review from eolivelli June 14, 2022 14:34
* Rename TargetSinkRecord to OutputRecordSinkRecord
* Add UT for FunctionCommon::getFunctionTypes
* Add UT for Context::newOutputRecordBuilder
* Add some javadoc
* Always set message properties for OutputRecordSinkRecord
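The runtime side, that is the instanceof check in sendOutputMessage plus the OutputRecordSinkRecord wrapper named in the commit above, can be sketched without Pulsar on the classpath. Everything here is a simplified, hypothetical stand-in: `SimpleRecord`, `BasicRecord`, `PlainSinkRecord` and `OutputDispatch` are illustrative names, `OutputRecordSinkRecord` only borrows the name, reading "always set message properties" as "properties always come from the output record" is an assumption, and so is the convention that an empty destination topic means the configured output topic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for org.apache.pulsar.functions.api.Record.
interface SimpleRecord<T> {
    T getValue();

    Optional<String> getDestinationTopic();

    Map<String, String> getProperties();
}

// Hypothetical immutable implementation used for inputs and outputs.
final class BasicRecord<T> implements SimpleRecord<T> {
    private final T value;
    private final String destinationTopic;
    private final Map<String, String> properties;

    BasicRecord(T value, String destinationTopic, Map<String, String> properties) {
        this.value = value;
        this.destinationTopic = destinationTopic;
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    @Override public T getValue() { return value; }
    @Override public Optional<String> getDestinationTopic() { return Optional.ofNullable(destinationTopic); }
    @Override public Map<String, String> getProperties() { return properties; }
}

// Plain return value: value from the function, routing metadata from the
// input record (an empty topic is assumed to mean "use the configured topic").
final class PlainSinkRecord<T> implements SimpleRecord<T> {
    private final SimpleRecord<?> source;
    private final T value;

    PlainSinkRecord(SimpleRecord<?> source, T value) {
        this.source = source;
        this.value = value;
    }

    @Override public T getValue() { return value; }
    @Override public Optional<String> getDestinationTopic() { return source.getDestinationTopic(); }
    @Override public Map<String, String> getProperties() { return source.getProperties(); }
}

// Record return value: value, topic and properties all come from the
// returned record; the source is kept only for acknowledgement purposes.
final class OutputRecordSinkRecord<T> implements SimpleRecord<T> {
    private final SimpleRecord<?> source;
    private final SimpleRecord<T> output;

    OutputRecordSinkRecord(SimpleRecord<?> source, SimpleRecord<T> output) {
        this.source = source;
        this.output = output;
    }

    SimpleRecord<?> source() { return source; }

    @Override public T getValue() { return output.getValue(); }
    @Override public Optional<String> getDestinationTopic() { return output.getDestinationTopic(); }
    @Override public Map<String, String> getProperties() { return output.getProperties(); }
}

final class OutputDispatch {
    // Mirrors the type check described for sendOutputMessage.
    @SuppressWarnings("unchecked")
    static SimpleRecord<Object> wrap(SimpleRecord<?> source, Object output) {
        if (output instanceof SimpleRecord) {
            return new OutputRecordSinkRecord<>(source, (SimpleRecord<Object>) output);
        }
        return new PlainSinkRecord<>(source, output);
    }
}
```

The design keeps the Sink unaware of which path was taken: it only calls the Record accessors, and the wrapper decides where the answers come from.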
@eolivelli (Contributor) left a comment

LGTM

@codelipenghui codelipenghui added this to the 2.11.0 milestone Jun 15, 2022
@codelipenghui added the area/function and type/feature labels Jun 15, 2022
@codelipenghui (Contributor) commented:

@nlu90 Please help review this PR.

@cbornet (Contributor Author) commented Jun 15, 2022

/pulsarbot rerun-failure-checks

@freeznet (Contributor) left a comment

LGTM, but I do have a question: it seems this method can replace newOutputMessage, so is it possible to deprecate newOutputMessage? If not, could you please add some more context comparing newOutputRecordBuilder and newOutputMessage? Thanks.

* @param <T> type of Record to build
* @return a Record builder initialised with values from the Function Context
*/
public static <T> FunctionRecord.FunctionRecordBuilder<T> from(Context context) {
Member commented:

Could you help me understand why the whole context is passed here, instead of the currentRecord?

Contributor Author commented:

We set the destinationTopic from the Context's getOutputTopic().
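A sketch of what "initialized with the info from the source record" can look like, and why the builder needs the whole context rather than only the current record: the destination topic comes from the context's output topic. All types here are hypothetical stand-ins (`FunctionContext`, `InputRecord` and `OutRecord` instead of Pulsar's `Context`, `Record` and `FunctionRecord`), the copied fields are a subset chosen for illustration, and the real builder appears to be generated by Lombok rather than written by hand. As in the reviewed code, the builder is only reachable through `from(...)`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the input record the context exposes.
interface InputRecord {
    Optional<String> getKey();

    Optional<Integer> getPartitionIndex();

    Optional<Long> getRecordSequence();

    Map<String, String> getProperties();
}

// Hypothetical stand-in for the two Context accessors the builder reads.
interface FunctionContext {
    InputRecord getCurrentRecord();

    String getOutputTopic();
}

// Hypothetical output record whose builder is pre-filled from a context.
final class OutRecord<T> {
    final T value;
    final String key;
    final String destinationTopic;
    final Integer partitionIndex;
    final Long recordSequence;
    final Map<String, String> properties;

    private OutRecord(Builder<T> b) {
        this.value = b.value;
        this.key = b.key;
        this.destinationTopic = b.destinationTopic;
        this.partitionIndex = b.partitionIndex;
        this.recordSequence = b.recordSequence;
        this.properties = Collections.unmodifiableMap(new HashMap<>(b.properties));
    }

    // The topic comes from the context itself; the other defaults come
    // from the record currently being processed. Absent values stay null.
    static <T> Builder<T> from(FunctionContext context) {
        InputRecord current = context.getCurrentRecord();
        Builder<T> builder = new Builder<T>()
                .destinationTopic(context.getOutputTopic())
                .properties(current.getProperties());
        current.getKey().ifPresent(builder::key);
        current.getPartitionIndex().ifPresent(builder::partitionIndex);
        current.getRecordSequence().ifPresent(builder::recordSequence);
        return builder;
    }

    static final class Builder<T> {
        private T value;
        private String key;
        private String destinationTopic;
        private Integer partitionIndex;
        private Long recordSequence;
        private Map<String, String> properties = new HashMap<>();

        private Builder() {
        }

        Builder<T> value(T value) { this.value = value; return this; }
        Builder<T> key(String key) { this.key = key; return this; }
        Builder<T> destinationTopic(String topic) { this.destinationTopic = topic; return this; }
        Builder<T> partitionIndex(Integer index) { this.partitionIndex = index; return this; }
        Builder<T> recordSequence(Long sequence) { this.recordSequence = sequence; return this; }
        Builder<T> properties(Map<String, String> props) { this.properties = new HashMap<>(props); return this; }

        OutRecord<T> build() { return new OutRecord<>(this); }
    }
}
```

For example, `OutRecord.<String>from(ctx).value("hello").destinationTopic("audit-topic").build()` keeps the copied key and partition index but overrides the topic.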

@@ -481,6 +482,11 @@ public <T> ConsumerBuilder<T> newConsumerBuilder(Schema<T> schema) throws Pulsar
return this.client.newConsumer(schema);
}

@Override
public <X> FunctionRecord.FunctionRecordBuilder<X> newOutputRecordBuilder() {
Member commented:

Instead of adding a new API here, have you considered creating a utility class with a record-generation method?

Contributor commented:

I think it is better not to add a general-purpose API to build "Record" instances.
A Record is like a Message, and you cannot build a Message using the client API or a utility class.
The record may be tied to some internal context or pre-configured (as is currently the case in this PR).

So it is better to use Context as the starting point for building a new record, the same way as with newOutputMessage().

@cbornet (Contributor Author) commented Jun 16, 2022

> LGTM, but I do have a question, it seems this method can be a replacement of newOutputMessage, so is it possible to deprecate newOutputMessage? if not, could you please add some more context comparing newOutputRecordBuilder and newOutputMessage? thanks.

These are two different approaches. newOutputRecordBuilder is useful if you design a Function that returns a Record. newOutputMessage is more generic and can be used, for instance, to send multiple messages. It also gives more control over the message, as you can set things such as deliverAt/deliverAfter.
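To illustrate the difference described in this reply, here is a dependency-free sketch of a case only the newOutputMessage style covers: one input producing several messages, each with its own delivery delay. `RecordingPublisher` and `FanOutFunction` are hypothetical; the publisher stands in for the TypedMessageBuilder obtained from Context::newOutputMessage (where the delay would be set with deliverAfter) and simply records what would be sent so the behavior can be checked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for publishing through Context::newOutputMessage:
// every call produces one message with its own topic and delivery delay.
final class RecordingPublisher {
    static final class Sent {
        final String topic;
        final String value;
        final long delayMillis;

        Sent(String topic, String value, long delayMillis) {
            this.topic = topic;
            this.value = value;
            this.delayMillis = delayMillis;
        }
    }

    private final List<Sent> sent = new ArrayList<>();

    void publish(String topic, String value, long delay, TimeUnit unit) {
        sent.add(new Sent(topic, value, unit.toMillis(delay)));
    }

    List<Sent> sent() {
        return Collections.unmodifiableList(sent);
    }
}

// A Void-style function: one input fans out to one message per word, the
// i-th word delayed by i seconds. A single returned record cannot express
// either the fan-out or the per-message delay.
final class FanOutFunction {
    static Void process(String input, RecordingPublisher publisher) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            publisher.publish("words-topic", words[i], i, TimeUnit.SECONDS);
        }
        return null;
    }
}
```

When a Function maps one input to exactly one output, returning a record built with newOutputRecordBuilder is the simpler choice; the publisher style remains for fan-out and delivery control.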

@eolivelli eolivelli merged commit b3c5191 into apache:master Jun 17, 2022
@cbornet cbornet deleted the function-record branch June 17, 2022 11:37
cbornet added a commit to cbornet/pulsar that referenced this pull request Jun 18, 2022
cbornet added a commit to cbornet/pulsar that referenced this pull request Jun 18, 2022
cbornet added a commit to datastax/pulsar that referenced this pull request Jun 20, 2022
@momo-jun (Contributor) commented:

Hi @cbornet, will you update relevant docs in a follow-up PR?

@tisonkun (Member) commented:

@cbornet @momo-jun is there any existing doc we can improve?

Perhaps we should place the RecordFunction demo on a new page under the "Pulsar Functions :: How to: Develop" entry?

@Anonymitaet added the doc-complete label and removed the doc-required label Aug 2, 2022
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Record;

@Builder(builderMethodName = "")
Member commented:

Is it necessary to suppress generating the builder method?

Labels
area/function, doc-complete, type/feature
Projects
None yet
9 participants