[issue-180] Provide method for deserialization with metadata #305

Merged on Feb 17, 2020 (12 commits)

Conversation

crazyzhou
Contributor

Signed-off-by: Brian Zhou [email protected]

Change log description

  • [issue-180] Provide method for deserialization with metadata

Purpose of the change
Fixes #180

What the code does
Provides a method, deserializeWithMetadata, that users can override to make use of the EventRead during deserialization.
Adds a unit test for this case.

How to verify it
./gradlew clean build passes
Targets master; should be cherry-picked to all 0.7 branches

codecov bot commented Jan 6, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@cce935d).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master     #305   +/-   ##
=========================================
  Coverage          ?   84.53%           
  Complexity        ?      345           
=========================================
  Files             ?       36           
  Lines             ?     1623           
  Branches          ?      177           
=========================================
  Hits              ?     1372           
  Misses            ?      139           
  Partials          ?      112
Impacted Files | Coverage Δ | Complexity Δ
...tors/flink/serialization/PravegaSerialization.java | 0% <ø> (ø) | 0 <0> (?)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cce935d...0038211.

Contributor

@EronWright EronWright left a comment

Good job!

*
* @return the deserialized event with metadata
*/
public T deserializeWithMetadata(EventRead<T> eventRead) {

This name suggests that it replaces deserialize, which is not the case. This method does not actually deserialize. I recommend renaming it to insertMetadata or something similar.

Contributor Author

It's true that this method does not do any deserialization, but it really does "get" the data. Its default implementation returns eventRead.getEvent(), and it is called in FlinkPravegaReader here: https://github.com/pravega/flink-connectors/pull/305/files#diff-005207bb16199a05351084fea79fd9acR271
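
To make the default-versus-override behaviour described here concrete, the following is a minimal sketch rather than the connector's actual code. EventReadSketch, SchemaSketch, TaggedValue, and TaggingSchema are hypothetical stand-ins invented for illustration; in particular, the real Pravega EventRead exposes an EventPointer object, which is simplified to a String here.

```java
// Hypothetical stand-in for Pravega's EventRead, reduced to two accessors.
// Assumption: the pointer is modelled as a String purely for brevity.
interface EventReadSketch<T> {
    T getEvent();
    String getEventPointer();
}

// Simplified stand-in for the schema class; only the hook under discussion is modelled.
class SchemaSketch<T> {
    // Default behaviour: return the already-deserialized event unchanged.
    public T deserializeWithMetadata(EventReadSketch<T> eventRead) {
        return eventRead.getEvent();
    }
}

// Hypothetical event type that has room for metadata.
class TaggedValue {
    final int value;
    String pointer; // stays null unless a schema attaches it

    TaggedValue(int value) {
        this.value = value;
    }
}

// An override that copies the pointer from the read result into the event.
class TaggingSchema extends SchemaSketch<TaggedValue> {
    @Override
    public TaggedValue deserializeWithMetadata(EventReadSketch<TaggedValue> eventRead) {
        TaggedValue event = eventRead.getEvent();
        if (event != null) {
            event.pointer = eventRead.getEventPointer();
        }
        return event;
    }
}

class DefaultVersusOverrideDemo {
    // Builds a fake read result carrying a fresh event and a pointer string.
    static EventReadSketch<TaggedValue> read(int value, String pointer) {
        TaggedValue event = new TaggedValue(value);
        return new EventReadSketch<TaggedValue>() {
            @Override public TaggedValue getEvent() { return event; }
            @Override public String getEventPointer() { return pointer; }
        };
    }

    public static void main(String[] args) {
        TaggedValue plain = new SchemaSketch<TaggedValue>().deserializeWithMetadata(read(7, "ptr-1"));
        TaggedValue tagged = new TaggingSchema().deserializeWithMetadata(read(7, "ptr-1"));
        System.out.println("default pointer: " + plain.pointer);
        System.out.println("override pointer: " + tagged.pointer);
    }
}
```

With the base class the event passes through untouched, so a caller that never overrides the method sees no metadata at all; only the subclass attaches the pointer.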

Contributor Author

I'm also not a fan of the current name, but I don't think insertMetadata is correct: if the method is not overridden, which is the case most of the time, it won't insert any metadata. Maybe getData or getEvent is better? Any suggestions for a better name?

Contributor

When reviewing this code I also got confused by the name of this method because it's not actually obvious that both deserialize methods will get called, one at the Pravega level and one at the Flink Connector level.

It's almost like the method should be called extractEvent since it's extracting it from the Pravega level API (and providing the chance to inject Metadata into the event). I considered perhaps it should be called getEvent or retrieveEvent but this would get confusing with the Pravega semantics.

Contributor Author

extractEvent is a good one; I can buy into this name

@@ -29,6 +30,9 @@
* <p>This adapter exposes the Pravega serializer as a Flink Deserialization schema and
* exposes the produced type (TypeInformation) to allow Flink to configure its internal
* serialization and persistence stack.
*
* <p>An additional method {@link #deserializeWithMetadata(EventRead)} is provided for
* applying the metadata in the deserialization. This method can be overriden in the extended class. </p>
*/
public class PravegaDeserializationSchema<T>

Contributor

Type T here is overloaded in that it expresses both the type of object expected from the Pravega stream (used by the Pravega API) and the type of object expected from the connector (with possible metadata injected). This means that the object being stored in the stream MUST serialize to a Java class that expects that metadata. For some formats (such as JSON) this may be easy, but for other formats, such as Java serialization, it's difficult and messy. In fact this can be seen in the unit tests, where IntegerWithEventPointer is the type being "stored" in the stream.

I'm not sure if it's possible, but it feels like there should be two types: a type T to represent the type from the stream and a type C to represent the type from the connector. In most cases these could be the same, but in cases where metadata needs to be injected they could be different.

Or maybe (just thinking aloud) we're talking about two completely different interfaces, i.e. a PravegaMetaDataSchema<T, C> that extends PravegaDeserializationSchema and just has the method

public C deserializeWithMetadata(EventRead<T> eventRead) {
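
One way to picture the two-type idea is the sketch below. It is only an illustration of the suggestion, not code from this PR: StreamRead, MetadataSchemaSketch, IntegerWithPointer, and IntegerPointerSchema are hypothetical names, the event pointer is simplified to a String, and the sketch leaves out the proposed link to PravegaDeserializationSchema so that it stays self-contained.

```java
// Hypothetical stand-in for Pravega's EventRead; the pointer is a String for brevity.
interface StreamRead<T> {
    T getEvent();
    String getEventPointer();
}

// Hypothetical two-parameter schema: T is the type stored in the stream,
// C is the type the connector hands to Flink.
abstract class MetadataSchemaSketch<T, C> {
    public abstract C deserializeWithMetadata(StreamRead<T> eventRead);
}

// Connector-side type that wraps the stored value together with its metadata.
final class IntegerWithPointer {
    final Integer value;
    final String pointer;

    IntegerWithPointer(Integer value, String pointer) {
        this.value = value;
        this.pointer = pointer;
    }
}

// The stream stores plain Integers; the wrapper type never has to be serialized.
class IntegerPointerSchema extends MetadataSchemaSketch<Integer, IntegerWithPointer> {
    @Override
    public IntegerWithPointer deserializeWithMetadata(StreamRead<Integer> eventRead) {
        return new IntegerWithPointer(eventRead.getEvent(), eventRead.getEventPointer());
    }
}

class TwoTypeDemo {
    // Builds a fake read result for the demo.
    static StreamRead<Integer> read(Integer value, String pointer) {
        return new StreamRead<Integer>() {
            @Override public Integer getEvent() { return value; }
            @Override public String getEventPointer() { return pointer; }
        };
    }

    public static void main(String[] args) {
        IntegerWithPointer out = new IntegerPointerSchema().deserializeWithMetadata(read(5, "ptr-9"));
        System.out.println(out.value + " @ " + out.pointer);
    }
}
```

Under this shape the stored type stays a plain Integer, which avoids the situation noted above where the metadata-carrying class itself has to round-trip through the stream's serializer.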

Contributor

@maddisondavid maddisondavid left a comment

LGTM

@maddisondavid maddisondavid merged commit 726edd8 into pravega:master Feb 17, 2020
crazyzhou added a commit that referenced this pull request Feb 17, 2020
* [issue-180] Provide method for deserialization with metadata

Signed-off-by: Brian Zhou <[email protected]>
crazyzhou added a commit that referenced this pull request Feb 17, 2020
* [issue-180] Provide method for deserialization with metadata

Signed-off-by: Brian Zhou <[email protected]>
@crazyzhou crazyzhou deleted the issue-180 branch September 11, 2020 09:49
Successfully merging this pull request may close these issues.

Expose the EventRead metadata along with a value
5 participants