Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] Add JsonMetadataDomain and RowTrackingMetadataDomain #3893

Open
wants to merge 36 commits into
base: master
Choose a base branch
from

Conversation

qiyuandong-db
Copy link

@qiyuandong-db qiyuandong-db commented Nov 21, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR builds on the base changes for introducing DomainMetadata into the Kernel, which are not yet merged. For changes specific to this PR, please refer to the last commit only.

This PR adds the following to Delta Kernel Java:

  • JsonMetadataDomain.java: Introduces the base abstract class JsonMetadataDomain for metadata domains that use JSON as their configuration string. Concrete implementations, such as RowTrackingMetadataDomain, should extend this class to define their specific metadata domain behavior. This class provides

    • capabilities for serialize to/deserialize from the configuration string
    • construct a DomainMetadata action for committing
    • extract a specific metadata domain instance from a snapshot
  • RowTrackingMetadataDomain.java: Implements the metadata domain delta.rowTracking. It manages row ID high water mark, which will be used in the future for assigning fresh row IDs.

How was this patch tested?

Added tests covering the functionalities of JsonMetadataDomain and RowTrackingMetadataDomain in DomainMetadataSuite.scala.

Does this PR introduce any user-facing changes?

No.

qiyuandong-db and others added 30 commits October 31, 2024 14:40
Co-authored-by: Johan Lasperas <[email protected]>
Co-authored-by: Johan Lasperas <[email protected]>
…actDomainMetadataMap to fillDomainMetadataMap.
Copy link
Collaborator

@johanl-db johanl-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick pass, I don't have any real concerns regarding this change.

Comment on lines +49 to +50
/** Default value for row ID high watermark when it is missing in the table */
public static final long MISSING_ROW_ID_HIGH_WATERMARK = -1L;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used, add it in the following change for populating row ids

Comment on lines +27 to +45
/**
* Creates an instance of {@link RowTrackingMetadataDomain} from a JSON configuration string.
*
* @param json the JSON configuration string
* @return an instance of {@link RowTrackingMetadataDomain}
*/
public static RowTrackingMetadataDomain fromJsonConfiguration(String json) {
return JsonMetadataDomain.fromJsonConfiguration(json, RowTrackingMetadataDomain.class);
}

/**
* Creates an instance of {@link RowTrackingMetadataDomain} from a {@link SnapshotImpl}.
*
* @param snapshot the snapshot instance
* @return an {@link Optional} containing the {@link RowTrackingMetadataDomain} if present
*/
public static Optional<RowTrackingMetadataDomain> fromSnapshot(SnapshotImpl snapshot) {
return JsonMetadataDomain.fromSnapshot(snapshot, RowTrackingMetadataDomain.class, DOMAIN_NAME);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these either below the constructor or at the end of the class

* RowTrackingMetadataDomain}, should extend this class to define a specific metadata domain.
*/
public abstract class JsonMetadataDomain {
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to review the default configuration used for serializing/deserializing.
E.g. does it correctly fail if the json string has missing fields / extra fields

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants