Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script: Ingest Metadata and CtxMap #88458

Merged
merged 32 commits into from
Jul 13, 2022

Conversation

stu-elastic
Copy link
Contributor

@stu-elastic stu-elastic commented Jul 12, 2022

Create a Metadata superclass for ingest and update contexts.

Create a CtxMap superclass for ctx backwards compatibility in ingest and update contexts. script.CtxMap was moved from ingest.IngestSourceAndMetadata

CtxMap takes a Metadata subclass and validates update via the FieldPropertys passed in.

Metadata provides typed getters and setters and implements a Map-like interface, making it easy for a class containing CtxMap to implement the full Map interface.

The FieldProperty record that configures how to validate fields. Fields have a type, are writeable or read-only, and nullable or not and may have an additional validation useful for Set/Enum validation.

…return vt validator, doc matcher only checks metadata map
Adds FieldProperties record that configures how to validate
fields.  Fields have a type, are writeable or read-only, and nullable
or not and may have an additional validation useful for Set/Enum
validation.

Splits IngestMetadata from Metadata in preparation for new Metdata
subclasses.
@stu-elastic stu-elastic added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >refactoring labels Jul 12, 2022
@stu-elastic stu-elastic marked this pull request as ready for review July 12, 2022 04:27
@elasticmachine elasticmachine added Team:Core/Infra Meta label for core/infra team Team:Data Management Meta label for data/management team labels Jul 12, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@stu-elastic
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-2

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine. I don't quite see yet how this will get to what we discussed before (subclasses of Metadata passing the operations they support for keys, ie read-only vs read-write), but I know there are more followups. This PR on it's own looks like an improvement.

return null;
}

static class IngestMetadata extends Metadata {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is static, let's move it out to a tope level class. It can still be package private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an IngestMetadata that "Holds the ingest pipelines that are available in the cluster".

I'm going to rename this to IngestDocMetadata when I make it top-level.

public static Tuple<Map<String, Object>, Map<String, Object>> splitSourceAndMetadata(Map<String, Object> sourceAndMetadata) {
if (sourceAndMetadata instanceof IngestSourceAndMetadata ingestSourceAndMetadata) {
return new Tuple<>(new HashMap<>(ingestSourceAndMetadata.source), new HashMap<>(ingestSourceAndMetadata.metadata.getMap()));
public static Tuple<Map<String, Object>, Map<String, Object>> splitSourceAndMetadata(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this method necessary when the source and metadata are already split internally? Couldn't this be two separate calls to member methods, to get the source map and the Metadata object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

}
return metadata;
}

/**
* Check that all metadata map contains only valid metadata and no extraneous keys and source map contains no metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this comment is off? there is no source map

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -172,7 +119,7 @@ public void setVersion(long version) {
}

public ZonedDateTime getTimestamp() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can move completely to IngestMetadata right? We shouldn't need it to exist at all on other Metadata subclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is available in 2 of 4 write contexts, ingest and update. It is missing in reindex and update by query

It's here because that allows scripts to do Metadata m = metadata(); m.timestamp. Otherwise, the script would have to name the subtype.

It is possible to add timestamp to reindex and update by query, both have access to the thread pool and pull a long supplier from it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could duck-type the Metadata subclasses just for timestamp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it’s relevant to all the other contexts, then it makes sense to stay completely in this class. But I don’t think we should have it defined here but implemented in subclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change the implementation here in the Update PR.

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few minor comments.

ZonedDateTime timestamp,
Map<String, Object> source
) {
super(new HashMap<>(source), new IngestDocMetadata(index, id, version, routing, versionType, timestamp));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we do a deepcopy for source when calling this constructor, do we then again need to wrap source in a new HashMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of tests call into IngestDocument which calls this constructor, those test all do Map.of or SingletonMap and fail if we don't perform this copy.

@@ -763,7 +751,7 @@ public static Object deepCopy(Object value) {
for (Map.Entry<?, ?> entry : mapValue.entrySet()) {
copy.put(entry.getKey(), deepCopy(entry.getValue()));
}
// TODO(stu): should this check for IngestSourceAndMetadata in addition to Map?
// TODO(stu): should this check for IngestCtxMap in addition to Map?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to leave this as part of the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shows up because of the rename, it's still relevant question for now.

@@ -172,7 +119,7 @@ public void setVersion(long version) {
}

public ZonedDateTime getTimestamp() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could duck-type the Metadata subclasses just for timestamp.

@stu-elastic stu-elastic merged commit 85b8d3d into elastic:master Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >refactoring Team:Core/Infra Meta label for core/infra team Team:Data Management Meta label for data/management team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants