Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest: Start separating Metadata from IngestSourceAndMetadata #88401

Merged
merged 13 commits into from
Jul 12, 2022

Conversation

stu-elastic
Copy link
Contributor

Pull out the implementation of Metadata from IngestSourceAndMetadata.

Metadata will become a base class extended by the update contexts: ingest, update, update by query and reindex.

Metadata implements a map-like interface, making it easy for a class containing Metadata to implement the full Map interface.

@stu-elastic stu-elastic added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >refactoring labels Jul 8, 2022
@stu-elastic stu-elastic marked this pull request as ready for review July 11, 2022 16:04
@elasticmachine elasticmachine added Team:Core/Infra Meta label for core/infra team Team:Data Management Meta label for data/management team labels Jul 11, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@stu-elastic
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting closer to what we discussed! I left some comments.

protected final Map<String, Validator> validators;

// timestamp is new to ingest metadata, so it doesn't need to be backed by the map for back compat
protected final ZonedDateTime timestamp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to be more troublesome as more contexts are added. Timestamp is specific to ingest. Yet all contexts are going to have this extra member. I suggest (in a followup) to consider either putting this in the map (it's not the worst thing to have it available in ctx), and/or to allow subclasses to provide additional keys/validators (but validators can still be protected). I think this will be necessary anyways to add metadata support in watcher, which has some additional keys that we would not add here in server.

Copy link
Contributor Author

@stu-elastic stu-elastic Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest (in a followup) to consider either putting this in the map (it's not the worst thing to have it available in ctx), and/or to allow subclasses to provide additional keys/validators (but validators can still be protected).

I moved this to it's own subclass when implementing Update. This PR is just moving the code over.

The Update context has this map backed but it is backwards incompatible to add it to a ctx for ingest unless we say underscore prefixed values in ctx are reserved.

// timestamp is new to ingest metadata, so it doesn't need to be backed by the map for back compat
protected final ZonedDateTime timestamp;

public Metadata(String index, String id, long version, String routing, VersionType versionType, ZonedDateTime timestamp) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ctor is basically Ingest specific. Can we replace this ctor with calls to the relevant setters in ingest when constructing the metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next PR, Metadata takes the fully constructed Map. I kept this here to have a minimal code change in this PR.

Can we replace this ctor with calls to the relevant setters in ingest when constructing the metadata?

Then we'd have to manually call the initial validation, I think it works better to validate the full map on construction.

if (value instanceof Number number) {
return number;
}
throw new IllegalArgumentException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be an illegal state or assertion error? It got past the validators; this is a coding error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to IllegalStateException.

* Get the {@link Number} associated with key, or null
* @throws IllegalArgumentException if the value is not a {@link Number}
*/
public Number getNumber(String key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these helper methods should be public, they can be protected, just used by the validators right? The only public members (aside from general map stuff necessary for ctx) should be the strongly keyed/typed methods meant to be exposed in scripts.

Copy link
Contributor Author

@stu-elastic stu-elastic Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we implement getOp/setOp in the CtxMap for Update, it will reuse getString. I'll make these protected now and we can discuss it then.

* Is this key a Metadata key? A {@link #remove}d key would return false for {@link #containsKey(String)} but true for
* this call.
*/
public boolean isMetadata(String key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the naming is confusing, since this is in a class called Metadata. How about isAvailable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to isAvailable.

this(map, timestamp, VALIDATORS);
}

Metadata(Map<String, Object> map, ZonedDateTime timestamp, Map<String, Validator> validators) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think initial values for the map are really necessary (use the setters, see my other comment). But I think what is missing is the user (ie ingest) specifying which keys should be available. What is here isn't all the possible keys right? And other contexts wont' want all of these? What I was trying to suggest in our previous conversations was to have the ctor take in a map from key to allowed access (read only vs read/write). Perhaps it could be two lists. Then the validators are grabbed for each one of those, internally within the ctor, to choose which validators are set on the instance.

Copy link
Contributor Author

@stu-elastic stu-elastic Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think what is missing is the user (ie ingest) specifying which keys should be available.

Those changes are coming in the third PR in this series. This PR separates the code, the next PR will create a CtxMap from IngestSourceAndMetadata and the final will allow configuration based validation.

I'm happy to roll those two changes into this one, if that's desired.

*/
String getIndex();
public List<String> getKeys() {
return new ArrayList<>(map.keySet());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of copying, this could be wrapped with Collections.unmodifiableSet? I would also name this keySet (and return a Set), no reason not to use the standard collection type for keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, done.

/**
* Get the metadata keys inside the {@param other} set
*/
public Set<String> metadataKeys(Set<String> other) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the intersection of the keyset and the keys passed in. Instead of another public method, I think the getKeys (or keySet name as I suggested above) can be called with Sets.intersection in the one place in ingest using this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.

* Allow Numbers that can be represented as longs without loss of precision or null
* @throws IllegalArgumentException if the value cannot be represented as a long
*/
public static void longValidator(MapOperation op, String key, Object value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be protected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to protected.

* Allow maps.
* @throws IllegalArgumentException if {@param value} is not a {@link Map}
*/
public static void mapValidator(MapOperation op, String key, Object value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be protected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to protected.

@stu-elastic
Copy link
Contributor Author

@rjernst I've collapsed the three intended PRs into one at #88458 if you think that would be easier to review.

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one additional question following up on @rjernst's review.

for (Map.Entry<String, Object> metadata : metadataMap.entrySet()) {
if (metadata.getValue() != null) {
builder.field(metadata.getKey(), metadata.getValue().toString());
org.elasticsearch.script.Metadata metadata = ingestDocument.getMetadata();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Metadata be able to read/write it's own XContent stream? Or maybe Metadata should have something to return a Map of its keys to values? I'm considering the case where in the future Metadata may not necessarily be backed by a Map so it won't have something like keySet and get naturally available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting thoughts but I don't think having Metadata own it's XContent makes sense right now.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jdconrad
Copy link
Contributor

LGTM too!

@stu-elastic stu-elastic merged commit 39de085 into elastic:master Jul 12, 2022
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 13, 2022
* upstream/master: (38 commits)
  Simplify map copying (elastic#88432)
  Make DiffableUtils.diff implementation agnostic (elastic#88403)
  Ingest: Start separating Metadata from IngestSourceAndMetadata (elastic#88401)
  Move runtime fields base scripts out of scripting fields api package. (elastic#88488)
  Enable TRACE Logging for test and increase timeout (elastic#88477)
  Mute ReactiveStorageIT#testScaleDuringSplitOrClone (elastic#88480)
  Track the count of failed invocations since last successful policy snapshot (elastic#88398)
  Avoid noisy exceptions on data nodes when aborting snapshots (elastic#88476)
  Fix ReactiveStorageDeciderServiceTests testNodeSizeForDataBelowLowWatermark (elastic#88452)
  INFO logging of snapshot restore and completion (elastic#88257)
  unmute test (elastic#88454)
  Updatable API keys - noop check (elastic#88346)
  Corrected an incomplete sentence. (elastic#86542)
  Use consistent shard map type in IndexService (elastic#88465)
  Stop registering TestGeoShapeFieldMapperPlugin in ESIntegTestCase (elastic#88460)
  TSDB: RollupShardIndexer logging improvements (elastic#88416)
  Audit API key ID when create or grant API keys (elastic#88456)
  Bound random negative size test in SearchSourceBuilderTests#testNegativeSizeErrors (elastic#88457)
  Updatable API keys - logging audit trail event (elastic#88276)
  Polish reworked LoggedExec task (elastic#88424)
  ...

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/RollupShardIndexer.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >refactoring Team:Core/Infra Meta label for core/infra team Team:Data Management Meta label for data/management team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants