Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synthetic _source: support match_only_text #89516

Merged
merged 14 commits into from
Sep 1, 2022

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Aug 22, 2022

This adds support for synthetic _source to the match_only_text field
type. When synthetic _source is enabled match_only_text fields
create a hidden stored field to contain their text. This should have
similar or better search performance for this specific field type,
though it will have slightly worse indexing performance because
synthetic _source is still writing _recovery_source, which means
we're writing the bits for this field twice.

This adds support for synthetic `_source` to the `match_only_text` field
type. When synthetic `_source` is enabled `match_only_text` fields
create a hidden stored field to contain their text. This should have
similar or better search performance for this specific field type,
though it will have slightly worse indexing performance because
synthetic `_source` is still writing `_recovery_source`, which means
we're writing the bits for this field twice.
@nik9000 nik9000 added :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics v8.5.0 labels Aug 22, 2022
@nik9000 nik9000 requested a review from romseygeek August 22, 2022 18:28
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Aug 22, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000 nik9000 mentioned this pull request Aug 22, 2022
50 tasks
} catch (IOException e) {
throw new UncheckedIOException(e);
}
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a stored field lookup thing in searchExecutionContext.lookup() but it can't be convinced to load the hidden stored field. If we feel strongly about it I can try and integrate into it, but I'm not super sure how at the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's strictly for scripts and is integrated poorly with other stored field lookup stuff, so I don't think it's worth trying to re-use it for the moment. I do think that the document lookup API I'm playing with at the moment will improve this though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a lovely caching mechanism that I think could be quite nice. If multiple queries need to recheck the source it'll load once while this won't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, somewhat annoyingly, I don't think we will be able to re-use a stored field loader from the SearchLookup here because the underlying API expects a LeafReaderContext, not a LeafSearchLookup. But you should be able to use a LeafStoredFieldLoader rather than a FieldVisitor here which will at least be more readable.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I think it may be worth waiting for a few days to see if I can get document loaders working though as it will tidy up the impl a fair amount.

} catch (IOException e) {
throw new UncheckedIOException(e);
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's strictly for scripts and is integrated poorly with other stored field lookup stuff, so I don't think it's worth trying to re-use it for the moment. I do think that the document lookup API I'm playing with at the moment will improve this though.

@@ -326,6 +371,10 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
Field field = new Field(fieldType().name(), value, fieldType);
context.doc().add(field);
context.addToFieldNames(fieldType().name());

if (context.isSyntheticSource()) {
context.doc().add(new Field(fieldType().originalFieldName(), value, ORIGINAL_FIELD_TYPE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use StoredField rather than creating a new field type

@@ -279,6 +321,9 @@ public IndexFieldData.Builder fielddataBuilder(FieldDataContext fieldDataContext
throw new IllegalArgumentException(CONTENT_TYPE + " fields do not support sorting and aggregations");
}

private String originalFieldName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this storedFieldName to make it a bit clearer what it's used for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
return fieldMapping(b -> b.field("type", "match_only_text"));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd be happier with coverage here if we explicitly run both tests for the synthetic and non-synthetic case, this feels at the moment like we're only testing 50% of the functionality on each run and I think that will come back to bite us.

@nik9000
Copy link
Member Author

nik9000 commented Aug 23, 2022

This looks great! I think it may be worth waiting for a few days to see if I can get document loaders working though as it will tidy up the impl a fair amount.

👍

Adds more tests for the enrich processor around different index types.
Right now they all work fine (yay!) but this feels like a good amount of
paranoia.
@nik9000
Copy link
Member Author

nik9000 commented Aug 31, 2022

@romseygeek this should be ready for you too!

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nik9000
Copy link
Member Author

nik9000 commented Sep 1, 2022

I almost clicked the merge button! It wouldn't have compiled! sneaky sneaky.

@nik9000 nik9000 merged commit 44693b0 into elastic:main Sep 1, 2022
giladgal added a commit that referenced this pull request Nov 28, 2022
There are at least two data types that are supported AFAIK and are not mentioned:
#88603
#89516
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants