Synthetic _source: support match_only_text #89516

nik9000 · 2022-08-22T18:28:46Z

This adds support for synthetic `_source` to the `match_only_text` field type. When synthetic `_source` is enabled, `match_only_text` fields create a hidden stored field to contain their text. This should have similar or better search performance for this specific field type, though it will have slightly worse indexing performance because synthetic `_source` is still writing `_recovery_source`, which means we're writing the bits for this field twice.
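As a rough illustration of the idea in the description, the toy model below uses plain maps in place of Lucene documents. The class name, the `._original` suffix, and both method names are hypothetical and are not taken from the PR. It only shows the shape of the behavior: with synthetic `_source` the text goes into a hidden stored field and `_source` is rebuilt from it on fetch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a map stands in for the stored fields of one Lucene document.
final class HiddenStoredFieldSketch {
    // Hypothetical suffix for the hidden stored field; the real name is chosen by the mapper.
    static final String HIDDEN_SUFFIX = "._original";

    /** Index time: store the text in a hidden field when synthetic _source is on. */
    static Map<String, String> indexDoc(String field, String text, boolean syntheticSource) {
        Map<String, String> storedFields = new LinkedHashMap<>();
        if (syntheticSource) {
            storedFields.put(field + HIDDEN_SUFFIX, text);
        } else {
            // Without synthetic _source the original JSON is kept as the _source stored field.
            // No JSON escaping here; this is a sketch, not a serializer.
            storedFields.put("_source", "{\"" + field + "\":\"" + text + "\"}");
        }
        return storedFields;
    }

    /** Fetch time: return the stored _source if present, else rebuild it from the hidden field. */
    static String loadSource(String field, Map<String, String> storedFields) {
        String stored = storedFields.get("_source");
        if (stored != null) {
            return stored;
        }
        String text = storedFields.get(field + HIDDEN_SUFFIX);
        return text == null ? "{}" : "{\"" + field + "\":\"" + text + "\"}";
    }
}
```

Calling `indexDoc("message", "hello world", true)` and then `loadSource("message", ...)` on the result rebuilds the same JSON that the non-synthetic path would have stored directly.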

elasticsearchmachine · 2022-08-22T18:29:42Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticsearchmachine · 2022-08-22T18:29:42Z

Hi @nik9000, I've created a changelog YAML for you.

elasticsearchmachine · 2022-08-22T18:29:42Z

Pinging @elastic/es-search (Team:Search)

nik9000 · 2022-08-22T20:13:00Z

...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java

+                    } catch (IOException e) {
+                        throw new UncheckedIOException(e);
+                    }
+                };


There's a stored field lookup thing in `searchExecutionContext.lookup()` but it can't be convinced to load the hidden stored field. If we feel strongly about it I can try and integrate into it, but I'm not super sure how at the moment.

romseygeek · Aug 23, 2022

Yeah that's strictly for scripts and is integrated poorly with other stored field lookup stuff, so I don't think it's worth trying to re-use it for the moment. I do think that the document lookup API I'm playing with at the moment will improve this though.

nik9000 · Aug 23, 2022

It has a lovely caching mechanism that I think could be quite nice. If multiple queries need to recheck the source it'll load once while this won't.

romseygeek · Aug 31, 2022

So, somewhat annoyingly, I don't think we will be able to re-use a stored field loader from the `SearchLookup` here because the underlying API expects a `LeafReaderContext`, not a `LeafSearchLookup`. But you should be able to use a `LeafStoredFieldLoader` rather than a `FieldVisitor` here which will at least be more readable.
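The two ideas in this thread, wrapping a checked `IOException` inside a lambda and loading a document's text only once, can be sketched in plain Java. This is not the `SearchLookup` or `LeafStoredFieldLoader` code. `StoredTextReader`, `unchecked`, and `cached` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class CachingLoaderSketch {
    /** Hypothetical stand-in for a per-segment stored field reader that may throw. */
    interface StoredTextReader {
        String read(int docId) throws IOException;
    }

    /** Adapts the checked reader to a plain function, mirroring the try/catch in the diff. */
    static IntFunction<String> unchecked(StoredTextReader reader) {
        return docId -> {
            try {
                return reader.read(docId);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    /**
     * Memoizes loads per doc ID so several queries rechecking the same
     * document read it once. Not thread-safe; a sketch for one search thread.
     */
    static IntFunction<String> cached(IntFunction<String> loader) {
        Map<Integer, String> cache = new HashMap<>();
        return docId -> cache.computeIfAbsent(docId, id -> loader.apply(id));
    }
}
```

Without the `cached` wrapper each caller triggers its own read, which is the cost nik9000 points out for the approach in the PR.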

romseygeek

This looks great! I think it may be worth waiting for a few days to see if I can get document loaders working though as it will tidy up the impl a fair amount.

romseygeek · 2022-08-23T11:29:08Z

...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java

+                    } catch (IOException e) {
+                        throw new UncheckedIOException(e);
+                    }
+                };


Yeah that's strictly for scripts and is integrated poorly with other stored field lookup stuff, so I don't think it's worth trying to re-use it for the moment. I do think that the document lookup API I'm playing with at the moment will improve this though.

romseygeek · 2022-08-23T11:29:27Z

...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java

@@ -326,6 +371,10 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
        Field field = new Field(fieldType().name(), value, fieldType);
        context.doc().add(field);
        context.addToFieldNames(fieldType().name());
+
+        if (context.isSyntheticSource()) {
+            context.doc().add(new Field(fieldType().originalFieldName(), value, ORIGINAL_FIELD_TYPE));


Use StoredField rather than creating a new field type

romseygeek · 2022-08-23T11:30:04Z

...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java

@@ -279,6 +321,9 @@ public IndexFieldData.Builder fielddataBuilder(FieldDataContext fieldDataContext
            throw new IllegalArgumentException(CONTENT_TYPE + " fields do not support sorting and aggregations");
        }

+        private String originalFieldName() {


Maybe call this storedFieldName to make it a bit clearer what it's used for?

romseygeek · 2022-08-23T11:33:16Z

...xtras/src/test/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapperTests.java

+        }
+        return fieldMapping(b -> b.field("type", "match_only_text"));
+    }
+


I think I'd be happier with coverage here if we explicitly run both tests for the synthetic and non-synthetic case. At the moment this feels like we're only testing 50% of the functionality on each run, and I think that will come back to bite us.
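The suggestion can be sketched as running the same check once per mode instead of picking one mode per run. The snippet below is plain Java, not the Elasticsearch test framework. `BothModesTestSketch`, `failingModes`, and the `roundTrip` parameter are hypothetical names standing in for "build the mapping, index a value, load it back".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class BothModesTestSketch {
    /**
     * Runs the same round-trip check with synthetic _source off and on,
     * and returns the names of the modes in which the value did not survive.
     */
    static List<String> failingModes(BiFunction<Boolean, String, String> roundTrip, String value) {
        List<String> failures = new ArrayList<>();
        // Iterate over both modes explicitly so neither is skipped on any run.
        for (boolean synthetic : new boolean[] { false, true }) {
            String loaded = roundTrip.apply(synthetic, value);
            if (!value.equals(loaded)) {
                failures.add(synthetic ? "synthetic" : "stored");
            }
        }
        return failures;
    }
}
```

A bug that only affects the synthetic path shows up on every run this way, rather than on roughly half of the runs.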

nik9000 · 2022-08-23T11:39:29Z

> This looks great! I think it may be worth waiting for a few days to see if I can get document loaders working though as it will tidy up the impl a fair amount.

👍

nik9000 · 2022-08-31T19:20:23Z

@romseygeek this should be ready for you too!

romseygeek

LGTM

nik9000 · 2022-09-01T14:52:39Z

I almost clicked the merge button! It wouldn't have compiled! sneaky sneaky.

nik9000 added the :Search Foundations/Mapping (Index mappings, including merging and defining field types), :StorageEngine/TSDB (You know, for Metrics), and v8.5.0 labels Aug 22, 2022

nik9000 requested a review from romseygeek August 22, 2022 18:28

nik9000 added the >enhancement label Aug 22, 2022

elasticsearchmachine added the Team:Search (Meta label for search team) and Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) labels Aug 22, 2022

nik9000 added 3 commits August 22, 2022 14:40

Words

2899228

Update docs/changelog/89516.yaml

4d2f0b6

Fixup

2e268c3

nik9000 mentioned this pull request Aug 22, 2022

Synthetic Source #86603

Closed

50 tasks

nik9000 commented Aug 22, 2022

Fixup

0b242b9

romseygeek reviewed Aug 23, 2022

nik9000 added 4 commits August 23, 2022 11:41

More tests for enrich processor

0d9fe2d

Adds more tests for the enrich processor around different index types. Right now they all work fine (yay!) but this feels like a good amount of paranoia.

Merge branch 'main' into synthetic_source_match_only_text

7c3bb3f

Clean

1b08a25

Merge branch 'main' into synthetic_source_match_only_text

6cfda5b

nik9000 added 2 commits August 31, 2022 15:21

Merge branch 'main' into synthetic_source_match_only_text

0495296

Update

8998678

romseygeek approved these changes Sep 1, 2022

nik9000 added 2 commits September 1, 2022 10:58

Merge branch 'main' into synthetic_source_match_only_text

373a229

Compile after merge

6c27a51

Fixup

67319d8

nik9000 merged commit 44693b0 into elastic:main Sep 1, 2022

giladgal added a commit that referenced this pull request Nov 28, 2022

Update synthetic-source.asciidoc

f15055d

There are at least two data types that are supported AFAIK and are not mentioned: #88603 #89516
