Improve block loader fallback to source when source mode is synthetic. #115394

martijnvg · 2024-10-23T09:14:46Z

Sometimes MappedFieldType#blockLoader(...) implementations fall back to an implementation that uses source, for example when a field has doc values or stored fields disabled, or when ignore above or ignore malformed has been configured. The fallback reads the _source field, extracts the relevant field from it, and returns that as the value from the block loader.

When synthetic source is enabled, the source is instead computed from many doc values or stored fields, and then the relevant field is extracted. This is very slow and should be improved. The interesting part with synthetic source is that we don't need to compute the source in order to provide fallback values as part of the block loaders returned by MappedFieldType#blockLoader(...).

Synthetic source details relevant to block loader fallback logic:

When a field value exceeds the configured ignore above, the value is stored in a separate stored field with the suffix _original.
When a field value is malformed, the value is stored in a stored field with the same name as defined in the mapping, regardless of whether stored fields are enabled.
When a field has neither stored fields nor doc values, its value is stored in the _ignored_source stored field.
Ignored source is the fallback for synthetic source that keeps content in source from getting lost. For example, if an object field is disabled or the number of allowed mapped fields is exceeded, the field values / content should end up in ignored source.

In the case of synthetic source, the block loaders returned by MappedFieldType#blockLoader(...) can be made aware of these details and, instead of returning a BlockSourceReader based implementation, return an implementation that uses the right stored field or uses ignored source.
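The rules above could be sketched roughly as follows. This is only an illustration, not Elasticsearch code: the class SyntheticFallbackPlan, its method, and its boolean parameters are hypothetical, and the exact stored field naming (here simply the field name plus the _original suffix mentioned above) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Elasticsearch: given a simplified view of a
// field's mapping, list the stored fields a synthetic-source-aware block loader
// would need to consult instead of synthesizing the whole _source.
final class SyntheticFallbackPlan {
    static final String IGNORED_SOURCE = "_ignored_source";
    // Assumption: the suffix is appended directly to the field name.
    static final String ORIGINAL_SUFFIX = "_original";

    static List<String> storedFieldsToConsult(String fieldName,
                                              boolean hasDocValues,
                                              boolean isStored,
                                              boolean ignoreAboveConfigured,
                                              boolean ignoreMalformedConfigured) {
        List<String> fields = new ArrayList<>();
        // No doc values and no stored field: the value lives in ignored source.
        if (!hasDocValues && !isStored) {
            fields.add(IGNORED_SOURCE);
            return fields;
        }
        // Values exceeding ignore above go to a separate "_original" stored field.
        if (ignoreAboveConfigured) {
            fields.add(fieldName + ORIGINAL_SUFFIX);
        }
        // Malformed values go to a stored field named like the mapped field,
        // regardless of whether stored fields are enabled.
        if (ignoreMalformedConfigured) {
            fields.add(fieldName);
        }
        return fields;
    }
}
```

An empty result means the regular doc values or stored field loader is sufficient and no fallback location needs to be read.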

Tasks:

Handle ignore above more efficiently.
Handle ignore malformed more efficiently.
Handle reading field values that are stored in ignored source.

elasticsearchmachine · 2024-10-23T09:15:09Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-10-23T09:15:09Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

martijnvg · 2024-10-23T09:15:46Z

This POC (#114886) shows how falling back to ignored source can work.

felixbarny · 2024-10-25T14:32:34Z

Does this also cover cases where source filtering is used? In other words, when you only need to retrieve a specific field from _source, can we avoid synthesizing the full _source, which means fetching all fields?

martijnvg · 2024-10-25T14:38:22Z

This issue is in the context of synthetic source, but the idea is to avoid synthesizing the full source when only a subset of fields is required. This isn't the case today.

felixbarny · 2024-10-25T15:13:11Z

My comment was also in context of synthetic source. Basically asking if we could also add an optimization to not re-construct the full _source when source filtering is used. Instead, just fetching fields that are required per the source filtering configuration. So if a doc has 100 fields, but the search request contains "_source": [ "foo", "bar" ], we could optimize to only fetch those two fields when synthesizing the source instead of synthesizing the full source and then filtering it afterwards.
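As a rough illustration of the idea (not how Elasticsearch implements it): if each field's value is produced by its own loader, only the loaders for the requested fields need to run. The class FilteredSynthesis and the per-field Supplier loaders are hypothetical, and wildcard patterns in the includes are not handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: synthesize only the fields named in the source filter
// instead of building the full source and filtering it afterwards.
final class FilteredSynthesis {
    static Map<String, Object> synthesize(Map<String, Supplier<Object>> fieldLoaders,
                                          List<String> includes) {
        Map<String, Object> source = new LinkedHashMap<>();
        for (String field : includes) {
            Supplier<Object> loader = fieldLoaders.get(field);
            // Loaders for fields outside the filter are never invoked.
            if (loader != null) {
                source.put(field, loader.get());
            }
        }
        return source;
    }
}
```

With 100 loaders and includes of foo and bar, only two loaders are invoked.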

martijnvg · 2024-10-25T15:47:18Z

Currently this issue is about es|ql's fallback mechanism to source when source mode is synthetic.

There is an issue for the search api: #94001
And @jimczi did relevant work recently that does source filtering correctly for synthetic source in the get api (#113827).

jimczi · 2024-10-25T16:31:27Z

I am currently focused on #114618 and was planning to resume and finish work for #113827 after that. @felixbarny are you interested in this change for the get or the search API? I am asking because get is much simpler to achieve than search, which is why we started there.

felixbarny · 2024-10-25T16:51:12Z

I'm interested in _search. It's not something urgent. The question came up in the context of refactoring the APM UI to use fields. But some places still use _source with source filtering. Before the refactoring, there was a lot of usage of _source with filtering in the context of search. So I was suspecting that there may be other places that make use of _search + filtering that would have a performance regression when using synthetic _source.

martijnvg added :Analytics/Compute Engine Analytics in ES|QL :StorageEngine/Mapping The storage related side of mappings >enhancement Meta labels Oct 23, 2024

elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels Oct 23, 2024

martijnvg mentioned this issue Dec 10, 2024

[CI] EsqlActionBreakerIT class failing #118238

Closed

lkts mentioned this issue Jan 3, 2025

Prototype FallbackSyntheticSourceBlockLoader #119546

Draft
