[Meta] Better handling of single-valued fields #80825

Open
8 tasks
markharwood opened this issue Nov 18, 2021 · 3 comments
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

Contributor

markharwood commented Nov 18, 2021

Background

For a long time, Elasticsearch has been very permissive about JSON documents and has made no distinction between single values and arrays of values. This permissive approach has several downsides:

  1. Client code and scripts are made more complex. To be robust, code must be written to handle both fields holding a single value and fields holding an array of values.
  2. Kibana does some strange things. For example, Kibana will happily try to "AND" multiple values selected from a bar chart or pie chart, which never makes sense for values taken from a single-valued field. This produces no matches because no document can be OS:ios and OS:android simultaneously.
  3. Administrators cannot easily "lock down" the mapping. Custom ingest scripts are required to prevent multi-valued documents from being added (and ingest scripts can still be circumvented by clients sending documents?).
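
To make the first two points concrete, here is a minimal sketch in plain Python. It is not Kibana or Elasticsearch code: the helper names `as_list` and `matches_all` and the sample documents are hypothetical and only illustrate the shape of the problem.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Normalize a field value that may be missing, a scalar, or an array."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def matches_all(doc: Dict[str, Any], field: str, terms: List[str]) -> bool:
    """AND semantics: every term must be present among the field's values."""
    values = as_list(doc.get(field))
    return all(term in values for term in terms)


# Every document here carries exactly one OS value (hypothetical sample data).
single_valued_docs = [{"OS": "ios"}, {"OS": "android"}]

# ANDing two values picked from a chart can never match a single-valued field.
and_hits = [d for d in single_valued_docs if matches_all(d, "OS", ["ios", "android"])]
```

Without the normalizing helper, every consumer of `OS` has to repeat the scalar-or-list check itself, and the AND filter returns nothing for data shaped like this.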

All of the above is unfortunate because the majority of fields in common use are single-valued. A weblog's fields are a good example (timestamp, IP, OS, user agent, URL, referrer, country, etc. are all single values).

Proposed changes

The solution is a two-pronged approach:

  • Enforcement: for new indices, we can give administrators the option of rejecting documents with multiple values.
  • Reporting: for both new and old indices, we can report whether the index contains only documents with single values.
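
As a rough illustration of the two prongs, the sketch below models them as two plain Python functions. None of this is Elasticsearch code: `MAPPING`, `enforce` and `is_single_valued` are hypothetical names, and treating only arrays with more than one element as "multiple values" is an assumption (whether a one-element array should also be rejected is left open).

```python
from typing import Any, Dict, Iterable

# Hypothetical mapping: field name -> whether multiple values are accepted.
MAPPING: Dict[str, bool] = {"os": False, "tags": True}


def enforce(doc: Dict[str, Any], mapping: Dict[str, bool]) -> None:
    """Enforcement prong: reject a document that sends several values
    to a field that forbids them."""
    for field, value in doc.items():
        allows_many = mapping.get(field, True)  # unmapped fields stay permissive
        if isinstance(value, list) and len(value) > 1 and not allows_many:
            raise ValueError(f"field [{field}] does not allow multiple values")


def is_single_valued(docs: Iterable[Dict[str, Any]], field: str) -> bool:
    """Reporting prong: True when no document holds more than one value."""
    for doc in docs:
        value = doc.get(field)
        if isinstance(value, list) and len(value) > 1:
            return False
    return True


indexed = [{"os": "ios", "tags": ["a", "b"]}, {"os": "android"}]
for d in indexed:
    enforce(d, MAPPING)  # both documents are accepted

try:
    enforce({"os": ["ios", "android"]}, MAPPING)
    rejected = False
except ValueError:
    rejected = True
```

Enforcement is a property of the mapping and applies at ingest time, while reporting is a property of the data that is already indexed, which is why it can also cover old indices.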

  • Add an is_single_valued flag to field caps output which indicates if all documents have single values for a field. See "Field caps api - report back if fields are single-valued or not." #80730
  • Add a boolean allowsMultipleValues() method to FieldMapper and remove the slow existing validation code in single-valued fields. The DocumentParser class should instead assume responsibility for checking that single-valued fields don't receive multiple values.
  • Add an allow_multiple_values flag to field mappings that can reject documents presenting arrays. See "New field mapping flag - allow_multiple_values" #80289
  • Remove existing code from always-singular fields like AggregateDoubleMetricFieldMapper that checks for arrays. This logic is sometimes slow and these classes can instead override FieldMapper.allowMultipleValues() to declare false and let DocumentParser do all the array detection/rejection.
  • Optimise performance of the field-caps reporting to avoid looking at index contents when the allow_multiple_values field mapping is set and we know this is enforced at ingest time
  • Consider enhancing the storage types used for fields where allow_multiple_values is set to false (using NumericDocValuesField instead of SortedNumericDocValuesField and SortedDocValuesField instead of SortedSetDocValuesField)
  • Change ECS to support single-valued fields (RFC opened: https://github.com/elastic/ecs/blob/main/rfcs/text/0029-enforce-single-value-fields.md)
  • Any Kibana-related changes to make use of the is_single_valued feedback in field-caps (e.g. not ANDing values from this field in filter pills). Mention of related progress here
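
The division of labour described in the tasks above (mappers declare, the parser enforces, field caps can skip the scan) might look roughly like the following. This is a toy Python model, not the real Java classes: the class bodies, the snake_case method name `allows_multiple_values`, the helper `field_caps_is_single_valued` and the field name `latency` are all assumptions made for illustration.

```python
from typing import Any, Dict, List


class FieldMapper:
    """Toy stand-in for a field mapper; permissive unless told otherwise."""

    def __init__(self, name: str, allow_multiple_values: bool = True) -> None:
        self.name = name
        self._allow_multiple_values = allow_multiple_values

    def allows_multiple_values(self) -> bool:
        return self._allow_multiple_values


class AggregateMetricMapper(FieldMapper):
    """An always-singular type declares False instead of checking arrays itself."""

    def allows_multiple_values(self) -> bool:
        return False


class DocumentParser:
    """Does the array detection once, centrally, for every mapper."""

    def __init__(self, mappers: List[FieldMapper]) -> None:
        self.mappers = {m.name: m for m in mappers}

    def parse(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        for field, value in doc.items():
            mapper = self.mappers.get(field)
            if mapper is None:
                continue  # unmapped field: nothing to enforce in this sketch
            many = isinstance(value, list) and len(value) > 1
            if many and not mapper.allows_multiple_values():
                raise ValueError(f"field [{field}] does not allow multiple values")
        return doc


def field_caps_is_single_valued(mapper: FieldMapper, docs: List[Dict[str, Any]]) -> bool:
    """Skip the data scan when the mapping already guarantees single values."""
    if not mapper.allows_multiple_values():
        return True  # assumes the restriction was enforced for every indexed doc
    return all(
        not (isinstance(d.get(mapper.name), list) and len(d[mapper.name]) > 1)
        for d in docs
    )


os_mapper = FieldMapper("os", allow_multiple_values=False)
tags_mapper = FieldMapper("tags")
metric_mapper = AggregateMetricMapper("latency")
parser = DocumentParser([os_mapper, tags_mapper, metric_mapper])
```

The shortcut in `field_caps_is_single_valued` is only sound if the restriction has been in force for every document in the index, which matches the "we know this is enforced at ingest time" condition in the task list.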
@markharwood markharwood added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Nov 18, 2021
@markharwood markharwood self-assigned this Nov 18, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Nov 18, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

Contributor

jpountz commented Aug 3, 2022

Some thoughts on this proposal:

  • While multi-valued fields are indeed infrequent, there are a few of them that appear in many of our datasets, e.g. host.ip when a host has multiple network interfaces.
  • Many of our Logging users would prefer if Elasticsearch was more forgiving about data ingestion, so I think it will be a challenge to update ECS to require some fields to be single-valued, except maybe for fields that are owned by Agent itself like agent.id or host.name. Likewise, for field mappings created through dynamic mappings, users are unlikely to be able to set the single-value flag since they don't know about these fields. I believe we're more likely to be successful with the reporting approach than the enforcing approach.
  • While I have more faith in the reporting approach, there are several cases in which Elasticsearch simply has no clue, e.g. runtime fields that read data from _source. Presumably, always assuming that runtime fields may be multi-valued would defeat a lot of the value we're expecting from this proposal, but we don't have a good way to detect whether runtime fields are single-valued in general. Should we have a flag on them that allows declaring whether they're single or multi-valued, and maybe assume single-valued by default? (Not a fan of this suggestion, mostly adding it to get the discussion started.)
  • In general, I think we should not have different semantics in the UI depending on whether the field is single or multi-valued. A field may be accidentally single-valued because none of the docs happened to actually have multiple values, and another field may be accidentally multi-valued because of a single dirty document. E.g. I think it would be an anti-pattern to AND or OR multiple values from a pie chart depending on whether a field is single or multi-valued. But maybe we could disable some operations on multi-valued fields because they only make sense on single-valued fields. (We may need a way to configure at the field level whether an AND or an OR makes more sense, but to me this would be a different feature.)

@javanna javanna added :Search Foundations/Mapping Index mappings, including merging and defining field types and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)
