Support unmapped fields in search 'fields' option. #63690

Closed
jtibshirani opened this issue Oct 14, 2020 · 3 comments · Fixed by #65386
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@jtibshirani
Contributor

jtibshirani commented Oct 14, 2020

We recommend omitting a field from the mappings when a user wants to avoid the indexing overhead, but still retrieve its values. This is usually accomplished by using an object field with enabled: false or dynamic: false. Since 'fields' is meant to be the central place to retrieve document content, it should allow for loading these unmapped values.

Longer-term we'd like to discourage users from having unmapped fields, and instead introduce lightweight ways to map data without indexing it. But in the meantime, we think it's valuable to support unmapped fields.

One proposal is to add a flag include_unmapped:

{
  "query": { "match_all": {} },
  "fields": [{
    "field": "headers.*",
    "include_unmapped": true
  }]
}

When include_unmapped is specified, the 'fields' option would do the following:

  1. Find all mapped leaf fields matching the patterns, collect + parse their values and add those to the hit. (This is what the option currently does).
  2. Also find all leaf fields in _source matching the patterns. If we find any fields that were not included in 1, add those to the hit as well. The fields are returned as-is, ignoring the 'format' option.

The flag could default to 'false', and eventually be phased out as it becomes less common to have unmapped fields.
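
The two steps above could be sketched roughly as follows. This is only an illustration in Python, not the Elasticsearch implementation: `fetch_fields`, `source_leaves` (the leaf values of _source keyed by dotted path) and `mapped_fields` (a per-field parser standing in for the mapping and 'format' handling) are hypothetical names, and `fnmatchcase` is used as an approximation of the wildcard matching.

```python
from fnmatch import fnmatchcase


def fetch_fields(source_leaves, mapped_fields, pattern, include_unmapped=False):
    """Collect values for one 'fields' pattern (illustrative sketch only).

    source_leaves: dict of dotted leaf path -> list of raw values from _source
    mapped_fields: dict of mapped leaf path -> callable that parses one value
    """
    hit = {}
    # Step 1: mapped leaf fields matching the pattern, with values parsed.
    for path, parse in mapped_fields.items():
        if fnmatchcase(path, pattern) and path in source_leaves:
            hit[path] = [parse(value) for value in source_leaves[path]]
    # Step 2: matching leaf fields in _source that are not mapped,
    # returned as-is (no parsing or formatting).
    if include_unmapped:
        for path, values in source_leaves.items():
            if path not in mapped_fields and fnmatchcase(path, pattern):
                hit[path] = list(values)
    return hit
```

With the flag left at its default, only step 1 runs, so the existing behaviour of the option is unchanged.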

Some questions to consider:

  • Do we prefer adding a flag or just returning unmapped fields by default? It's always nice to avoid flags, and as we move towards a set-up where unmapped fields are rare, the distinction would become unimportant anyways. But the behavior is a bit hacky/tricky, and I like that the flag indicates that it's 'non-standard' to be retrieving unmapped fields (in line with our longer term vision).
  • Would returning unmapped fields by default hurt fetch performance in the 'happy case' where all fields are mapped?
  • Is it fine to just return the unmapped values in a flattened list? This would drop structure in values like unmapped object arrays, ranges, etc.
@jtibshirani jtibshirani added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Oct 14, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@astefan
Contributor

astefan commented Oct 21, 2020

Recently, we've had a request on discuss about something tangential to supporting unmapped fields in SQL (and, implicitly, the fields API): https://discuss.elastic.co/t/elasticsearch-sql-cant-query-unindexed-field/251635. I am not sure SQL will actually use this feature, though. Data returned or queried in SQL has to have a type; if the field doesn't exist in the mapping, it has no data type, which means the field cannot actually be used.

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Nov 5, 2020
Currently, the 'fields' option only supports fetching mapped fields. Since
'fields' is meant to be the central place to retrieve document content, it
should allow for loading unmapped values.
This change adds implementation and tests for this addition.

Closes elastic#63690
@cbuescher
Member

The current WIP is out at #64651, so I want to summarize the currently proposed implementation details here for better visibility:

API

After some individual discussion I think we should opt for not returning unmapped fields by default, and instead rely on a per-pattern flag include_unmapped that, when set to true, switches on the new behaviour of returning unmapped fields. This forces users to be more explicit about the behaviour they expect: users like Kibana can easily switch this on internally, while others such as SQL (see comment above), which are not interested in unmapped fields, would not need to filter anything out.

Example:

POST /test/_search
{
  "_source": false,
  "fields": [
    "*",
    {
      "field": "obj.*",
      "include_unmapped": true
    }
  ]
}

would return all mapped fields ("*") and all unmapped fields under "obj.*".
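
To make the per-pattern nature of the flag concrete, here is a small Python sketch of how a client or test helper might normalize such a 'fields' array. The function name `normalize_fields` is made up for illustration, and the sketch assumes entries are either plain strings or objects with a "field" key, as in the request above.

```python
def normalize_fields(fields):
    """Turn a 'fields' array into (pattern, include_unmapped) pairs.

    Plain string entries keep the default of include_unmapped = False;
    object entries may switch the flag on for their own pattern only.
    """
    normalized = []
    for entry in fields:
        if isinstance(entry, str):
            normalized.append((entry, False))
        else:
            include = bool(entry.get("include_unmapped", False))
            normalized.append((entry["field"], include))
    return normalized


request_fields = ["*", {"field": "obj.*", "include_unmapped": True}]
pairs = normalize_fields(request_fields)
```

Here `pairs` holds one entry per pattern, so the flag on "obj.*" does not affect the plain "*" pattern.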

Return format

Unmapped fields under an inner object that has "enabled": false set are returned in the same flattened manner as the same fields would be if they were mapped, e.g.

PUT /test
{
  "mappings": {
    "enabled" : false,
    "properties": {
      "obj": {
        "type": "object"
      }
    }
  }
}

PUT /test/_doc/1
{
  "obj": [
    {
      "age": 33,
      "name" : "James",
      "tags" : ["java", "c#" ]
    },
    {
      "age": 27,
      "name" : "Kim",
      "employer" : {
        "company" : "Elastic",
        "web" : "www.elastic.co"
      }
    }
  ]
}

POST /test/_search
{
  "_source": false,
  "fields": [
    {
      "field": "*",
      "include_unmapped": true
    }
  ]
}

would return the following, which is the same output that would be returned for "enabled": true, minus the dynamically added keyword subfields:

"fields" : {
          "obj.name" : [
            "James",
            "Kim"
          ],
          "obj.age" : [
            33,
            27
          ],
          "obj.tags" : [
            "java",
            "c#"
          ],
          "obj.employer.web" : [
            "www.elastic.co"
          ],
          "obj.employer.company" : [
            "Elastic"
          ]
        }
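
The flattening shown in this response can be reproduced with a short Python sketch. It is an illustration of the return format only, not the code in the WIP; `flatten_source` is a hypothetical helper that walks a _source document and groups leaf values by dotted path.

```python
def flatten_source(value, path="", out=None):
    """Group the leaf values of a _source document by dotted path."""
    out = {} if out is None else out
    if isinstance(value, dict):
        for key, child in value.items():
            flatten_source(child, f"{path}.{key}" if path else key, out)
    elif isinstance(value, list):
        # Arrays do not add a path segment; their items share the parent path.
        for item in value:
            flatten_source(item, path, out)
    else:
        out.setdefault(path, []).append(value)
    return out


doc = {
    "obj": [
        {"age": 33, "name": "James", "tags": ["java", "c#"]},
        {
            "age": 27,
            "name": "Kim",
            "employer": {"company": "Elastic", "web": "www.elastic.co"},
        },
    ]
}
flattened = flatten_source(doc)
```

For the document indexed above, `flattened` has the same five keys and value lists as the "fields" section of the response. Only leaf paths become keys, so the array structure of "obj" is not preserved.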

Also, as is currently the case when using fields for mapped fields, specifying a path that leads to an intermediate node (like "obj" or "obj.employer") returns an empty result. The behaviour when specifying "dynamic": "false" on the object level should be the same.

Some edge cases

Handling values dropped due to “ignore_malformed”

When mapped fields discard malformed values due to ignore_malformed: true (e.g. an integer field with the value "foo"), the value isn't returned with the mapped fields, so we also don't want to return it with the include_unmapped option, even if the malformed value is still part of _source.
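
A rough Python sketch of this edge case, under the simplifying assumption that every mapped field is an integer field with ignore_malformed enabled. Both function names are hypothetical, and `int()` only stands in for the real field parsing.

```python
def parse_integer_values(raw_values):
    """Parse values for a mapped integer field, dropping malformed ones."""
    kept = []
    for raw in raw_values:
        try:
            kept.append(int(raw))
        except (TypeError, ValueError):
            # ignore_malformed: the value is silently discarded.
            continue
    return kept


def fetch_with_unmapped(source_leaves, mapped_integer_fields):
    """Fetch all leaf fields with include_unmapped switched on."""
    hit = {}
    for path, values in source_leaves.items():
        if path in mapped_integer_fields:
            parsed = parse_integer_values(values)
            if parsed:
                hit[path] = parsed
            # A mapped field is never re-read from _source as "unmapped",
            # so a discarded malformed value does not reappear here.
        else:
            hit[path] = list(values)
    return hit
```

The key point is that the unmapped branch is chosen by whether the path is mapped, not by whether the mapped branch produced any values.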

Don’t return null values

The fields option currently doesn’t return mapped fields that have a null value explicitly set in _source. We would also not do this for unmapped fields for consistency and only change this behaviour if we also decide to return null values for mapped fields.
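
As a minimal sketch of that consistency rule, the same null filter could be applied to mapped and unmapped values alike. `drop_nulls` is a hypothetical helper operating on leaf values keyed by dotted path.

```python
def drop_nulls(leaves):
    """Remove null values; omit a field entirely if nothing is left."""
    result = {}
    for path, values in leaves.items():
        kept = [value for value in values if value is not None]
        if kept:
            result[path] = kept
    return result
```

A field whose only value in _source is an explicit null is therefore absent from the hit rather than returned with an empty list.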

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Nov 23, 2020
Currently, the 'fields' option only supports fetching mapped fields. Since
'fields' is meant to be the central place to retrieve document content, it
should allow for loading unmapped values. This change adds implementation and
tests for this feature.

Closes elastic#63690
cbuescher pushed a commit that referenced this issue Dec 1, 2020
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Dec 1, 2020
cbuescher pushed a commit that referenced this issue Dec 3, 2020