Returning nested data in fields API #67432

Merged
merged 31 commits into from
Feb 5, 2021

cbuescher
Member

This is a POC on one way of returning nested data with the fields API (#63709).
This PR includes fields of the "nested" type in the output of the fields API response and recursively applies the fields API logic to each individual sub-document under that nested field path. This means we return a flattened list of leaf values for all fields outside of any nested object and group the leaf fields inside each nested document individually.

For an example document, where both "user" and "user.adress" are fields of the nested type:

{
  "id" : "123456",
  "user" : [
    {
      "first" : "John",
      "last" : "Brown",
      "adress" : {
        "city" : "Berlin",
        "zip" : "5555"
      }
    },
    {
      "first" : "Alice",
      "last" :  "White",
      "adress" : [{
        "city" : "Toronto",
        "zip" : "1111"
      },{
        "city" : "Ottawa",
        "zip" : "2222"
      }]
    }
  ]
}

For a "fields" : ["*"] query we would return

       "fields" : {
          "id" : [
            "123456"
          ],
          "user" : [
            {
              "last" : [
                "Brown"
              ],
              "adress" : [
                {
                  "zip" : [
                    "5555"
                  ],
                  "city" : [
                    "Berlin"
                  ]
                }
              ],
              "first" : [
                "John"
              ]
            },
            {
              "last" : [
                "White"
              ],
              "adress" : [
                {
                  "zip" : [
                    "1111"
                  ],
                  "city" : [
                    "Toronto"
                  ]
                },
                {
                  "zip" : [
                    "2222"
                  ],
                  "city" : [
                    "Ottawa"
                  ]
                }
              ],
              "first" : [
                "Alice"
              ]
            }
          ],
          "user.adress" : [
            {
              "zip" : [
                "5555"
              ],
              "city" : [
                "Berlin"
              ]
            },
            {
              "zip" : [
                "1111"
              ],
              "city" : [
                "Toronto"
              ]
            },
            {
              "zip" : [
                "2222"
              ],
              "city" : [
                "Ottawa"
              ]
            }
          ]
        }

An alternative would be to not return "user.adress" as an individual key in the above example but only inside "user". If wanted, we could also choose to return all the flattened leaf fields (including the ones inside nested documents) in the root of the fields response as a flattened list, like we do now.
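
To illustrate what the fully flattened alternative would look like, here is a minimal Python sketch. It is not the PR's Java implementation: `flatten_leaves` is a hypothetical helper that works directly on the example document and collects every leaf under its full dotted path.

```python
def flatten_leaves(value, prefix=""):
    """Hypothetical helper: collect each leaf under its full dotted path."""
    out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            for p, vals in flatten_leaves(child, path).items():
                out.setdefault(p, []).extend(vals)
    elif isinstance(value, list):
        # Arrays do not add a path segment; their items share the same prefix.
        for item in value:
            for p, vals in flatten_leaves(item, prefix).items():
                out.setdefault(p, []).extend(vals)
    else:
        out.setdefault(prefix, []).append(value)
    return out


doc = {
    "id": "123456",
    "user": [
        {"first": "John", "last": "Brown",
         "adress": {"city": "Berlin", "zip": "5555"}},
        {"first": "Alice", "last": "White",
         "adress": [{"city": "Toronto", "zip": "1111"},
                    {"city": "Ottawa", "zip": "2222"}]},
    ],
}

flat = flatten_leaves(doc)
print(flat["user.first"])        # ['John', 'Alice']
print(flat["user.adress.city"])  # ['Berlin', 'Toronto', 'Ottawa']
```

In this shape "user.adress.city" lists three cities with no indication of which user each one belongs to, which is the relation the grouped output keeps.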

This is just a draft with only basic testing. I want to extend the tests to cover unmapped fields inside nested documents, and I also want to fix a current limitation: the "ignore_field" content is not yet passed down inside the NestedFieldValueFetcher.

@cbuescher
Member Author

After discussing this internally there have been some changes to the output format that I'll try to summarize:

  • everything outside a nested field stays as-is, that is a flattened list of key-value pairs
  • once a pattern in the request matches a nested field or something below it (i.e. leaves inside nested fields), we group everything under each nested entry so the relation between the leaf values is maintained. Intermediate object levels are still flattened.
  • if we target multiple nested fields, i.e. "user" and "user.adress" are both mapped to type "nested", "user.adress" will appear inside the output of the "user" field
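
The three rules above can be sketched in a few lines of Python. This is only an illustration that works on the raw source document: `join` and `group_fields` are hypothetical names, the set of nested paths is passed in by hand rather than read from the mapping, and multi-fields such as "name.keyword" are not produced because they only exist in the mapping.

```python
def join(prefix, key):
    """Build a dotted path, omitting the dot for an empty prefix."""
    return f"{prefix}.{key}" if prefix else key


def group_fields(obj, nested_paths, full_path="", rel_path=""):
    """Hypothetical helper: flatten leaves, but group them per nested entry.

    full_path is the absolute path of obj (used to detect nested fields);
    rel_path is the key prefix relative to the closest nested parent.
    """
    out = {}
    for key, child in obj.items():
        full, rel = join(full_path, key), join(rel_path, key)
        items = child if isinstance(child, list) else [child]
        if full in nested_paths:
            # One group per nested entry; keys inside restart from "".
            out[rel] = [group_fields(item, nested_paths, full, "") for item in items]
            continue
        for item in items:
            if isinstance(item, dict):
                # Plain object level: keep flattening into dotted keys.
                for k, v in group_fields(item, nested_paths, full, rel).items():
                    out.setdefault(k, []).extend(v)
            else:
                out.setdefault(rel, []).append(item)
    return out


source = {
    "obj": {"outside_nested": "value_outside_nested"},
    "user": [
        {"name": "John Brown", "age": 27},
        {"name": "Alice White", "age": 28},
        {"name": "Rosi Red", "acount": {"id": 12345}},
    ],
}
grouped = group_fields(source, nested_paths={"user"})

# Two nested levels: "adress" ends up inside each "user" entry.
two_levels = group_fields(
    {"user": [{"first": "John", "adress": {"city": "Berlin"}}]},
    nested_paths={"user", "user.adress"},
)
```

Here `grouped["obj.outside_nested"]` stays a flat list, `grouped["user"]` holds one dictionary per nested entry, and the unmapped object "acount" inside the third entry is flattened to "acount.id".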

Here are some examples:

DELETE my_index

PUT my_index
{
  "mappings": {
    "properties": {
      "obj": {
        "properties": {
          "outside_nested": {
            "type": "keyword"
          }
        }
      },
      "user": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "obj" : {
    "outside_nested" : "value_outside_nested"
  },
  "user" : [
    {
      "name" : "John Brown",
      "age" : 27
    },
    {
      "name" : "Alice White",
      "age" : 28
    },
    {
      "name" : "Rosi Red",
      "acount" : {
        "id" : 12345
      }
    }
  ]
}

POST /my_index/_search
{
    "_source": false,
    "fields" : ["*"]
}

returns

       "fields" : {
          "obj.outside_nested" : [
            "value_outside_nested" <--------- fields outside nested fields are returned flattened as usual
          ],
          "user" : [ <-------- everything under user field is returned grouped by entries
            {
              "name" : [
                "John Brown"
              ],
              "name.keyword" : [  <-------- "fields" will also contain multi-fields not visible in "_source"
                "John Brown"
              ],
              "age" : [
                27
              ]
            },
            {
              "name" : [
                "Alice White"
              ],
              "name.keyword" : [
                "Alice White"
              ],
              "age" : [
                28
              ]
            },
            {
              "name" : [
                "Rosi Red"
              ],
              "name.keyword" : [
                "Rosi Red"
              ],
              "acount.id" : [ <--------- nested objects that are _not_ mapped as nested fields will still be flattened
                12345
              ]
            }
          ]
        }

POST /my_index/_search
{
    "_source": false,
    "fields" : ["user.age"]
}

returns

       "fields" : {
          "user" : [  <--------- retain the structure of the parent "user" nested field in the output but include only the "user.age" leaves
            {
              "age" : [
                27
              ]
            },
            {
              "age" : [
                28
              ]
            }
          ]
        }
      }

Requesting a nested field directly produces no output. This is consistent with how the current fields API handles a request for an object field:

POST /my_index/_search
{
    "_source": false,
    "fields" : ["user"]
}

----->
"hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0
      }
    ]

BUT using "user.*" returns and groups all leaf fields underneath "user":

POST /my_index/_search
{
    "_source": false,
    "fields" : ["user.*"]
}

------>
         "user" : [
            {
              "name" : [
                "John Brown"
              ],
              "name.keyword" : [
                "John Brown"
              ],
              "age" : [
                27
              ]
            },
            {
              "name" : [
                "Alice White"
              ],
              "name.keyword" : [
                "Alice White"
              ],
              "age" : [
                28
              ]
            },
            {
              "name" : [
                "Rosi Red"
              ],
              "name.keyword" : [
                "Rosi Red"
              ],
              "acount.id" : [
                12345
              ]
            }
          ]
        }
      }
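
The difference between "user.age", "user" and "user.*" can be reproduced with a small Python sketch. It assumes that patterns are matched against the full dotted path of each leaf and uses `fnmatch.fnmatchcase` as a stand-in for Elasticsearch's own wildcard matching; `filter_nested` is a hypothetical helper, not code from this PR.

```python
from fnmatch import fnmatchcase


def filter_nested(entries, nested_path, patterns):
    """Keep leaves whose full path matches a pattern; drop empty entries."""
    kept = []
    for entry in entries:
        selected = {
            leaf: values
            for leaf, values in entry.items()
            if any(fnmatchcase(f"{nested_path}.{leaf}", p) for p in patterns)
        }
        if selected:
            kept.append(selected)
    return kept


users = [
    {"name": ["John Brown"], "name.keyword": ["John Brown"], "age": [27]},
    {"name": ["Alice White"], "name.keyword": ["Alice White"], "age": [28]},
    {"name": ["Rosi Red"], "name.keyword": ["Rosi Red"], "acount.id": [12345]},
]

only_age = filter_nested(users, "user", ["user.age"])
bare = filter_nested(users, "user", ["user"])
everything = filter_nested(users, "user", ["user.*"])

print(only_age)  # [{'age': [27]}, {'age': [28]}]
print(bare)      # []
```

The bare "user" pattern matches no leaf path, so nothing is returned, while "user.*" matches every leaf. The third entry has no "age" value and is therefore absent from the "user.age" result, as in the response shown earlier.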

@cbuescher
Member Author

I have added some more examples to this gist but left them out here for brevity.

@cbuescher cbuescher marked this pull request as ready for review January 21, 2021 17:50
@cbuescher cbuescher changed the title WIP: Returning nested data in fields API Returning nested data in fields API Jan 21, 2021
@cbuescher cbuescher added :Search Foundations/Mapping Index mappings, including merging and defining field types >enhancement v7.12.0 v8.0.0 labels Jan 21, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jan 21, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@cbuescher cbuescher removed the WIP label Jan 21, 2021
@cbuescher
Member Author

While there are still some non-final decisions on this PR and I'd expect the implementation code to get cleaned up quite a bit, I removed the WIP status since I think the output is what we are aiming for.

@cbuescher cbuescher requested a review from jtibshirani January 25, 2021 19:56
Contributor

@jtibshirani jtibshirani left a comment

This approach makes sense to me overall. I still need to go over a few details but left some initial comments.

Collections.sort(in);
String lastAddedEntry = in.get(0);
shortestPrefixes.add(lastAddedEntry);
for (String entry : in) {
Contributor

Will this work correctly when nested field names are naturally prefixes of each other? Like nested and nested2? I wonder if we need to consult the mapping information directly to get the parent nested mappings. Another edge case that comes to mind is if field names contain special characters that could sort before periods (.).

Member Author

True. I changed an existing test so that it exposes this shortcoming, and changed the lookup so that the nested parents are now determined from all available nested mappers.
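
Both edge cases raised in this thread can be reproduced with a short Python sketch. The Java snippet under review is only partially visible, so the comparison it performs is an assumption here, and all three function names are hypothetical. The last variant is a standalone alternative for illustration; the PR itself resolves nested parents through the nested mappers.

```python
def roots_plain_prefix(paths):
    """Sort, then skip entries that start with the last kept entry."""
    kept = []
    for path in sorted(paths):
        if kept and path.startswith(kept[-1]):
            continue
        kept.append(path)
    return kept


def roots_dotted_last(paths):
    """Same, but require a '.' after the last kept entry."""
    kept = []
    for path in sorted(paths):
        if kept and path.startswith(kept[-1] + "."):
            continue
        kept.append(path)
    return kept


def roots_any_ancestor(paths):
    """Keep a path only if no other path is a dotted ancestor of it."""
    unique = set(paths)
    return sorted(
        p for p in unique
        if not any(p.startswith(other + ".") for other in unique)
    )


# "nested2" is not a child of "nested", but a plain prefix check drops it.
print(roots_plain_prefix(["nested", "nested2"]))  # ['nested']

# '-' sorts before '.', so "a-b" lands between "a" and its child "a.c".
print(roots_dotted_last(["a", "a-b", "a.c"]))     # ['a', 'a-b', 'a.c']
print(roots_any_ancestor(["a", "a-b", "a.c"]))    # ['a', 'a-b']
```

Comparing only against the most recently kept entry relies on a parent and its children being adjacent after sorting, which does not hold once a sibling name contains a character that sorts before the period.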

@cbuescher
Member Author

@jtibshirani thanks for the feedback, I hope the last change fixes the remaining problem

Contributor

@jtibshirani jtibshirani left a comment

Approved! Reposting our follow-ups here so it's easy to track:

  • Handling mixed object and dot notation
  • Making sure DocValueFetcher in nested documents doesn't error
  • Adding a note in the docs

@cbuescher
Member Author

@elasticmachine run elasticsearch-ci/packaging-sample-unix

@cbuescher
Member Author

just fyi:

  • Adding a note in the docs

I opened #68657

  • Handling mixed object and dot notation

I opened a PR at #68540

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Feb 10, 2021
Add a note about the changes in the `fields` API around the response format for
nested fields.

Relates to elastic#67432
@cbuescher
Member Author

@jtibshirani I also added a note to the migration docs in #68813 as we discussed, maybe you can take a quick look

cbuescher pushed a commit that referenced this pull request Feb 11, 2021
Add a note about the changes in the `fields` API around the response format for
nested fields.

Relates to #67432
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Feb 11, 2021
This change adds tests around the handling of mixed object and dot notation in
document source when using the `fields` API with nested fields left out
of elastic#67432. After merging elastic#68540, this test can now be added.

Relates to elastic#67432
cbuescher pushed a commit that referenced this pull request Feb 11, 2021
This change adds tests around the handling of mixed object and dot notation in
document source when using the `fields` API with nested fields left out
of #67432. After merging #68540, this test can now be added.

Relates to #67432
Labels
>breaking >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.12.0 v8.0.0-alpha1
6 participants