Rename processor does not like fieldnames with dots in them #37507

Closed
ycombinator opened this issue Jan 15, 2019 · 3 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP discuss

Comments

@ycombinator
Contributor

I'm not entirely sure whether this is a bug, an enhancement, or neither (working as designed), so I'm marking this as discuss for now.

Consider this pipeline simulation:

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "foo.bar": 17
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "rename": {
          "field": "foo.bar",
          "target_field": "baz"
        }
      }
    ]
  }
}

This returns the following error response:

{
  "docs" : [
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "exception",
            "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [foo.bar] doesn't exist",
            "header" : {
              "processor_type" : "rename"
            }
          }
        ],
        "type" : "exception",
        "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [foo.bar] doesn't exist",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "java.lang.IllegalArgumentException: field [foo.bar] doesn't exist",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "field [foo.bar] doesn't exist"
          }
        },
        "header" : {
          "processor_type" : "rename"
        }
      }
    }
  ]
}
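
One way to see why the lookup fails is to model how a dotted field reference might be resolved. The sketch below is a simplified illustration, not Elasticsearch's actual implementation: `get_field` is a hypothetical helper that assumes a reference such as "foo.bar" is split on dots and walked through nested objects, so a top-level key literally named "foo.bar" is never consulted.

```python
def get_field(source: dict, path: str):
    """Resolve `path` one dot-separated segment at a time through nested dicts.

    Hypothetical model of path-style lookup; the error text mirrors the
    message in the response above.
    """
    current = source
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            raise ValueError(f"field [{path}] doesn't exist")
        current = current[segment]
    return current


# Nested object notation resolves as expected.
nested_value = get_field({"foo": {"bar": 18}}, "foo.bar")

# A literal dotted key is not found, because "foo" is looked up first.
try:
    get_field({"foo.bar": 17}, "foo.bar")
    dotted_error = None
except ValueError as err:
    dotted_error = str(err)
```

Under this model the first segment, "foo", is missing from the document, which matches the "doesn't exist" error even though a key named "foo.bar" is present.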

However, if we first expand the foo.bar field using the dot_expander processor, it works as expected:

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "foo.bar": 17
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dot_expander": {
          "field": "foo.bar"
        }
      },
      {
        "rename": {
          "field": "foo.bar",
          "target_field": "baz"
        }
      }
    ]
  }
}

Successful response:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "baz" : 17,
          "foo" : { }
        },
        "_ingest" : {
          "timestamp" : "2019-01-15T22:42:20.90721Z"
        }
      }
    }
  ]
}
@ycombinator ycombinator added discuss :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Jan 15, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@jakelandis
Contributor

@ycombinator This is expected behavior in the example you provided, and it applies to any processor, not just rename. The dot_expander processor exists to help disambiguate (and handle) dot notation vs. nested object notation; the latter is the default.

For example:

      "_source": {
        "foo.bar": 17,
        "foo": {
          "bar": 18
        }
      }

In a processor, when you reference "foo.bar", do you mean the value 17 or 18? With the dot_expander processor you get (by default) an array of both, [17,18]; without the dot_expander processor, the reference defaults to reading only the nested object.
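
The merge described above can be sketched in a few lines. This is a rough model under stated assumptions, not the processor's real code: `expand_dots` is a hypothetical helper, and the order of elements in the merged array (existing nested value first, dotted value appended) is a choice made for this sketch.

```python
def expand_dots(source: dict, field: str) -> dict:
    """Move the literal dotted key `field` into nested dicts, merging on conflict.

    Hypothetical model of dot expansion; mutates and returns `source`.
    """
    if field not in source:
        return source
    value = source.pop(field)
    *parents, leaf = field.split(".")
    current = source
    for segment in parents:
        current = current.setdefault(segment, {})
        if not isinstance(current, dict):
            raise ValueError(f"cannot expand [{field}]: [{segment}] is not an object")
    if leaf in current:
        existing = current[leaf]
        if isinstance(existing, list):
            existing.append(value)  # already an array: append the dotted value
        else:
            current[leaf] = [existing, value]  # scalar: turn into an array of both
    else:
        current[leaf] = value
    return source


ambiguous = expand_dots({"foo.bar": 17, "foo": {"bar": 18}}, "foo.bar")
simple = expand_dots({"foo.bar": 17}, "foo.bar")
```

In the ambiguous document both values end up under `foo.bar` as an array, while the unambiguous document simply becomes a nested object that a later processor can reference by path.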

Happy to discuss this further, but in the context of the issue logged (rename processor) I will close this issue. Feel free to re-open or open a new issue with ideas on how to improve the experience here.

@ycombinator
Contributor Author

ycombinator commented Jan 16, 2019

Thanks for the explanation, @jakelandis. The current behavior makes sense, given the ambiguity problem.

One side-effect in terms of user experience is that we end up needing a lot of dot_expander processors, something that could be alleviated by #36950.
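
Until something reduces that overhead, one possible client-side workaround is to generate the dot_expander entries instead of writing them by hand. The helper below is only a sketch: `with_dot_expanders` is a hypothetical name, and it assumes the caller already knows which dotted fields the later processors reference.

```python
import json


def with_dot_expanders(processors, dotted_fields):
    """Return a pipeline body that expands each dotted field before `processors` run.

    Hypothetical helper; it only builds the request body and does not
    talk to Elasticsearch.
    """
    expanders = [{"dot_expander": {"field": name}} for name in dotted_fields]
    return {"processors": expanders + list(processors)}


rename = {"rename": {"field": "foo.bar", "target_field": "baz"}}
pipeline = with_dot_expanders([rename], ["foo.bar"])

# Body suitable for the "pipeline" key of a _simulate request.
print(json.dumps(pipeline, indent=2))
```

The resulting body has the same shape as the working simulation above: a dot_expander for "foo.bar" followed by the rename processor.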
