
Unable to initialize Fleet - incorrect mappings applied to internal transform indices - WITH FIX #97367

Closed
deepybee opened this issue Apr 16, 2021 · 9 comments
Labels: bug (Fixes for quality problems that affect the customer experience), Team:Defend Workflows (“EDR Workflows” sub-team of Security Solution), Team:Fleet (Team label for Observability Data Collection Fleet team)


@deepybee

Kibana version: 7.12.0

Elasticsearch version: 7.12.0

Server OS version: Ubuntu 18.04.5 LTS

Browser version: Google Chrome Version 89.0.4389.114 (Official Build) (x86_64)

Browser OS version: macOS Catalina 10.15.7

Original install method (e.g. download page, yum, from source, etc.): apt

Describe the bug: After upgrading from an earlier version (presumed 7.11.x, but I am not sure when I last accessed Fleet on this cluster), Fleet fails with the following error. I also hit #90984, and it looks like the same bug regarding the ordering of upgrades may well have caused this too.

This symptom is caused by a background transform index being created with default mappings. When Fleet starts, it attempts to execute the transform, but Transforms cannot load the transform document: it expects id to be mapped as keyword, whereas the default dynamic mapping indexes it as text with a .keyword subfield, which Transforms neither expects to be present nor attempts to parse.

NB: I have already applied the workaround from that issue, which is probably a prerequisite for a working Fleet once the workaround here is actioned.
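As an illustration of the mechanism (not taken from this cluster): if a document reaches an index before its system template or explicit mapping is in place, Elasticsearch's dynamic mapping stores string fields as text with a .keyword subfield. The index name and document below are hypothetical:

```
PUT dynamic-mapping-demo/_doc/1
{
  "id": "some-transform-id"
}

GET dynamic-mapping-demo/_mapping
```

The resulting mapping for id will show "type": "text" with a "keyword" subfield, which is exactly the shape Transforms cannot consume for its internal documents.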

Steps to reproduce:

  1. Upgrade Elasticsearch and Kibana to 7.12.0 from an earlier version
  2. Log into Kibana
  3. Open Fleet

Expected behavior:

Fleet opens and no errors are thrown

Screenshots (if relevant):

[screenshot: Fleet setup failing with a 400 Bad Request error]

Errors in browser console (if relevant): 400 Bad Request error with JSON payload as per above screenshot

Provide logs and/or server output (if relevant):

Any additional context:

Buried in that response is the offending index, which is - at least on this cluster - .transform-internal-005. This index has picked up a default mapping somewhere, and it breaks Transforms completely as well as Fleet:

GET _cat/transforms?v
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [id] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : ".transform-internal-005",
        "node" : "aM8jyf3eQqKJ9q1qHxkGuA",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [id] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [id] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [id] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    }
  },
  "status" : 400
}

Checking the mapping on the reported index confirms it is using a default mapping:

GET .transform-internal-005/_mapping
#! this request accesses system indices: [.transform-internal-005], but in a future major version, direct access to system indices will be prevented by default
{
  ".transform-internal-005" : {
    "mappings" : {
      "properties" : {
        "create_time" : {
          "type" : "long"
        },
        "description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "dest" : {
          "properties" : {
            "index" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },

...
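The mismatch can be sketched programmatically. The helper below walks a mapping of the shape returned by GET &lt;index&gt;/_mapping and flags fields stored as text that a consumer expects to be keyword; the expected-field list here is illustrative, not the full Transforms schema.

```python
# Sketch: detect fields that dynamic mapping stored as "text" when a
# consumer (here, Transforms) expects "keyword". The mapping dict below
# mirrors the "properties" shape returned by GET <index>/_mapping; the
# "id" entry is added for illustration (it is elided in the excerpt above).

def find_text_mapped(mapping: dict, expected_keyword: set) -> list:
    """Return expected-keyword field paths that are actually mapped as text."""
    bad = []

    def walk(props: dict, prefix: str = "") -> None:
        for name, spec in props.items():
            path = prefix + name
            if spec.get("type") == "text" and path in expected_keyword:
                bad.append(path)
            if "properties" in spec:
                walk(spec["properties"], path + ".")

    walk(mapping)
    return bad

default_mapping = {
    "id": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "description": {"type": "text"},
    "dest": {"properties": {"index": {"type": "text"}}},
}

print(find_text_mapped(default_mapping, {"id", "dest.index"}))
# → ['id', 'dest.index']
```

Run against the broken index's mapping, a check like this would immediately surface id as the field tripping up Transforms.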

Fix

NB: This workaround requires at least one node in the cluster to have the transform role.

  1. Use a reindex to back up the erroneously mapped index's data:
POST _reindex
{
  "source": {
    "index": ".transform-internal-005"
  },
  "dest": {
    "index": "broken-transform-data-backup"
  }
}
  2. Delete the broken index:
DELETE .transform-internal-005
  3. The Transforms page should now load correctly and show the updated transform version (0.18.0 on Stack version 7.12):
    [screenshot]

  4. Reload the Fleet page. After a few moments the background transform will have finished and Fleet will load as expected:

[screenshot]

  5. If all went well, delete the backup:
DELETE broken-transform-data-backup
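One optional safety check before step 2 deletes the source index: confirm the backup holds the same number of documents as the original (index names as in the steps above):

```
GET _cat/count/.transform-internal-005?v
GET _cat/count/broken-transform-data-backup?v
```

If the counts differ, the reindex may still be running or may have failed partway; check the reindex response (or the _tasks API) before deleting anything.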
@deepybee deepybee added the bug Fixes for quality problems that affect the customer experience label Apr 16, 2021
@EricDavisX (Contributor)

@kevinlog and @nchaulet, it looks like some upgrade problems can still be hit with respect to Transforms.

@kevinlog (Contributor)

@EricDavisX this is another setup that requires at least one node to be set to handle transforms. We've got a docs ticket (7.13) to document this requirement more clearly: elastic/security-docs#608

In addition, PR #95649 will add finer-grained errors to Fleet setup.

@kevinlog kevinlog added the Team:Defend Workflows “EDR Workflows” sub-team of Security Solution label Apr 16, 2021
@elasticmachine (Contributor)

Pinging @elastic/security-onboarding-and-lifecycle-mgt (Team:Onboarding and Lifecycle Mgt)

@kevinlog kevinlog added needs-team Issues missing a team label Team:Fleet Team label for Observability Data Collection Fleet team labels Apr 16, 2021
@elasticmachine (Contributor)

Pinging @elastic/fleet (Team:Fleet)

@kevinlog kevinlog self-assigned this Apr 16, 2021
@EricDavisX (Contributor)

I thought @deepybee was saying that the fix of having a node set up didn't work. I may have misunderstood - and am glad to be, if that is the case.

@deepybee (Author)

@EricDavisX The fix/workaround definitely does work, but it depends on there being a node in the cluster that has the transform role.

In practice, this should present a fairly low-risk footprint, as any small uniform cluster will have the role implicitly enabled on all nodes; the note only applies to environments where node roles are explicitly set and none of the nodes have transform in node.roles.
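For reference, on clusters where node.roles is set explicitly, the workaround needs at least one node whose elasticsearch.yml includes the transform role. A minimal example (the rest of the role list is illustrative):

```yaml
# elasticsearch.yml on at least one node in the cluster
node.roles: [ data, remote_cluster_client, transform ]
```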

@deepybee (Author)

In fact, now that I think about it, the footprint would be very small: I guess it would require a cluster where the transform role was removed after having been present, since it must have been present for an earlier version of Fleet to have worked in the first place (which is the edge case I hit on this test cluster).

Aside from migrating from uniform to dedicated node types, I can't think of any scenario likely to affect a live cluster at the moment.

@EricDavisX (Contributor)

That's great news - thanks for the extra notes and thought @deepybee . Cheers

@jlind23 (Contributor) commented Apr 4, 2023

Closing as not relevant anymore. Feel free to reopen if needed

@jlind23 closed this as not planned (won't fix, can't repro, duplicate, stale), Apr 4, 2023