Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup ignore_malformed in fleet #157184

Merged
merged 28 commits into from
May 18, 2023

Conversation

gsantoro
Copy link
Contributor

@gsantoro gsantoro commented May 9, 2023

Summary

Add default setting ignore_malformed: true to all datastream of type logs. Since the field @timestamp needs to have ignore_malformed: false in the mappings, I'm setting this setting automatically even if not defined in the integration.

I had to fix a bug that prevented index.mappings to be copied from the default settings into the template settings

Checklist

Delete any items that are not applicable to this PR.

For maintainers

…etting for field @timestamp and setting ignore_malformed:false. fix bug on merging index.mapping
@gsantoro gsantoro requested a review from a team as a code owner May 9, 2023 15:27
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label May 9, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@apmmachine
Copy link
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@gsantoro
Copy link
Contributor Author

gsantoro commented May 9, 2023

I did some testing for this changes:

  1. Started Kibana from source and manually checked the component template for nginx integration contains the expected changes.
  2. I run the system tests (and all the other tests as well) for the nginx package. They all passed without any errors.

Here attached the index template generated by the system test both for logs and metrics

This index template for logs have the expected changes:

  • ignore_malformed: true under settings.index.mapping
  • the following mapping
"@timestamp": {
  "type": "date",
  "ignore_malformed": false
},

nginx-logs.txt

while this other index template for metrics has none of those changes

nginx-metrics.txt

@gsantoro
Copy link
Contributor Author

A quick note about something that I just noticed.

index template for metrics has ignore_malformed: false set for @timestamp field since we don't add this setting only for logs template but for all.

As discussed with @felixbarny this is ok for now since eventually this will be set by Elasticsearch in the future.

Copy link
Member

@kpollich kpollich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case I added in the other PR should cover this. Once that lands, let's merge upstream here and verify everything's working before we merge.

Code LGTM.

@ruflin
Copy link
Contributor

ruflin commented May 11, 2023

Where is this other PR with the test changes?

@gsantoro
Copy link
Contributor Author

@ruflin This is the linked PR with the fix for the bug and the related test #157289

@gsantoro gsantoro self-assigned this May 11, 2023
@gsantoro gsantoro added release_note:feature Makes this part of the condensed release notes v8.9.0 backport:skip This commit does not require backporting labels May 11, 2023
@gsantoro
Copy link
Contributor Author

hello @kpollich,
I have just fixed the unit tests. it required some code changes to account for ignore_malformed: true/false.
About the failing functional tests I have no idea why they are failing. It's not clear why I get a 500 instead of a 200. I could use some help.

@kpollich
Copy link
Member

Seems like the install_overrides tests are just a simple assertion issue: Link (see "logs" link in comment above)

I'll fire up the FTR suite for this PR and debug to see what's going on with the other failures.

@kpollich
Copy link
Member

The 400's in the package policy tests look like this

{
  "statusCode": 400,
  "error": "Bad Request",
  "message": "illegal_argument_exception\n\tCaused by:\n\t\tillegal_argument_exception: invalid composite mappings for [logs-logs]\n\tRoot causes:\n\t\tillegal_argument_exception: composable template [logs-logs] template after composition with component templates [logs-logs@package, logs-logs@custom, .fleet_globals-1, .fleet_agent_id_verification-1] is invalid"
}

Seems like the component template we're generating after this change is failing Elasticsearch's validation.

@gsantoro
Copy link
Contributor Author

Seems like the component template we're generating after this change is failing Elasticsearch's validation.

Is it possible that this PR depends from a future version of ES. I have tested it manually on Kibana from source and Elasticsearch 8.9.0-SNAPSHOT.

@kpollich
Copy link
Member

Testing this manually by installing the custom logs package + creating a policy seems to work as expected, so I'm working on debugging why this fails in FTR.

@kpollich
Copy link
Member

I was able to reproduce the issue present in tests

PUT _component_template/logs1-generic@package
{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "logs"
        },
        "codec": "best_compression",
        "mapping": {
          "ignore_malformed": true,
          "total_fields": {
            "limit": "10000"
          }
        },
        "default_pipeline": "logs-generic-2.0.0"
      }
    },
    "mappings": {
      "properties": {
        "input": {
          "properties": {
            "name": {
              "type": "constant_keyword",
              "value": "logs"
            }
          }
        }
      }
    }
  },
  "_meta": {
    "managed_by": "fleet",
    "managed": true,
    "package": {
      "name": "integration_to_input"
    }
  }
}


PUT _component_template/logs1-generic@custom
{
  "template": {
    "settings": {}
  },
  "_meta": {
    "managed_by": "fleet",
    "managed": true,
    "package": {
      "name": "integration_to_input"
    }
  }
}

PUT _index_template/logs1-generic
{
  "priority": 200,
  "index_patterns": [
    "logs-generic-*"
  ],
  "template": {
    "settings": {
      "index": {}
    },
    "mappings": {
      "_meta": {
        "managed_by": "fleet",
        "managed": true,
        "package": {
          "name": "integration_to_input"
        }
      }
    }
  },
  "data_stream": {},
  "composed_of": [
    "logs1-generic@package",
    "logs1-generic@custom",
    ".fleet_globals-1",
    ".fleet_agent_id_verification-1"
  ],
  "_meta": {
    "managed_by": "fleet",
    "managed": true,
    "package": {
      "name": "integration_to_input"
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "composable template [logs1-generic] template after composition with component templates [logs1-generic@package, logs1-generic@custom, .fleet_globals-1, .fleet_agent_id_verification-1] is invalid"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "composable template [logs1-generic] template after composition with component templates [logs1-generic@package, logs1-generic@custom, .fleet_globals-1, .fleet_agent_id_verification-1] is invalid",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "invalid composite mappings for [logs1-generic]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "data stream timestamp field [@timestamp] has disallowed [ignore_malformed] attribute specified"
      }
    }
  },
  "status": 400
}

Seems like this might be broken in the manual tests as well but maybe not as obviously.

@kpollich
Copy link
Member

@gsantoro - Does the error above look like something that's fixed in 8.9.0 SNAPSHOT for ES?

@kpollich
Copy link
Member

The FTR tests are in fact running on es-8.9.0 so maybe there's another issue here.

@gsantoro
Copy link
Contributor Author

Hello @kpollich,
thanks for investigating this. I expect that validation error has been fixed somewhere before 8.9.0-snapshot otherwise I would have seen it in my manual tests with integrations.

from a quick search I can find this PR from a while ago.

maybe @felixbarny knows more about this

@kpollich
Copy link
Member

kpollich commented May 11, 2023

Here's all the individual component templates at play here

[
  {
    "name": "logs1-generic@package",
    "component_template": {
      "template": {
        "settings": {
          "index": {
            "lifecycle": {
              "name": "logs"
            },
            "codec": "best_compression",
            "default_pipeline": "logs-generic-2.0.0",
            "mapping": {
              "total_fields": {
                "limit": "10000"
              },
              "ignore_malformed": "true"
            }
          }
        },
        "mappings": {
          "properties": {
            "input": {
              "properties": {
                "name": {
                  "type": "constant_keyword",
                  "value": "logs"
                }
              }
            }
          }
        }
      },
      "_meta": {
        "package": {
          "name": "integration_to_input"
        },
        "managed_by": "fleet",
        "managed": true
      }
    }
  },
  {
    "name": "logs1-generic@custom",
    "component_template": {
      "template": {
        "settings": {}
      },
      "_meta": {
        "package": {
          "name": "integration_to_input"
        },
        "managed_by": "fleet",
        "managed": true
      }
    }
  },
  {
    "name": ".fleet_globals-1",
    "component_template": {
      "template": {
        "settings": {},
        "mappings": {
          "_meta": {
            "managed_by": "fleet",
            "managed": true
          },
          "dynamic_templates": [
            {
              "strings_as_keyword": {
                "mapping": {
                  "ignore_above": 1024,
                  "type": "keyword"
                },
                "match_mapping_type": "string"
              }
            }
          ],
          "date_detection": false
        }
      },
      "_meta": {
        "managed_by": "fleet",
        "managed": true
      }
    }
  },
  {
    "name": ".fleet_agent_id_verification-1",
    "component_template": {
      "template": {
        "settings": {
          "index": {
            "final_pipeline": ".fleet_final_pipeline-1"
          }
        },
        "mappings": {
          "properties": {
            "event": {
              "properties": {
                "agent_id_status": {
                  "ignore_above": 1024,
                  "type": "keyword"
                },
                "ingested": {
                  "format": "strict_date_time_no_millis||strict_date_optional_time||epoch_millis",
                  "type": "date"
                }
              }
            }
          }
        }
      },
      "_meta": {
        "managed_by": "fleet",
        "managed": true
      }
    }
  },
]

Seems like the issue might be coming from having ignore_malformed set via the template default declared in logs1-generic@package rather than on an explicit @timestamp mapping? That's my guess right now.

EDIT: Removing the ignore_malformed setting from the @package component template does indeed result in the error going away.

@gsantoro
Copy link
Contributor Author

hey @kpollich about the docs for running the functional tests

  1. the docs at here at option 2 don't work for me. Option 3 suggest to use --config <config> but that looks like it is an optional field from the docs. When running my tests, that looks like a required config instead.
  2. from the previous points none of those options actually run correctly until you set the env variable FLEET_PACKAGE_REGISTRY_PORT=12345. This env variable is suggested at here and it is the only thing that makes those tests run successfully.
  3. I was successfully able to run a single test with --grep=<name of the test>

Thanks a lot. I'll definitively use this instructions in the future

@felixbarny
Copy link
Member

Until we add package spec support for this, it's not possible to set ignore_malformed via a package manifest. There's not currently an issue or plan to do this, but we could pursue it if needed.

@gsantoro didn't you add that feature? I don't think there's a rush to do that as no logging integrations seems to be using synthetic source by default at the moment. But it seems it would be useful to have down the line.

@kpollich
Copy link
Member

Ah we did add this! I must've missed this PR last month - sorry about that elastic/package-spec#498

So yeah I don't really have any concerns here then. Enabling synthetic source on a "default" log data stream will error, but it's better to have it error here then to be silently or unexpectedly broken. If a use case for synthetic source on a logging data stream emerges, a package maintainer can set ignored_malformed: true to that data stream manifest and all will work.

@gsantoro
Copy link
Contributor Author

can someone from @elastic/security-solution please take a look at this PR?

@kpollich
Copy link
Member

Having worked with this code and PR a bit more, I'm not sure we can accept this into Fleet as it stands. The key issues here is that ignore_malformed: false is - by nature - incompatible with synthetic _source and by extensions TSDB data streams. If we make ignore_malformed: true our global default for all fields except @timestamp, Fleet's functionality around synthetic _source and TSDB break.

I believe the ideal way for this to work would be: when TSDB or synthetic _source are enabled, the top level ignore_malformed: true setting on the @package in flipped to ignore_malformed: false.

@gsantoro - Do you think you'd be comfortable making that change in this PR with some guidance from the Fleet team? I don't think we should add the default ignore_malformed value to the defaultSettings in Fleet until this caveat is addressed.

@felixbarny
Copy link
Member

@kpollich this PR only sets ignore_malformed: true for logs data streams. TSDB is only relevant for metrics data streams. If an integration wants to use synthetic source in a logs data stream, that integration can explicitly set ignore_malformed: false.

@kpollich
Copy link
Member

If an integration wants to use synthetic source in a logs data stream, that integration can explicitly set ignore_malformed: false.

While this is true, this does require some non-obvious knowledge on the integration maintainer's part. I'd prefer not to burden integration maintainers in this way and instead rely on Fleet to always set the right combination of settings to make synthetic source or TSDB work. Eventually, it'd be nice if Elasticsearch did this for us, but I think pushing complexity away from package manifests and closer to Elasticsearch should be our overall intent here.

However, I realize this PR has been a bit long running and we might not want to accept this set back here. I'm happy to fix the failing tests here in this PR and move forward with the assumption that logs-* data streams aren't supported by TSDB/synthetic source, and that we have an escape hatch available should that change. It'd be good to capture this escape hatch in the integration docs somewhere.

@kpollich
Copy link
Member

Hmm it seems like elastic/package-spec#498 only introduces the ability to provide ignore_malformed at the field level. I suppose the escape hatch in this case would be to add ignore_malformed: false to every field defined by the integration?

@kpollich
Copy link
Member

Failing test should be fixed in 70f0ef3 - updated the TSDB rollover tests to use a metrics data stream to unblock. Same approach we took to the failing synthetic source tests earlier.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
securitySolution 400 404 +4
total +6

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
securitySolution 480 484 +4
total +6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @gsantoro

@felixbarny
Copy link
Member

It’s not that synthetic source is fundamentally incompatible with ignore_malformed. Currently, some field types are not supported in that combination but there’s a plan to add support. It seems like a bit too much magic to automatically toggle ignore_malformed based on whether synthetic source is set and to stop doing that once ES supports all field types with ignore_malformed and synthetic source.

Hmm it seems like elastic/package-spec#498 only introduces the ability to provide ignore_malformed at the field level. I suppose the escape hatch in this case would be to add ignore_malformed: false to every field defined by the integration?

Good point. I think we should add support for setting that on a per dataset level.

@kpollich
Copy link
Member

You're right that we would have to remove that logic from Fleet once the ES support is further along. Thanks for considering the future here, I didn't have the full picture of ES's long term plan.

With that in mind I think our current approach is the right one. This PR is good to merge from my end. Thanks everyone!

@gsantoro
Copy link
Contributor Author

can someone from @elastic/security-solution please take a look at this PR?

@gsantoro gsantoro merged commit c396cc6 into elastic:main May 18, 2023
@gsantoro gsantoro deleted the feature/fleet_ignore_malformed branch May 18, 2023 12:23
@andrewkroh
Copy link
Member

andrewkroh commented May 23, 2023

IIUC this introduces silent failures into integration packages. Developers need to be able to catch these. So I have opened elastic/elastic-package#1276 to request a way to detect this.

I think package-spec needs to be clarified to explain that ignore_malformed is the default for fields in logs data streams (except for @timestamp). https://github.com/elastic/package-spec/blob/20977bd5544c31dc6981685f2df6283e231b0692/spec/integration/data_stream/fields/fields.spec.yml#L291-L299

We should probably broadcast this change to fleet-contributors as well if that hasn't been done (may have missed it).

edit: Having telemetry about the number of unique _ignored fields in a data stream would be useful so we can be proactive about investigating.

@felixbarny
Copy link
Member

I have opened elastic/elastic-package#1276 to request a way to detect this.

++

I think package-spec needs to be clarified to explain that ignore_malformed is the default for fields in logs data streams (except for @timestamp). https://github.com/elastic/package-spec/blob/20977bd5544c31dc6981685f2df6283e231b0692/spec/integration/data_stream/fields/fields.spec.yml#L291-L299

I think we should add a ignore_malformed option on the dataset level so that logs data streams can opt-out of ignore_malformed and non-logs data streams can opt-in.

We should probably broadcast this change to fleet-contributors as well if that hasn't been done (may have missed it).

Do you have a suggestion on how to do that broadcast? Is there an existing mail group that we can use?

Having telemetry about the number of unique _ignored fields in a data stream would be useful so we can be proactive about investigating.

++
@jbaiera is working on adding metrics to data streams for rejected and degraded (docs with a _ignored field) documents

delanni pushed a commit to delanni/kibana that referenced this pull request May 25, 2023
## Summary

Add default setting `ignore_malformed: true` to all datastream of type
`logs`. Since the field `@timestamp` needs to have `ignore_malformed:
false` in the mappings, I'm setting this setting automatically even if
not defined in the integration.

I had to fix a bug that prevented index.mappings to be copied from the
default settings into the template settings

### Checklist

Delete any items that are not applicable to this PR.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This renders correctly on smaller devices using a responsive
layout. (You can test this [in your
browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser
compatibility](https://www.elastic.co/support/matrix#matrix_browsers)


### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Kyle Pollich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:feature Makes this part of the condensed release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.