Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] dynamic_template mappings for wildcard field names are not installed #129344

Closed
joshdover opened this issue Apr 4, 2022 · 29 comments · Fixed by #137978
Closed

[Fleet] dynamic_template mappings for wildcard field names are not installed #129344

joshdover opened this issue Apr 4, 2022 · 29 comments · Fixed by #137978
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:EPM Fleet team's Elastic Package Manager (aka Integrations) project Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@joshdover
Copy link
Contributor

joshdover commented Apr 4, 2022

Integration packages have support for configuring wildcard field names in data streams. These fields are supposed to have an associated dynamic_template added to their mappings, however today, Fleet does not generate mappings for these. AFAICT, we have never had support for this in Fleet EPM code.

From @jsoriano:

When a field has a wildcard in the name, it must be installed in the template as a dynamic mapping. object_type would be the type that the dynamic mapping should produce (mapping.type). object_type_mapping_type is the type that the field in the document has to have to match the dynamic mapping (match_mapping_type), a wildcard there would mean any type.

Here's an example of how this works in Beats. A field defined like this:

    - name: prometheus.*.histogram
      type: object
      object_type: histogram
      object_type_mapping_type: "*"

Results in a dynamic template added to the mappings like this:

{
  ...
  "template": {
    "mappings": {
      ...
      "dynamic_templates": [
        ...
        {
          "prometheus.*.histogram": {
            "mapping": {
              "type": "histogram"
            },
            "match_mapping_type": "*",
            "path_match": "prometheus.*.histogram"
          }
        },
  ...
}

We should be able to follow the Beats implementation to make this match on Fleet: https://github.com/elastic/beats/blob/ea207346d651448b8917b0791b2b117b9f9b9212/libbeat/template/processor.go#L444

We have several integrations using this feature today, but it's likely broken for customers due to this problem. In my rudimentary search on the integrations repo, I found that this is affecting these integrations:

  • aws
  • azure_application_insights
  • azure_billing
  • azure_metrics
  • cockroachdb
  • containerd
  • kubernetes
  • prometheus
  • synthetics
@joshdover joshdover added bug Fixes for quality problems that affect the customer experience Feature:EPM Fleet team's Elastic Package Manager (aka Integrations) project Team:Fleet Team label for Observability Data Collection Fleet team labels Apr 4, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Feature:EPM)

@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@joshdover
Copy link
Contributor Author

We should consider backporting a fix for this to 7.17.x depending on impact

@jsoriano
Copy link
Member

jsoriano commented Apr 4, 2022

Actually, looking to beats mappings, the field doesn't need to have a wildcard. A field with object_type and without wildcards is assumed to have the wildcard at the end:

  - name: labels
    type: object
    object_type: keyword

Is installed as:

        {
          "labels": {
            "mapping": {
              "type": "keyword"
            },
            "match_mapping_type": "string",
            "path_match": "labels.*"
          }
        },

+1 to backport this to 7.17.

@jsoriano
Copy link
Member

jsoriano commented Apr 4, 2022

Another case that can produce dynamic mappings, the use of the dynamic_template flag can configure a normal mapping as dynamic instead: elastic/package-spec#171

@jsoriano
Copy link
Member

jsoriano commented Apr 4, 2022

Another case that can produce dynamic mappings, the use of the dynamic_template flag can configure a normal mapping as dynamic instead: elastic/package-spec#171

Ah no sorry, this feature was reverted (elastic/package-spec#212, #102289 (comment)).

@joshdover
Copy link
Contributor Author

@jsoriano What might be helpful is to have the Beats source code to look at. Do you know where this lives?

@joshdover joshdover changed the title [Fleet] Add support for adding dynamic_template mappings for wildcard field names [Fleet] dynamic_template mappings for wildcard field names are not installed Apr 4, 2022
@joshdover
Copy link
Contributor Author

Changed the wording to be more clear that this is really a bug

@joshdover
Copy link
Contributor Author

@elastic/integrations It would be helpful to understand the impact of this issue on the integrations listed above so that we can prioritize appropriately.

@jsoriano
Copy link
Member

jsoriano commented Apr 4, 2022

@jsoriano What might be helpful is to have the Beats source code to look at. Do you know where this lives?

Here it is: https://github.com/elastic/beats/blob/ea207346d651448b8917b0791b2b117b9f9b9212/libbeat/template/processor.go#L444

@jsoriano
Copy link
Member

jsoriano commented Apr 4, 2022

Regarding impact, we have to take into account that these fields will be used as if they had no mapping, what uses to be a source of problems.

I think we are not seeing more issues related to this because the most visible issue of this is mappings conflict, and looking to the default rules, this will be only problematic if:

  • A number parses as integer, but later this field stores floats (or histograms). This is probably the most likely visible issue, but would only affect prometheus at this point (and cockroachdb, that is heavily based on prometheus).
  • A string parses as integer or date, but later this field stores strings with other formats. This is quite unlikely, but hard to debug if happens.

Apart of the mapping conflicts, there may be more subtle consequences of not having proper mappings, at least:

  • Performance impact, as fields are not installed with the expected settings (for example default mappings for strings index the field both as text and as keyword multi-field, what can increase storage size).
  • Developer experience: for example a developer tries to query or visualize a field as a histogram, but it is being mapped as two float fields instead of a histogram.

@andrewkroh
Copy link
Member

I think we need to go through the process to update the package-spec schema for fields.yml files. This way we have a clear definition of the expected behavior of the options. This will help both the developers of integrations and implementors of the specification in Fleet.

(I've started updating the specification to match what the code actually does in Fleet EPM today by looking at the typescript code. andrewkroh/package-spec@91d6513)

@joshdover
Copy link
Contributor Author

(I've started updating the specification to match what the code actually does in Fleet EPM today by looking at the typescript code. andrewkroh/package-spec@91d6513)

Thank you, @andrewkroh. Agreed, we need to update the spec to match. Happy to have Fleet UI team help review once you have a PR.

@jsoriano
Copy link
Member

jsoriano commented Apr 5, 2022

(I've started updating the specification to match what the code actually does in Fleet EPM today by looking at the typescript code. andrewkroh/package-spec@91d6513)

Thanks @andrewkroh! Could you open a PR with this change so we can continue the discussion there?

In any case we need to support dynamic templates, and this brings the need of wildcards in names and supporting the object_type options.

@andrewkroh
Copy link
Member

Here's that change as a PR to package-spec elastic/package-spec#314. The goal there will be to sync up the spec to the current implementation. Then we can do a separate PR propose how dynamic mappings should work.

@ritalwar
Copy link
Contributor

ritalwar commented Jul 4, 2022

Hi,
I am working on moving Cockroachdb integration to GA and in process of that I am getting error “field cannot be changed from type [float] to [long], dropping event!”. I found related SDH https://github.com/elastic/sdh-beats/issues/1809 which is similar to the issue am getting,
Could you please confirm if this issue (dynamic template mapping) will resolve all such conversion errors and when is the fix expected?

@jsoriano
Copy link
Member

jsoriano commented Jul 4, 2022

Could you please confirm if this issue (dynamic template mapping) will resolve all such conversion errors and when is the fix expected?

Probably yes, CockroachDB package provides fields definitions (see here), but most of them are dynamic mappings, that are not being properly installed.

@andresrc
Copy link

andresrc commented Jul 5, 2022

@joshdover @jlind23 do we have a timeline for this issue?

@jsoriano
Copy link
Member

jsoriano commented Jul 5, 2022

how dynamic mappings should work.

Regarding this, I think they should work the same way as in Beats. There may be limitations, but there are also strong advantages:

  • It is a proven model we have been using for long in Beats.
  • We have been actually assuming that this is how dynamic mappings work with Fleet. It is supported by the package spec and elastic-package and many integrations already define dynamic mappings this way, as the mentioned case of CockroachDB.

@jen-huang
Copy link
Contributor

@andresrc This is a Fleet issue that hasn't been bubbled up in priority yet. Is this blocking some packages going GA? (like Cockroach DB?)

@andresrc
Copy link

andresrc commented Jul 6, 2022

@jen-huang yes, the team is working on CockroachDB in this iteration, thanks

@jen-huang
Copy link
Contributor

Bumped up to P1 per discussion with @akshay-saraswat as it is blocking CockroachDB integration from going GA.

@jsoriano
Copy link
Member

jsoriano commented Jul 28, 2022

Bumped up to P1 per discussion with @akshay-saraswat as it is blocking CockroachDB integration from going GA.

Thanks for prioritizing this!

Take into account that this is not only about CockroachDB. Grepping around integrations, many packages have at least one field with type: object (119), object_type (110) or a name with a wildcard (20). All of them should result on dynamic mappings, and they are not now, this can lead to unexpected mapping conflicts.
I guess we are not seeing more issues caused by this because we are being lucky on the use we do on these fields and default automatic mappings are working well for us (#129344 (comment)), but it may happen anytime that a customer faces issues because of this on corner cases or new integrations.

@crespocarlos
Copy link
Contributor

crespocarlos commented Jul 29, 2022

I've seen the same error in the Prometheus package and I've added a dynamic template to the data_stream manually to fix it. If I understood correctly, by fixing what's been discussed here, we wouldn't need to create the dynamic mapping at the package level, right?

@tetianakravchenko
Copy link
Contributor

related to the question above: would it be safe to add a dynamic_templates to the data_stream manually, go to GA (prometheus integration is planned to go GA in 8.5) and remove dynamic_templates later?
any timelines on the fix?

@nchaulet
Copy link
Member

nchaulet commented Aug 1, 2022

Adding the dynamic_templates to the datastreams will work for now, but it will create a little more work when we generate mapping and implement dynamic_templates for wildcard field names as we will have to merge two source of dynamic_templates (but I guess we will have to do so as some integrations are already using this)

any timelines on the fix

I am currently looking at this and it's seems not it's not a 1 for 1 change from the beats code to Fleet as we generate mappings a little differently, but I should be able to have a POC/draft PR early this. This change seems a little risky to me as it will impact a lot of package and we probably need to decide if it's a bug fix for 8.4 or an improvement in 8.5 cc @jen-huang

@jen-huang
Copy link
Contributor

@nchaulet Thanks for investigating. Let's continue working on the fix for this and then assess the risk from there. My feeling is that we may want to delay it to 8.5...

@nchaulet
Copy link
Member

nchaulet commented Aug 1, 2022

@jsoriano When you say we should create a dynamic template for field with a name with a wildcard it is a new feature? I am not able to the code path that does this in beats

@jsoriano
Copy link
Member

jsoriano commented Aug 2, 2022

@jsoriano When you say we should create a dynamic template for field with a name with a wildcard it is a new feature? I am not able to the code path that does this in beats

@nchaulet oh, you are right, I didn't notice. It only happens to be that almost all fields with a wildcard in their name also have type: object in beats. TLDR; Yes, this would be a new feature, but probably it is worth to do it.

I say almost because I have found this one in Metricbeat, that effectively doesn't generate a dynamic mapping:

      - name: pod.preemption.victims.bucket.*
        type: long
        description: Pod preemption victims

In integrations repo I see that there are more fileds with wildcards but without type: object, for example https://github.com/elastic/integrations/blob/main/packages/awsfargate/data_stream/task_stats/fields/fields.yml#L28-L32

        - name: core.*.pct
          type: scaled_float

Such field definitions both in beats and fleet generate static mappings with * as property names, that I don't think is what the developer expects.

I think we should do something about this, so developers have what they probably expect. I see these options:

  • Create dynamic mappings for fields with wildcards in their names. For example the previous core.*.pct field definition would be interpreted as if it had type: object and object_type: scaled_float, and would generate a dynamic mapping. This would be actually a new feature. It would support a more compact and intuitive syntax, and would support fields currently defined this way.
  • Disallow fields with wildcards in their names, unless they have type: object. This would be to ignore such fields in Fleet, and to add semantic validation in the package-spec to reject these fields. This would be disrupting for current integrations with such fields as validation would start failing for them and manual changes would be required.
  • Do nothing about this by now. Even when this doesn't produce what developers are probably, it hasn't seem to have caused issues so far.

From these options I would prefer the first one, so we would have a better syntax for many of these use cases, and it would support the integrations that are already using it.

Btw, while looking at this I have found that Beats generates both, the dynamic mapping and an static mapping with the wildcard, this is probably unexpected, I will open another issue in Beats about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience Feature:EPM Fleet team's Elastic Package Manager (aka Integrations) project Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants