
[Prometheus] [remote_write] Add dimension and metric_type metadata #7565

Merged

Conversation

@tetianakravchenko (Contributor) commented Aug 28, 2023

What does this PR do?

  • add an ingest pipeline to calculate the labels_fingerprint field, which includes the list of metrics, to solve this issue
  • add dimensions
  • add metric_type - the same as for other data streams

Follow up:

  • for consistency, I am going to align labels_fingerprint in the collector and query data streams
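For context, a minimal sketch of what computing such a field with Elasticsearch's `fingerprint` ingest processor can look like. The processor and its parameters are real, but the field list and target name here are illustrative - the actual pipeline in this PR may differ:

```yaml
# Hypothetical ingest pipeline excerpt - field names are illustrative.
processors:
  - fingerprint:
      fields:
        - prometheus.labels
      target_field: prometheus.labels_fingerprint
      method: SHA-1
      ignore_missing: true
```

Hashing the full label set yields a short, bounded keyword value that is safe to use as a time series dimension regardless of how long the underlying labels are.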

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@elasticmachine commented Aug 28, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-08-29T09:46:54.587+0000

  • Duration: 14 min 12 sec

Test stats 🧪

  • Failed: 0
  • Passed: 9
  • Skipped: 0
  • Total: 9

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@tetianakravchenko tetianakravchenko marked this pull request as ready for review August 28, 2023 14:48
@tetianakravchenko tetianakravchenko requested a review from a team as a code owner August 28, 2023 14:48
@@ -8,6 +8,7 @@
- name: account.id
level: extended
type: keyword
dimension: true
Contributor:

I don't think a field can be a dimension if ignore_above is also present. Did the testing work?

@tetianakravchenko (Author) commented Aug 28, 2023:

By testing, do you mean the TSDB-migration-test-kit?

You're testing with version 8.9.0.

Testing data stream metrics-prometheus.remote_write-default.
Index being used for the documents is .ds-metrics-prometheus.remote_write-default-2023.08.28-000001.
Index being used for the settings and mappings is .ds-metrics-prometheus.remote_write-default-2023.08.28-000001.

The time series fields for the TSDB index are: 
        - dimension (9 fields):
                - agent.id
                - cloud.account.id
                - cloud.availability_zone
                - cloud.instance.id
                - cloud.provider
                - cloud.region
                - container.id
                - host.name
                - prometheus.labels_fingerprint

The problem here is that I am running it locally, so those fields are empty, but for other packages I believe we have the same definition of account.id.

Have you seen any issue specifically with this field?

Contributor:

Yes. It happened to me before. I tried it again just to make sure and I get the following error:

elasticsearch.BadRequestError: BadRequestError(400, 'mapper_parsing_exception', 'Failed to parse mapping: Field [ignore_above] cannot be set in conjunction with field [time_series_dimension]')

And if we think about it, what if we have two values that exceed the 1024-character limit but are identical up to that limit? Maybe it is best to create a fingerprint for it?
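To illustrate the reasoning above - this is a generic sketch, not the processor used in the PR: two values that only differ beyond a truncation limit would collapse into one dimension value if truncated, but hashing the full string keeps them distinct.

```python
import hashlib

def fingerprint(value: str) -> str:
    # Hash the full value, so strings that only differ past any
    # truncation limit still map to distinct, bounded-length values.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

base = "x" * 1024
a = base + "-series-a"
b = base + "-series-b"

# Identical up to the 1024-character limit...
assert a[:1024] == b[:1024]
# ...but their fingerprints differ, and are always 64 chars long.
assert fingerprint(a) != fingerprint(b)
assert len(fingerprint(a)) == 64
```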

@tetianakravchenko (Author) commented Aug 28, 2023:

Those fields cloud.account.id and cloud.availability_zone contain ignore_above: 1024 by default - it is defined in the ecs repo - https://github.com/elastic/ecs/blob/8.0/experimental/generated/beats/fields.ecs.yml#L474-L489
We have already set those fields as dimensions for other packages. Specifically for those 2 fields: they are unlikely to exceed the limit, but the limit is defined anyway as a preventive measure.
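For reference, the combination being discussed looks roughly like this in a package's fields file (an illustrative sketch mirroring the linked ECS defaults, not a verbatim copy of this PR's definition):

```yaml
- name: cloud.account.id
  level: extended
  type: keyword
  ignore_above: 1024   # ECS default
  dimension: true      # maps to time_series_dimension in the mapping
```

It is exactly this pairing of ignore_above with dimension: true that the reported mapper_parsing_exception rejects.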

I am wondering why you see this error now. What stack version are you using? Maybe it would be better to create a dedicated issue to track it.

Contributor:

You're right.

I am testing with 8.9.0-SNAPSHOT, but I think it happened to me before when I was using 8.8.0-SNAPSHOT.

I had a look at the mapping (the data stream in question is kibana.stack_monitoring.cluster_actions, but this is irrelevant to the problem):

      "agent": {
        "properties": {
          "id": {
            "type": "keyword",
            "time_series_dimension": "true"
          }
        }
      },

And it works fine like this. But if I mark this explicitly with ignore_above:

      "agent": {
        "properties": {
          "id": {
            "type": "keyword",
            "ignore_above": 1024,
            "time_series_dimension": "true"
          }
        }
      },

The error occurs again:

elasticsearch.BadRequestError: BadRequestError(400, 'mapper_parsing_exception', 'Failed to parse mapping: Field [ignore_above] cannot be set in conjunction with field [time_series_dimension]')

@tetianakravchenko (Author):

@constanca-m have you created an issue for this, so we could continue the discussion there? I think it is a generic issue, not specific to the prometheus package or remote_write.

@elasticmachine commented Aug 28, 2023

🌐 Coverage report

Name          % (covered/total)  Diff
Packages      100.0% (0/0)       💚
Files         100.0% (0/0)       💚
Classes       100.0% (0/0)       💚
Methods       66.667% (8/12)     👎 -6.667
Lines         100.0% (0/0)       💚
Conditionals  100.0% (0/0)       💚

changes:
- description: Add dimension and metric_type fields to remote_write datastream
type: enhancement
link: https://github.com/elastic/integrations/pull/7261
Contributor:

The PR link needs to be updated.

@tetianakravchenko (Author):

Thank you for pointing it out! Done - 3297b18

@tetianakravchenko tetianakravchenko merged commit fcc5971 into elastic:main Aug 29, 2023
1 check passed
@tetianakravchenko tetianakravchenko deleted the prometheus-remote_write-tsdb branch August 29, 2023 15:27
@elasticmachine commented:
Package prometheus - 1.9.0 containing this change is available at https://epr.elastic.co/search?package=prometheus
