metricbeat: prometheus module and de-duplication support #20430

Closed
Xat59 opened this issue Aug 4, 2020 · 3 comments
Labels: enhancement, Metricbeat, Team:Platforms

Comments

Xat59 commented Aug 4, 2020

Hello guys,

I was trying to implement the solution described on your blog by @sorantis: https://www.elastic.co/blog/prometheus-monitoring-at-scale-with-the-elastic-stack

It works fine, but when you make Prometheus highly available by adding a second Prometheus instance, the data gets stored twice in Elasticsearch.
It would be nice to have a feature that deduplicates and merges the data before storing it in Elasticsearch.

Thanos does the deduplication and merge, but it does not store data in Elasticsearch.

Thanks a lot!
Regards.

botelastic bot added the needs_team label Aug 4, 2020
sorantis added the enhancement, Metricbeat, and Team:Platforms labels Aug 4, 2020
botelastic bot removed the needs_team label Aug 4, 2020
elasticmachine (Collaborator) commented

Pinging @elastic/integrations-platforms (Team:Platforms)

masci self-assigned this Aug 4, 2020

masci commented Sep 2, 2020

Hi @Xat59, thanks for reporting!

I finally had some time to explore Thanos, but I see that deduplication happens in the Query component, so Thanos suffers from the same problem of storing duplicated data. There was a recent attempt to have the Compactor component do offline deduplication (which would speed up queries and save storage), but it has stalled.

I'm afraid there isn't a simple workaround for this: deduplication would have to happen in a component that ingests (and possibly caches) all the samples from all the HA nodes and decides what to drop. That said, I'm happy to keep this open and discuss possible implementation strategies!
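
To make the idea of a component that sees every sample and decides what to drop more concrete, here is a minimal sketch in Python. It is not Beats code. It assumes each HA Prometheus instance tags its samples with a distinguishing label (called `replica` here, a hypothetical name), that both replicas scrape on the same interval (15 seconds here, also an assumption), and that two samples of the same series landing in the same interval-sized time bucket are duplicates.

```python
from typing import Dict, Set, Tuple

# Assumptions for illustration only: the label name and the scrape interval
# are hypothetical and would have to match the real Prometheus setup.
REPLICA_LABEL = "replica"
SCRAPE_INTERVAL_S = 15

LabelIdentity = Tuple[Tuple[str, str], ...]


def series_key(name: str, labels: Dict[str, str], timestamp_s: float) -> Tuple[str, LabelIdentity, int]:
    """Identify a sample by metric name, labels minus the replica label, and time bucket."""
    identity = tuple(sorted((k, v) for k, v in labels.items() if k != REPLICA_LABEL))
    bucket = int(timestamp_s // SCRAPE_INTERVAL_S)
    return name, identity, bucket


class Deduplicator:
    """Keeps a small cache of recently seen samples and rejects repeats."""

    def __init__(self, keep_buckets: int = 4) -> None:
        self._keep_buckets = keep_buckets
        self._seen: Dict[int, Set[Tuple[str, LabelIdentity]]] = {}

    def accept(self, name: str, labels: Dict[str, str], timestamp_s: float) -> bool:
        """Return True if the sample should be stored, False if it is a duplicate."""
        metric, identity, bucket = series_key(name, labels, timestamp_s)
        # Evict buckets that are too old, so the cache stays bounded.
        for old in [b for b in self._seen if b < bucket - self._keep_buckets]:
            del self._seen[old]
        seen = self._seen.setdefault(bucket, set())
        if (metric, identity) in seen:
            return False
        seen.add((metric, identity))
        return True
```

The sketch also shows why this is not simple: the cache is state that must be shared by everything ingesting from the HA pair, and fixed time buckets misclassify samples that fall on either side of a bucket boundary.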

masci commented Sep 22, 2020

We talked about this internally and agreed that, given the complexity, we won't implement this feature in Beats anytime soon. Using a script processor might be a good way to dedupe at ingest time, though.
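
One way ingest-time deduplication could work is to derive a deterministic document ID from each sample, so that the copy coming from the second Prometheus instance overwrites the first one in Elasticsearch instead of being indexed as a new document. The sketch below shows only that ID logic, in Python for illustration; it is not the script processor API. The `replica` label name, the 15-second interval, and the function name are assumptions.

```python
import hashlib
import json
from typing import Dict

# Assumptions for illustration only.
REPLICA_LABEL = "replica"      # label that differs between the HA instances
SCRAPE_INTERVAL_MS = 15_000    # both instances scrape on this interval


def document_id(name: str, labels: Dict[str, str], timestamp_ms: int) -> str:
    """Build a stable ID that is identical for the same sample seen by either replica."""
    # Drop the replica label so that both instances describe the same series.
    identity = {k: v for k, v in labels.items() if k != REPLICA_LABEL}
    # Round the timestamp down to the scrape interval, since the replicas
    # do not scrape at exactly the same instant.
    rounded_ts = timestamp_ms - (timestamp_ms % SCRAPE_INTERVAL_MS)
    payload = json.dumps(
        {"name": name, "labels": identity, "ts": rounded_ts},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Indexing both copies under the same ID keeps a single document per sample without any shared cache, at the cost of an extra hash per event and of overwrite-style indexing on the Elasticsearch side.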

masci closed this as completed Sep 22, 2020