Why do you have samples that are scraped only by the standby Prometheus? I
think the idea is to have two Prometheus instances scraping exactly the same
metrics.
The reason we only accept samples from one replica is that the TSDB rejects
duplicate samples (samples with the same timestamp for the same series but
different values).
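To illustrate the duplicate-sample rule mentioned above, here is a minimal Python sketch of a toy series store. ToySeriesStore and DuplicateSampleError are hypothetical names, not the Prometheus TSDB API. The sketch assumes the replica label has already been dropped, so both replicas write to the same series. It models only the same-timestamp, different-value case and ignores out-of-order handling.

```python
class DuplicateSampleError(Exception):
    """Raised when a series already has a different value at a timestamp."""


class ToySeriesStore:
    """Hypothetical append-only store; models only the duplicate-sample rule."""

    def __init__(self):
        # series (frozen label set) -> {timestamp_ms: value}
        self.series = {}

    def append(self, labels, ts_ms, value):
        samples = self.series.setdefault(frozenset(labels.items()), {})
        if ts_ms in samples and samples[ts_ms] != value:
            raise DuplicateSampleError(
                f"duplicate sample for timestamp {ts_ms}: "
                f"have {samples[ts_ms]}, got {value}"
            )
        samples[ts_ms] = value


# Assumes the replica label was already stripped, so both replicas
# produce exactly this label set.
up = {"__name__": "up", "job": "node", "instance": "10.0.0.1:9100"}
store = ToySeriesStore()
store.append(up, 1_000, 1.0)  # replica A: accepted
store.append(up, 1_000, 1.0)  # identical value: accepted again (no-op)
try:
    store.append(up, 1_000, 0.0)  # replica B disagrees at the same timestamp
except DuplicateSampleError as err:
    print(err)  # duplicate sample for timestamp 1000: have 1.0, got 0.0
```

Identical re-sends are harmless in this model. Two replicas that disagree at the same timestamp, however, cannot both be stored.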
Alan Diego
On Mon, Nov 6, 2023 at 1:08 AM aleskxyz wrote:
Hi,
Cortex elects a leader from each HA Prometheus cluster and accepts samples
only from that replica. Imagine a network partition in which each Prometheus
can scrape only some of the instances. With the current Cortex design, only
samples from the elected Prometheus are written to long-term storage. Samples
from the other Prometheus instances are discarded, which leaves gaps for any
series that only the standby Prometheus can scrape.
Does Cortex have a solution for this, or can it handle this situation like
Thanos, which deduplicates data at query time?
Thanks.
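The gap described above can be sketched in Python with hypothetical data. During a partition, replica "a" reaches only target t1 and replica "b" reaches only t2. Keeping just the elected replica's samples, as the leader-election approach does, loses t2. Merging both replicas at read time keeps it. The merge here is a naive union for illustration only. It is not Thanos's actual deduplication algorithm.

```python
# Hypothetical partition data: (target, timestamp_s) -> value per replica.
scrapes = {
    "a": {("t1", 0): 1.0, ("t1", 15): 1.0},  # replica "a" only reaches t1
    "b": {("t2", 5): 1.0, ("t2", 20): 1.0},  # replica "b" only reaches t2
}


def write_path_dedup(scrapes, elected):
    """Keep only the elected replica's samples; everything else is dropped."""
    return dict(scrapes[elected])


def query_time_merge(scrapes):
    """Naive union across replicas; the first replica (by name) wins on a clash."""
    merged = {}
    for replica in sorted(scrapes):
        for key, value in scrapes[replica].items():
            merged.setdefault(key, value)
    return merged


def targets(samples):
    """Distinct targets that have at least one stored sample."""
    return sorted({target for target, _ in samples})


print(targets(write_path_dedup(scrapes, elected="a")))  # ['t1']
print(targets(query_time_merge(scrapes)))               # ['t1', 't2']
```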
So the problem is the failover time?
The default value is 15 seconds, and it's configurable: ha_tracker_update_timeout
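Here is a simplified Python model of the failover timing discussed above. HATracker and failover_timeout are hypothetical names. The sketch switches the elected replica only after that replica has been silent for longer than the timeout. The real Cortex HA tracker keeps its election state in a KV store, and its config reference lists a separate ha_tracker_failover_timeout alongside ha_tracker_update_timeout. Check the documented behaviour and defaults for your version.

```python
class HATracker:
    """Hypothetical, simplified per-cluster leader tracker.

    Accepts samples from the elected replica and fails over to another
    replica only once the elected one has been silent for longer than
    failover_timeout seconds.
    """

    def __init__(self, failover_timeout):
        self.failover_timeout = failover_timeout
        self.elected = {}    # cluster -> elected replica name
        self.last_seen = {}  # cluster -> time (s) of the elected replica's last push

    def accept(self, cluster, replica, now):
        current = self.elected.get(cluster)
        if current is None or current == replica:
            # First replica seen, or the elected one pushing again.
            self.elected[cluster] = replica
            self.last_seen[cluster] = now
            return True
        if now - self.last_seen[cluster] > self.failover_timeout:
            # The elected replica went quiet for too long: fail over.
            self.elected[cluster] = replica
            self.last_seen[cluster] = now
            return True
        return False  # Non-elected replica: its samples are dropped.


tracker = HATracker(failover_timeout=15)
print(tracker.accept("prod", "a", now=0))   # True: "a" becomes elected
print(tracker.accept("prod", "b", now=10))  # False: "a" was seen 10 s ago
print(tracker.accept("prod", "b", now=16))  # True: "a" silent for 16 s > 15 s
```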
Alan Diego
On Mon, Nov 6, 2023 at 10:33 AM aleskxyz wrote:
Thanks for your reply!
As I described above, this inconsistency can occur during a network
partition.
Imagine two Prometheus instances in two different racks, both scraping all
instances. When the connection between the two racks is disrupted, the
active Prometheus can no longer scrape targets in the other rack, but that
rack's local Prometheus is still working. Its samples are discarded because
it is not the elected replica, so that rack's targets are missing from
long-term storage.
Thanks
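The two-rack scenario can be simulated in Python with the same simplified election rule. This is a hypothetical model. Both replicas stay healthy and push every interval seconds, but each one reaches only its own rack. The elected replica never goes silent, so failover never triggers. As a result, rack B's samples are dropped for the whole partition regardless of the timeout value.

```python
def simulate_partition(duration, interval, failover_timeout):
    """Count samples stored per rack while both replicas keep pushing.

    Hypothetical model: replica "a" (rack A) starts as the elected replica
    and, during the partition, only reaches rack-A targets; replica "b"
    (rack B) only reaches rack-B targets. Both push every `interval` s.
    """
    elected, last_seen = "a", 0
    reachable = {"a": "rack-a", "b": "rack-b"}
    stored = {"rack-a": 0, "rack-b": 0}
    for now in range(0, duration, interval):
        for replica in ("a", "b"):
            if replica == elected:
                last_seen = now
            elif now - last_seen > failover_timeout:
                elected, last_seen = replica, now  # failover
            else:
                continue  # non-elected replica: samples dropped
            stored[reachable[replica]] += 1
    return stored


print(simulate_partition(duration=600, interval=15, failover_timeout=15))
# {'rack-a': 40, 'rack-b': 0}
```

Lowering failover_timeout, even to 0, gives the same result. In this model the timeout only matters when the elected replica actually stops pushing.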