querier: Federated Thanos Targets Page #1375

Closed
d-ulyanov opened this issue Aug 6, 2019 · 37 comments

Comments

@d-ulyanov
Contributor

Hey, Thanos team :)

It would be great to have a Prometheus-like "targets" page that covers all connected Prometheus instances.
From time to time it's pretty inconvenient to spend time finding which Prom scrapes some target, especially if your Prom setup is big (we have ~30 instances).
It would be much easier to filter targets on a dedicated page instead of writing a query like up{service=~...}.

What do you think, guys?

@GiedriusS
Member

GiedriusS commented Aug 6, 2019

My 2 cents: I would love this feature but only if Prometheus had an API which exposed all of this information. AFAICT it doesn't exist ATM. I wouldn't want to attach ourselves to the up metric because:

  • Its name might change
  • It does not expose labels which were on the target before relabelling such as __scheme__.

WDYT?

@2nick
Contributor

2nick commented Aug 7, 2019

The documentation describes an API endpoint for getting information about targets, https://prometheus.io/docs/prometheus/latest/querying/api/#targets, which exposes values for both types of labels: the discovered ones and the ones formed by job relabeling.
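As a rough illustration of what that endpoint returns, the sketch below parses a response shaped like the one documented for GET /api/v1/targets and separates the labels seen before relabeling (`discoveredLabels`) from the final ones (`labels`). The payload is a made-up sample (addresses, job name, and label values are assumptions), and `label_views` is a hypothetical helper, not part of Prometheus or Thanos.

```python
import json

# Illustrative payload in the shape documented for GET /api/v1/targets.
# The job, addresses, and label values are invented for this sketch.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "discoveredLabels": {
          "__address__": "10.0.0.5:9100",
          "__scheme__": "http",
          "__metrics_path__": "/metrics",
          "job": "node"
        },
        "labels": {"instance": "10.0.0.5:9100", "job": "node"},
        "scrapeUrl": "http://10.0.0.5:9100/metrics",
        "lastError": "",
        "health": "up"
      }
    ],
    "droppedTargets": []
  }
}
"""


def label_views(response_text):
    """Return (labels before relabeling, labels after relabeling) per active target."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("targets API did not report success")
    views = []
    for target in payload["data"]["activeTargets"]:
        views.append((target["discoveredLabels"], target["labels"]))
    return views


for discovered, final in label_views(SAMPLE_RESPONSE):
    # Labels such as __scheme__ exist only before relabeling,
    # which is why the up metric alone cannot show them.
    only_before = sorted(set(discovered) - set(final))
    print(final["instance"], "pre-relabel only:", only_before)
```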

@GiedriusS
Member

@2nick thanks for looking into it (I haven't when I wrote the original comment)! Then we could probably make a new method in the StoreAPI which returns these (if any exist) and Thanos Query could have this information, and show it in the web UI.

@bwplotka
Member

bwplotka commented Aug 8, 2019

Nice ideas!

So the use case you provided is:

it's pretty inconvenient to spend time finding which Prom scrapes some target, especially if your Prom setup is big (we have ~ 30 instances).
It would be much easier to filter targets on a dedicated page instead of writing a query like up{service=~...}.

Why is a query not acceptable? What about having a Grafana dashboard for this? (:

Then we could probably make a new method in the StoreAPI which returns these (if any exist) and Thanos Query could have this information, and show it in the web UI.

Disagree - StoreAPI is a generic way of getting metric data efficiently. Targets relate strictly to pull-based collectors/scrapers. This means that for now only Prometheus (or the Thanos sidecar) would be implementing this. I think if anything, it will have to be another gRPC service.

However, I mentioned one possible solution: a Grafana dashboard (: You can trace every metric back to its Prometheus/metric source thanks to external labels. Would that solve your use case @d-ulyanov ?

@2nick
Contributor

2nick commented Aug 9, 2019

@bwplotka

Disagree - StoreAPI is a generic way of getting metric data efficiently. Targets relate strictly to pull-based collectors/scrapers. This means that for now only Prometheus (or the Thanos sidecar) would be implementing this. I think if anything, it will have to be another gRPC service.

So you'd be OK if such functionality were implemented as part of the Thanos sidecar, but as a separate gRPC service with its own API (maybe something like "PrometheusProxyAPI")?

Because it does not look like it is possible to get "labels before relabeling" using only metrics data, and this info is really valuable for debugging some types of issues/investigations.

@bwplotka
Member

bwplotka commented Aug 9, 2019

Because it does not look like it is possible to get "labels before relabeling"

Maybe, but do you need this? You only need to find out which Prometheus was the source and compose a link to its targets page. All of this can be done on a Grafana dashboard, including the HTML link.
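A minimal sketch of that idea, assuming each Prometheus is identified by an external label and that you maintain a lookup from label value to base URL. The `cluster` label name, the `PROMETHEUS_URLS` mapping, and the `targets_page_link` helper are all hypothetical; the host name is only an example.

```python
from urllib.parse import urljoin

# Hypothetical mapping from an external label value to that Prometheus's base URL.
PROMETHEUS_URLS = {
    "XYZ": "http://prometheus.xyz.example.com:9090/",
}


def targets_page_link(series_labels, external_label="cluster"):
    """Return the /targets URL of the Prometheus that produced the series, or None."""
    source = series_labels.get(external_label)
    base = PROMETHEUS_URLS.get(source)
    if base is None:
        # Unknown or missing external label: no link can be composed.
        return None
    return urljoin(base, "targets")


print(targets_page_link({"__name__": "up", "cluster": "XYZ"}))
```

The same lookup could back a dashboard link template; the point is only that the external label is enough to find the right targets page.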

@brancz
Member

brancz commented Aug 9, 2019

I have mixed feelings about this. I feel that Prometheus providing more/better logs about scrapes, combined with the up metric, might be a better fit than a "federation" approach. At the end of the day, if you need information from the edge, then the edge is likely to be your best source :)

@2nick
Contributor

2nick commented Aug 9, 2019

Yes, we are actively using additionalScrapeConfigs to add jobs for non-k8s targets with custom relabelings/certificates/other options which can't be configured with ServiceMonitor and it's really useful information for debugging some cases.

Also, we can implement a better targets page to have a more convenient way to work with targets than the browser's search-in-page (like a rich UI with nice search/filter-by-labels/labels view), and we think that Thanos Query can be a good choice for a single place that eases understanding of the infrastructure and investigating issues for really complex Prometheus installations.

For example, at the moment we have about 4 environments with role-based (5 roles), sharded (each role has 4 shards) and replicated (each shard has its own replica) Prometheus instances, which collect not only from k8s applications but also from some software installed on bare-metal servers (win/*nix). :)
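To make the filter-by-labels idea concrete, here is a small sketch that searches the targets of several Prometheus instances at once. It assumes the target lists have already been fetched and grouped per instance; the instance names, the sample data, and the `filter_targets` helper are hypothetical. Matchers are fully anchored regexes, mirroring how `=~` behaves in PromQL.

```python
import re


def filter_targets(targets_by_instance, matchers):
    """Yield (instance_name, target) pairs whose labels match every regex matcher.

    targets_by_instance: {prometheus_name: [target dicts with a "labels" map]}
    matchers: {label_name: regex}, fully anchored like PromQL's =~ operator.
    """
    compiled = {name: re.compile(pattern) for name, pattern in matchers.items()}
    for instance, targets in sorted(targets_by_instance.items()):
        for target in targets:
            labels = target.get("labels", {})
            # A missing label is treated as the empty string.
            if all(rx.fullmatch(labels.get(name, "")) for name, rx in compiled.items()):
                yield instance, target


# Made-up sample data standing in for per-instance target lists.
TARGETS = {
    "prom-role-a-shard-0": [
        {"labels": {"job": "node", "service": "billing"}, "health": "up"},
        {"labels": {"job": "node", "service": "search"}, "health": "down"},
    ],
    "prom-role-b-shard-1": [
        {"labels": {"job": "api", "service": "billing-api"}, "health": "up"},
    ],
}

for instance, target in filter_targets(TARGETS, {"service": "billing.*"}):
    print(instance, target["labels"]["service"], target["health"])
```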

@d-ulyanov
Contributor Author

d-ulyanov commented Aug 9, 2019

@bwplotka @brancz @GiedriusS thanks for your replies, guys.
A targets page could be cool for ordinary users (who are just querying metrics, not setting up Thanos :)) because switching from Prometheus to Thanos would feel more seamless for them.
In spite of this, I agree that StoreAPI won't be a good place for such logic and we could add more

Grafana dashboard is a disputable solution here because in this case we should explain to our users where they should go to check their targets :)

An additional gRPC service sounds reasonable to us, WDYT @bwplotka ?

@bwplotka
Member

bwplotka commented Aug 9, 2019

Thanks. Especially the input for the use cases and user experience is quite useful!

Grafana dashboard is a disputable solution here because in this case we should explain to our users where they should go to check their targets :)

Before Thanos, you needed to do that as well, right? E.g. "hey, it seems like this metric is from cluster=XYZ, so please go to http://prometheus.xyz.example.com:9090/targets to see what's wrong with your scrape configuration".

We are not saying "no" but we are trying to understand first what's the best user and operator experience here. For example:

  • Targets and new Service Discovery pages are evolving in Prometheus. Our API would need to be maintained for that.
  • Thanos can be extended to support any other "scraper" or "collector" with StoreAPI and the TSDB block format. What if someone adds some other source of metrics? One such source is https://thanos.io/components/rule.md/, which produces metrics for recording rules and ALERTS{...}. We need to be careful here, as such a produced metric depends on other metrics (e.g. from some Prometheus), so it's quite complex (: In fact, the same applies to a potential Grafana dashboard.

Need to think about it more.

What if the Store page had user-customizable links that forward to the Prometheus UIs?

@d-ulyanov
Contributor Author

Okay, I agree with @bwplotka, a Targets page adds pretty controversial Prometheus functionality to Thanos. Let's close this issue for now, we'll use the dashboard or something like that.
Thanks all for your comments!

@bwplotka bwplotka reopened this Mar 11, 2020
@bwplotka
Member

Targets page adds pretty controversial Prometheus functionality to Thanos.

cc @d-ulyanov

While it is controversial, we actually started work on the Federated Rules API #2200 (Proposal to come! cc @s-urbaniak), so we might want to work on Federated Targets in a similar way as well ❤️

@bwplotka bwplotka changed the title from "Thanos targets page" to "querier: Federated Thanos Targets Page" Mar 11, 2020
@stale

stale bot commented Apr 10, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Apr 10, 2020
@bwplotka bwplotka removed the stale label Apr 10, 2020
@stale stale bot closed this as completed Apr 17, 2020
@bwplotka bwplotka reopened this Apr 17, 2020
@stale

stale bot commented May 17, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label May 17, 2020
@bwplotka
Member

bwplotka commented May 18, 2020 via email

@stale stale bot removed the stale label May 18, 2020
@brancz
Member

brancz commented May 18, 2020

fwiw I've also come around to agreeing that, with deduping functionality, this actually fits pretty well in Thanos
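For illustration only, deduplicating targets reported by replicated Prometheus instances could look like the sketch below. It assumes every target already carries a replica label (here called `replica`, an assumed name) and simply keeps the first target seen per scrape URL and remaining label set. This is not claimed to be how Thanos merges targets; `dedupe_targets` is a hypothetical helper.

```python
def dedupe_targets(targets, replica_label="replica"):
    """Collapse targets that differ only in the replica label.

    Keeps the first target seen for each (scrapeUrl, labels-without-replica) key.
    """
    seen = {}
    for target in targets:
        labels = {k: v for k, v in target["labels"].items() if k != replica_label}
        key = (target["scrapeUrl"], tuple(sorted(labels.items())))
        if key not in seen:
            # Store a copy with the replica label stripped from its labels.
            seen[key] = dict(target, labels=labels)
    return list(seen.values())


# Made-up sample: the same target scraped by two replicas, plus one other target.
REPLICATED = [
    {"scrapeUrl": "http://10.0.0.5:9100/metrics", "labels": {"job": "node", "replica": "a"}, "health": "up"},
    {"scrapeUrl": "http://10.0.0.5:9100/metrics", "labels": {"job": "node", "replica": "b"}, "health": "up"},
    {"scrapeUrl": "http://10.0.0.6:9100/metrics", "labels": {"job": "node", "replica": "a"}, "health": "up"},
]

for target in dedupe_targets(REPLICATED):
    print(target["scrapeUrl"], target["labels"])
```

A real implementation would likely need a rule for which replica's health and last-scrape data to prefer; keeping the first one seen is just the simplest choice for a sketch.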

@stale stale bot added the stale label Dec 20, 2020
@yeya24
Contributor

yeya24 commented Dec 20, 2020

#3350

@stale stale bot removed the stale label Dec 20, 2020
@stale

stale bot commented Feb 21, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@bwplotka
Member

PR is almost there, #3350

2nick added a commit to 2nick/thanos that referenced this issue Mar 16, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 16, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 16, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 22, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 22, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 22, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 22, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 30, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 30, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 30, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 31, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 31, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 31, 2021
2nick added a commit to 2nick/thanos that referenced this issue Mar 31, 2021
bwplotka pushed a commit that referenced this issue Apr 1, 2021
@GiedriusS
Member

We have the API, so now it's just a matter of adding a new page to Thanos Query. I propose calling it something like "Federated Targets" (:

@2nick
Contributor

2nick commented Apr 5, 2021

@GiedriusS you won't believe it, but in the "New UI" the targets page is fully functional :)

I tested my API with it, simply by opening http://localhost:9090/new/targets

@onprem
Member

onprem commented Apr 10, 2021

Haha cool. I am not that surprised tbh because it will work as long as we follow the Prometheus API (same route, same data format). Looks like the only work remaining is to add the link to Querier's nav bar. :)

@bwplotka
Member

bwplotka commented Apr 11, 2021

Wooo! Amazing! 🤗

So what is left to do?

only work remaining is to add the link to Querier's nav bar. :)

Isn't it simpler to make the React UI the new default? That would solve this too, right?

@onprem
Member

onprem commented Apr 11, 2021

Isn't it simpler to make the React UI the new default? That would solve this too, right?

By adding the link to the querier's navbar I meant the navbar in the new UI. We have the page working, it's just that we are not advertising it. Created #4045 to fix this :)

@yeya24
Contributor

yeya24 commented Nov 7, 2021

I believe this one is done. Closing now.

@yeya24 yeya24 closed this as completed Nov 7, 2021