-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec more granular metrics + data collection for fast verification per orchestrator #2336
Comments
For the first two - I know we've had trouble with per orchestrator metrics with Prometheus in the past due to Prometheus not liking high cardinality data labels so we likely will need to take that into account. @victorges @iameli Any thoughts here on the best way to collect per orchestrator metrics these days? Could https://github.com/livepeer/livepeer-data help? |
So now we are actually using Victoria Metrics instead of Prometheus, which does have support for higher cardinality metrics! @tqian1 is adding support for per-stream metrics in the go-livepeer code, the only problem with that AFAIK (and correct me if I'm wrong) are the actual metrics exporting libraries themselves like OpenCensus or Prometheus, since once we measure one stream they stay in memory forever and there is no clear way to delete them, so the list of metrics only grows forever. Mist fixes this by having their own implementation of a metrics library which only keeps a metric while a stream is currently active. As for exporting metrics per orchestrator tho, not necessarily per stream, we should be really fine! Maybe even in Prometheus, since metrics are always broken down by instance anyway with all the pod labels. It would also be interesting to integrate that data into |
Thanks for the mention @victorges Victor summarizes the problem well - the main limitation for per-stream metrics is the library (OpenCensus) that we use in go-livepeer. It does not have a clean way to register and deregister stream-specific tags like I was about to create a new issue in the repo to go over an RFC doc I wrote around this, but I can perhaps I can piggyback on this one if that's alright: https://www.notion.so/livepeer/Single-Stream-Health-OpenCensus-b402a38f5cf54479a7ad9845c57e5604 If anyone has a chance please help to review this doc - looking for some feedback on Option 1 and 2 to move forward with the per-stream metrics and establish some pattern around exporting high-cardinality or "leaky" metrics which do not clean themselves up Also looking for some feedback on this PR please #2313 - it separates some of the current metrics into the original view, and a per-stream view, in order to reduce dimensionality of exported metrics overall. Doesn't address the "leakiness" problem but makes the cardinality more manageable. |
cc @ArcAster possibly interesting for your work |
Thanks for the info @oscar-davids . So, if I understand correctly, we added the "per stream" metrics, but we actually didn't solve the issue with OpenCensus and high cardinality. So the current state of art is that if you enable flag |
I mean that we can know the total success rate for fast verification per broadcast but don't know the rate per orchestrator currently, so we need to collect matrix per orchestrator. |
The text was updated successfully, but these errors were encountered: