
Per Orchestrator metrics in Broadcaster #2371

Closed
leszko opened this issue Apr 25, 2022 · 13 comments · Fixed by #2382

Comments

leszko (Contributor) commented Apr 25, 2022

Is your feature request related to a problem? Please describe.
Currently, the Livepeer Broadcaster doesn't export any metrics tagged per Orchestrator. Such metrics would be useful, for example, for monitoring test streams.

Describe the solution you'd like
Add per-Orchestrator metrics recorded by the Broadcaster:

  • transcode_overall_latency_seconds_per_orchestrator
  • segment_transcoded_all_appeared_total_per_orchestrator
  • segment_transcoded_unprocessed_total_per_orchestrator


@github-actions github-actions bot added the status: triage this issue has not been evaluated yet label Apr 25, 2022
@leszko leszko added area: metrics and removed status: triage this issue has not been evaluated yet labels Apr 25, 2022
@leszko leszko assigned red-0ne and unassigned figintern Apr 26, 2022
yondonfu (Member) commented Apr 26, 2022

Would it make sense to add labels (i.e. an orchestrator label) for the existing metrics instead of having separate metrics with the _per_orchestrator suffix?

leszko (Contributor, Author) commented Apr 27, 2022

Would it make sense to add labels (i.e. an orchestrator label) for the existing metrics instead of having separate metrics with the _per_orchestrator suffix?

I think we could do it that way. I actually tried to follow the pattern we have for per_stream metrics. We do have both:

  • success_rate
  • success_rate_per_stream

@tqian1 do you see any drawbacks of just adding a label to existing metrics instead of creating new ones? Also do you maybe know why we added new ones for per_stream?

@red-0ne red-0ne linked a pull request Apr 29, 2022 that will close this issue
red-0ne (Contributor) commented Apr 29, 2022

I've added the requested metrics in census.

The approach was to create the metric and insert a tag with the key kOrchestratorURI. The issue now is that we need to get the URI from somewhere, since it's not passed by SegmentFullyTranscoded(). I wasn't sure whether I could get it from ctx or would need to pass the value along the whole chain of function calls.
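
For illustration, roughly what I mean, heavily simplified and with placeholder names rather than the actual census.go code:

```go
package census

import (
	"context"

	"go.opencensus.io/stats"
	"go.opencensus.io/tag"
)

// Placeholder key/measure for illustration; the real ones live in census.go.
var (
	kOrchestratorURI = tag.MustNewKey("orchestrator_uri")

	mTranscodeOverallLatency = stats.Float64(
		"transcode_overall_latency_seconds",
		"End-to-end transcode latency of a segment",
		"s")
)

// withOrchestratorURI attaches the orchestrator URI to the context so that
// downstream recording calls can pick it up without new function parameters.
func withOrchestratorURI(ctx context.Context, uri string) context.Context {
	ctx, _ = tag.New(ctx, tag.Upsert(kOrchestratorURI, uri))
	return ctx
}

// recordTranscodeLatency records one latency sample; any tags already present
// in ctx (including kOrchestratorURI) are kept by OpenCensus automatically.
func recordTranscodeLatency(ctx context.Context, latencySec float64) {
	stats.Record(ctx, mTranscodeOverallLatency.M(latencySec))
}
```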

red-0ne (Contributor) commented May 3, 2022

According to the implementation of manifestWithTag, it seems possible to get keys from ctx. @tqian1, do you think it's possible to get the orchestrator's URI from there? Also, if I tag that URI in transcode_overall_latency_seconds instead of the ..._per_orchestrator metrics, can I have both functionalities (per-orchestrator and overall metrics) by looking up the kOrchestratorURI key?

figintern commented May 5, 2022

@tqian1 do you see any drawbacks of just adding a label to existing metrics instead of creating new ones? Also do you maybe know why we added new ones for per_stream?

Sorry for the delay here, folks:
Adding a label to the existing metrics is preferable to creating new ones.

For context (and to clear up the confusion): the reason we have independent metrics suffixed with per_stream on our side is that the high cardinality of manifest_ids, combined with the high dimensionality of the pre-existing Distribution view aggregation type, dramatically increases the number of series/metrics exported.

For metrics like http_client_segment_transcoded_realtime_ratio, the per_stream version contains the manifest_id tag and thus uses a single-bucket view.Distribution() instead of a multi-bucket one, i.e. view.Distribution(0.5, 1, 2, 3, 5, 10, 50, 100).

This is not necessary for orchestrator metrics, because the starting cardinality of kOrchestratorURI is not inherently high and does not leak/grow as fast as manifest_id.
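
To make the cardinality trade-off concrete, here is a rough sketch of the two view definitions (the bucket bounds are the ones quoted above; the keys and measure are illustrative, not the exact census.go definitions). The exported series count grows roughly with (distinct tag value combinations) x (number of Distribution buckets), which is why the per_stream variant collapses to a single bucket:

```go
package census

import (
	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

// Illustrative keys and measure; not the exact definitions from census.go.
var (
	kManifestID    = tag.MustNewKey("manifest_id")
	kOrchURI       = tag.MustNewKey("orchestrator_uri")
	mRealtimeRatio = stats.Float64(
		"http_client_segment_transcoded_realtime_ratio",
		"Segment duration divided by transcode time",
		stats.UnitDimensionless)
)

// Orchestrator tag: low, slowly growing cardinality, so a full bucket layout
// is affordable on the overall metric.
var realtimeRatioView = &view.View{
	Name:        "http_client_segment_transcoded_realtime_ratio",
	Measure:     mRealtimeRatio,
	TagKeys:     []tag.Key{kOrchURI},
	Aggregation: view.Distribution(0.5, 1, 2, 3, 5, 10, 50, 100),
}

// Per-stream variant: manifest_id is high cardinality, so the bucket count is
// reduced to one to keep (tag values x buckets) from exploding.
var realtimeRatioPerStreamView = &view.View{
	Name:        "http_client_segment_transcoded_realtime_ratio_per_stream",
	Measure:     mRealtimeRatio,
	TagKeys:     []tag.Key{kManifestID},
	Aggregation: view.Distribution(),
}
```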

figintern commented May 5, 2022

The approach was to create the metric and insert a tag with the key kOrchestratorURI. The issue now is that we need to get the URI from somewhere, since it's not passed by SegmentFullyTranscoded(). I wasn't sure whether I could get it from ctx or would need to pass the value along the whole chain of function calls.

@red-0ne Great approach, and in line with the existing pattern. I just realized we never merged this PR on our side; hope you were able to have a look at the pattern there: #2313

Do you think it's possible to get the orchestrator's URI from there?

AFAICT it is possible to get orchestrator_uri from ctx; were you able to pull it?

Also, if I tag that URI in transcode_overall_latency_seconds instead of the ..._per_orchestrator metrics, can I have both functionalities (per-orchestrator and overall metrics) by looking up the kOrchestratorURI key?

Correct, if you tag the URI in transcode_overall_latency_seconds you would have both functionalities. The tag is optional in Grafana/Prometheus: without it you get the orchestrator-agnostic total sum/distribution/count, and you can specify the tag to get it per orchestrator.
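
A minimal sketch of the ctx lookup side of this (placeholder key name, assuming the URI was attached to ctx upstream with tag.New):

```go
package census

import (
	"context"

	"go.opencensus.io/tag"
)

// Placeholder key name for illustration.
var kOrchestratorURI = tag.MustNewKey("orchestrator_uri")

// orchestratorURIFromContext covers the "can we get it from ctx?" part: if an
// upstream call did tag.New(ctx, tag.Upsert(kOrchestratorURI, uri)), the value
// is retrievable here without threading it through every function signature.
// When a sample is then recorded via stats.Record(ctx, ...), the tag rides
// along, and dashboards can drop the label for overall numbers or group by it
// for per-orchestrator numbers.
func orchestratorURIFromContext(ctx context.Context) (string, bool) {
	return tag.FromContext(ctx).Value(kOrchestratorURI)
}
```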

@figintern figintern assigned figintern and unassigned figintern May 5, 2022
leszko (Contributor, Author) commented May 9, 2022

OK, then my bad in the GH issue description. Let's not use the per_orchestrator suffix; we'll just reuse the existing metrics:

  • transcode_overall_latency_seconds
  • segment_transcoded_all_appeared_total
  • segment_transcoded_unprocessed_total

red-0ne (Contributor) commented May 9, 2022

As discussed with @yondonfu, the orchestrator's address seems to be a better identifier for an O, but he raised the case of orchestrators running in off-chain mode. We could consider some conditional behavior to log with the appropriate tag, but I think we should keep it simple within the scope of this issue and tag either addresses or URIs only, each with its own trade-offs.

I propose keeping orchURI as the tag used for O-specific metrics for now. This means we would lose track of an O's metrics if it changes its URI, but I think that's fine for now. WDYT?

leszko (Contributor, Author) commented May 9, 2022

To me it seems that the Service URI could be "good enough". I know the O's ETH address is less prone to change, but I still believe Os don't change their Service URI very often. @yondonfu, what do you think?

yondonfu (Member) commented

I think using the service URI is okay for now, and we can consider using the ETH address as the tag separately.

yondonfu (Member) commented

Based on #2386 I think tagging these metrics with the O's ETH address as well would be helpful because:

  • The ETH address is a stable identifier for the O
  • While we can try to match the on-chain service URI (fetched from the subgraph) against the URI included in this tag, in certain cases these two values may not match because while the on-chain service URI will always be used for discovery requests from B, the O can redirect the B to a different URI for segment requests. See this comment for an example

If we only tag the metrics with the URI, then we'll be able to group the metrics by URI, but due to the above points we may not have an exact mapping of metrics to stable O identifiers. If we tag the metrics with the ETH address then I think we should be able to group the metrics by ETH address which would provide an exact mapping of metrics to stable O identifiers.

@red-0ne @leszko Would it be possible to also tag metrics for this issue with the ETH address?

leszko (Contributor, Author) commented May 12, 2022

Yeah, it makes sense.

I just wonder: if we tag with the ETH address, then what about off-chain Os? I think we'll also need metrics for off-chain Os. So probably the best option is to attach two tags: serviceURI AND ethAddress.

Wdyt? @yondonfu @red-0ne

red-0ne (Contributor) commented May 12, 2022

I opt for tagging them both too. We could always filter on nullish values to get the off-chain Os.
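
As a rough sketch of what the two-tag approach could look like (key names are placeholders, not the final implementation): both tags get attached where the session is known, and the address tag stays empty for off-chain Os so it can be filtered on.

```go
package census

import (
	"context"

	"go.opencensus.io/tag"
)

// Placeholder tag keys for illustration.
var (
	kOrchestratorURI     = tag.MustNewKey("orchestrator_uri")
	kOrchestratorAddress = tag.MustNewKey("orchestrator_address")
)

// withOrchestratorTags attaches both identifiers to ctx. ethAddr is empty for
// off-chain orchestrators, so per-address queries can filter those out while
// per-URI queries still cover every orchestrator.
func withOrchestratorTags(ctx context.Context, uri, ethAddr string) context.Context {
	ctx, _ = tag.New(ctx,
		tag.Upsert(kOrchestratorURI, uri),
		tag.Upsert(kOrchestratorAddress, ethAddr),
	)
	return ctx
}
```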
