Share some metrics between the Prometheus exporter and the phone home stats #13671
Changes from 5 commits
New file, `@@ -0,0 +1 @@`:

```
Introduce a `SharedUsageMetrics` class to share some usage metrics between the Prometheus exporter and the phone home stats.
```
New file, `@@ -0,0 +1,59 @@`:

```python
# Copyright 2022 The Matrix.org Foundation C.I.C
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from synapse.metrics.background_process_metrics import run_as_background_process

if TYPE_CHECKING:
    from synapse.server import HomeServer

from prometheus_client import Gauge

# Gauge to expose daily active users metrics
current_dau_gauge = Gauge(
    "synapse_admin_daily_active_users",
    "Current daily active users count",
)


class SharedUsageMetrics:
    """Usage metrics shared between the phone home stats and the prometheus exporter."""

    def __init__(self, hs: "HomeServer") -> None:
        self._store = hs.get_datastores().main
        self._clock = hs.get_clock()

        self.daily_active_users = -1

    async def setup(self) -> None:
        """Reads the current values for the shared usage metrics and starts a looping
        call to keep them updated.
        """
        await self.update()
        self._clock.looping_call(
            run_as_background_process,
            5 * 60 * 1000,
            desc="update_shared_usage_metrics",
            func=self.update,
        )

    async def update(self) -> None:
        """Updates the shared usage metrics."""
        await self.update_daily_active_users()
```
Inline review discussion on `update()`:

**Comment:** This feels very boilerplatey currently, but the idea is that we can add to this method if we add new metrics to this class.

**Reply:** I would be tempted to separate the stages of: …

One function per metric does feel a bit too boilerplatey to me. As an idea:

```python
async def _collect(self) -> SharedUsageMetrics:
    return SharedUsageMetrics(
        daily_active_users=await self._store.count_daily_users()
    )

async def _update_gauges(self, metrics: SharedUsageMetrics) -> None:
    current_dau_gauge.set(metrics.daily_active_users)
```

Just my opinion, but I like that a bit better.

**Reply:** So we would generate a new object each time we collect metrics? That sounds fairly inefficient to me.

**Reply:** I've implemented this idea in a slightly different way from your proposal, lmk what you think.

**Reply:** I think it's not a big cost for the clarity it gives. Remember this is Python, where basically everything, even an integer, is an object, so one more object isn't going to be much harm (especially if it uses slots). I would prefer the way I proposed, as each metric is more self-contained in the code; the current proposal has the metric being updated differently in two branches and stored in an intermediate variable, which feels clunkier. Even supposing objects were more inefficient, this runs so infrequently that I don't think it's worth premature optimisation; better to simplify the code to reduce footgun potential, I think.

**Reply:** I disagree. Even though, as you mention, the gain from optimising this is pretty negligible (for now at least; there's always the possibility that something else starts using it, such as modules), to me creating a new object every time something requests it is a bad code smell. Plus I think the code would be more confusing that way, because then we'd be updating the Prometheus gauge every 5 minutes by reading the count from the database, and then when the phone home stats ask for them we'd do the whole dance again. It feels both more logical and more efficient to me to have the …

**Reply:** Can you substantiate why you think it's a code smell? Creating an object or struct to hold associated values is such an accepted pattern elsewhere that I'm surprised it's objectionable. (In fact, mutating the existing objects that you've already handed out is usually considered an awful crime, to the point that some disciplines insist on immutability in many places.) I think the worse code smell here is not keeping the logic for one metric in one place. If you have metrics A, B and C, then the code locality looks like:

1st proposal:

```python
self.stats = Stats(
    a=count_a(),
    b=count_b(),
    c=count_c(),
)
```

2nd proposal:

```python
a = count_a()
b = count_b()
c = count_c()
if ..:
    self.stats = Stats(a=a, b=b, c=c)
else:
    self.stats.a = a
    self.stats.b = b
    self.stats.c = c
```

If we annotate the two proposals' lines with the metric they concern, you can see my point that the locality of behaviour is improved in the first suggestion (which I believe is better for its readability):

1st proposal:

```python
self.stats = Stats(
    a=count_a(),  # A
    b=count_b(),  # B
    c=count_c(),  # C
)
```

2nd proposal:

```python
a = count_a()  # A
b = count_b()  # B
c = count_c()  # C
if ..:
    self.stats = Stats(a=a, b=b, c=c)  # A, B, C
else:
    self.stats.a = a  # A
    self.stats.b = b  # B
    self.stats.c = c  # C
```

I also think the branch here is just more clutter than it's worth. Pessimistically, this is now twice as much to go wrong and twice as much to test.

I will note that I don't think the gain is even "pretty negligible": I think it's highly dubious. We create objects all the time in Synapse and in Python generally, whether they're integers, tuples, or instances of classes. Of course, if we really want to talk about such tiny optimisations, then we might instead consider whether introducing an …

I think you misunderstand my statement here (hopefully having spelled out the proposals above clarifies it?). You don't have to recalculate each and every time we request it; just the style of handing back the record can be different: always create a new record when updating the metrics, rather than mutating the previous one.

**Reply:** Ah, I was misunderstanding it indeed. Your initial proposal of …
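To make the shape of the reviewer's two-stage suggestion concrete, here is a minimal, self-contained sketch. `UsageMetricsSnapshot`, `FakeStore`, `FakeGauge`, and the fixed count of 42 are invented stand-ins for illustration; only the `count_daily_users` call and the `_collect`/`_update_gauges` split come from the discussion above, and the real code would use Synapse's datastore and a `prometheus_client` `Gauge`.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageMetricsSnapshot:
    """Immutable snapshot of collected metrics (hypothetical name)."""

    daily_active_users: int


class FakeStore:
    """Stand-in for Synapse's datastore; returns a fixed DAU count."""

    async def count_daily_users(self) -> int:
        return 42


class FakeGauge:
    """Stand-in for a prometheus_client Gauge."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


current_dau_gauge = FakeGauge()


class MetricsCollector:
    """Separates metric collection from gauge updates, per the suggestion."""

    def __init__(self, store: FakeStore) -> None:
        self._store = store

    async def _collect(self) -> UsageMetricsSnapshot:
        # Stage 1: read everything from the store into a fresh,
        # immutable snapshot rather than mutating a shared object.
        return UsageMetricsSnapshot(
            daily_active_users=await self._store.count_daily_users()
        )

    def _update_gauges(self, metrics: UsageMetricsSnapshot) -> None:
        # Stage 2: push the snapshot's values into the gauges.
        current_dau_gauge.set(metrics.daily_active_users)

    async def update(self) -> UsageMetricsSnapshot:
        metrics = await self._collect()
        self._update_gauges(metrics)
        return metrics


snapshot = asyncio.run(MetricsCollector(FakeStore()).update())
```

Because each metric appears exactly once (inside `_collect`), adding a new metric touches one constructor call and one gauge line, which is the code-locality argument made above.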
The diff continues:

```python
    async def update_daily_active_users(self) -> None:
        """Updates the daily active users count."""
        dau_count = await self._store.count_daily_users()
        current_dau_gauge.set(float(dau_count))
        self.daily_active_users = dau_count
```
Inline review comment on `update_daily_active_users()`:

**Comment:** I'm not a fan of having each field have a sentinel value; it gives the impression that we may end up reporting `-1` (although I see that you update upon initialisation). IMO we shouldn't report anything, rather than report negative values. I can see this being potentially troublesome for the phone-home stats when we sum DAUs.

I think I would be tempted to restructure this a bit:

- a `SharedUsageMetricsManager` or equivalent (not sure what the conventional term is in Synapse ... is it `Controller`?)
- `SharedUsageMetrics` as a plain dataclass/attrs object, with `daily_active_users` as an int field
- a `latest(): Optional[SharedUsageMetrics]` method (or field maybe?) on `SharedUsageMetricsManager`

(N.B. I was going to suggest that the metrics be lazily computed, either on-phone-home or on-prometheus-scrape, but I realise we don't want to block the prometheus scrape endpoint on Twisted's reactor in case everything is going awry.)

This sort of design prevents individual metrics having sentinel values like `-1`, which may escape.
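The restructure proposed in this comment can be sketched outside Synapse. In this minimal illustration, `SharedUsageMetricsManager`, `SharedUsageMetrics`, and `latest()` are the names suggested above, while the hard-coded count of 123 is a stand-in for the real database query and gauge update:

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SharedUsageMetrics:
    """Plain immutable record of the shared metrics."""

    daily_active_users: int


class SharedUsageMetricsManager:
    """Owns collection and hands out the latest complete snapshot."""

    def __init__(self) -> None:
        # No per-field -1 sentinels: before the first update there is
        # simply no snapshot at all.
        self._latest: Optional[SharedUsageMetrics] = None

    def latest(self) -> Optional[SharedUsageMetrics]:
        return self._latest

    async def update(self) -> None:
        # In Synapse this would await the datastore's count_daily_users()
        # and also set the Prometheus gauge; a constant stands in here.
        self._latest = SharedUsageMetrics(daily_active_users=123)


manager = SharedUsageMetricsManager()
before = manager.latest()  # None: nothing to report yet
asyncio.run(manager.update())
after = manager.latest()
```

Callers such as the phone-home stats either get a complete snapshot or `None`, so a partially initialised `-1` can never leak into reported numbers.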