This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Back in the summer I started a branch called https://github.com/matrix-org/synapse/tree/matthew/stats, intended to make life easier for server admins who want visibility into which users and rooms are consuming resources on their server. The original impetus for this came from disroot.org, who were complaining at the time about their server using expensive amounts of disk space with no way to restrict or even visualise resource usage per user. It would also be useful for Modular and anyone else wanting better control over how their resources are used.
The branch was written rapidly in one session to try to help Disroot before they gave up. I believe the code is largely sane (being mainly factored out of the existing UserDirectory logic), but I can't remember if it has ever actually been run or debugged. Meanwhile, Disroot gave up despite my efforts, at which point the branch fell down the todo list, and I got distracted by other things and have not had the bandwidth since to finish it.
The reason for bringing it back up now is that it came up recently in discussion with @hawkowl and @richvdh as a possible way of addressing the RoomDirectory's performance issues: the branch maintains a current snapshot of selected room state in the DB, which could be used to trivially pull out the data required to populate and search the room directory without ever having to do state resolution.
In order to investigate further, @hawkowl requested a spec of what the branch was trying to do, which is what this issue attempts to be. So, the goals were:
Stuff which got done:
- Track per-user and per-room resource usage (current and historical) for server admins, in order to track abuse and better manage how the server's resources (particularly disk space) are used.
- Track a running total of common values of current room state across all rooms, as a useful resource to have available to Synapse (e.g. for speeding up RoomDirectory) given we get it for free whilst calculating per-room stats.
Stuff which hasn't yet been done:
- Add a simple admin API to pull the current/historical stats out of the DB for users & rooms as JSON, and so show who the biggest resource hogs are.
- Use the resulting stats data to allow servers to configure resource quotas per-user or per-room.
- Add a basic web admin interface to visualise the resource hogs and hook up a way to remove unwanted content from the DB.
- Track media repository usage per-user and in aggregate (looking purely at the number of uploads to the repo, since we can't correlate uploads to events given the events may be E2E encrypted).
The easiest way to visualise this is at the DB level, where the schema (from memory) is:
CREATE TABLE user_stats (
    user_id TEXT NOT NULL,
    ts BIGINT NOT NULL, -- stats cover the timeslice from ts to ts+bucket_size (in ms)
    bucket_size INT NOT NULL,
    sent_events INT NOT NULL, -- number of events sent by this user in this timeslice (not yet hooked up)
    local_events INT NOT NULL, -- total number of local events attributable to this user at time `ts` (i.e. how many locally stored events they can see in the rooms they're in)
    public_rooms INT NOT NULL, -- how many public rooms they were in at time `ts`
    private_rooms INT NOT NULL, -- how many private rooms they were in at time `ts`
    sent_file_count INT NOT NULL, -- how many files they've uploaded to the media repo (not yet hooked up)
    sent_file_size INT NOT NULL -- how many bytes of files they've uploaded to the media repo (not yet hooked up)
);
CREATE TABLE room_stats (
    room_id TEXT NOT NULL,
    ts BIGINT NOT NULL,
    bucket_size INT NOT NULL,
    current_state_events INT NOT NULL, -- number of currently applicable state events for this room at time `ts` (does not include overwritten state events)
    joined_members INT NOT NULL, -- total number of joined members in this room at time `ts`
    invited_members INT NOT NULL, -- total number of invited members in this room at time `ts`
    left_members INT NOT NULL, -- total number of parted members in this room at time `ts`
    banned_members INT NOT NULL, -- total number of banned members in this room at time `ts`
    state_events INT NOT NULL, -- total number of state events stored for this room at time `ts` (includes overwritten state events)
    local_events INT NOT NULL, -- total number of local events stored for this room at time `ts`
    remote_events INT NOT NULL, -- total number of remote events stored for this room at time `ts`
    sent_events INT NOT NULL -- number of events sent by this server per timeslice (not yet hooked up)
);
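The timeslice convention used by `user_stats` and `room_stats` (each row covers `ts` to `ts + bucket_size`) can be sketched in Python, the language Synapse is written in, using the standard-library `sqlite3` module. This is only an illustration under stated assumptions, not a description of what the branch does: the one-day `BUCKET_SIZE_MS`, the round-down rule in `bucket_start`, the overwrite-on-repeat behaviour, and the trimmed-down `room_stats` table are all hypothetical.

```python
import sqlite3

# Hypothetical bucket length: one day, in milliseconds.
BUCKET_SIZE_MS = 24 * 60 * 60 * 1000


def make_db() -> sqlite3.Connection:
    """In-memory DB holding a trimmed-down room_stats (subset of columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE room_stats (
            room_id TEXT NOT NULL,
            ts BIGINT NOT NULL,
            bucket_size INT NOT NULL,
            joined_members INT NOT NULL,
            local_events INT NOT NULL
        )
        """
    )
    return conn


def bucket_start(now_ms: int, bucket_size_ms: int = BUCKET_SIZE_MS) -> int:
    """Round a millisecond timestamp down to the start of its timeslice."""
    return (now_ms // bucket_size_ms) * bucket_size_ms


def record_room_snapshot(conn, room_id, now_ms, joined_members, local_events):
    """Store one row per (room, timeslice), replacing any earlier snapshot
    taken within the same timeslice. Returns the bucket's `ts`."""
    ts = bucket_start(now_ms)
    with conn:  # commit the delete and insert as one transaction
        conn.execute(
            "DELETE FROM room_stats WHERE room_id = ? AND ts = ?",
            (room_id, ts),
        )
        conn.execute(
            "INSERT INTO room_stats"
            " (room_id, ts, bucket_size, joined_members, local_events)"
            " VALUES (?, ?, ?, ?, ?)",
            (room_id, ts, BUCKET_SIZE_MS, joined_members, local_events),
        )
    return ts
```

Keeping `bucket_size` on every row means the bucket length could later change without making older rows ambiguous, which may be why the schema stores it per row.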
-- cache of current room state; useful for the publicRooms list
CREATE TABLE room_state (
    room_id TEXT NOT NULL,
    join_rules TEXT NOT NULL,
    history_visibility TEXT NOT NULL,
    encrypted BOOLEAN,
    name TEXT NOT NULL,
    topic TEXT NOT NULL,
    avatar TEXT NOT NULL,
    canonical_alias TEXT NOT NULL
    -- get aliases straight from the right table
);
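To illustrate why a `room_state` cache could help the room directory, here is a minimal Python sketch (standard-library `sqlite3` only) that searches it with a single query and no state resolution. The function name `search_public_rooms`, the substring-match rule, and the use of `join_rules = 'public'` as the directory filter are assumptions for illustration; how Synapse actually decides which rooms appear in the directory is not covered by this issue.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory DB with the room_state table from the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE room_state (
            room_id TEXT NOT NULL,
            join_rules TEXT NOT NULL,
            history_visibility TEXT NOT NULL,
            encrypted BOOLEAN,
            name TEXT NOT NULL,
            topic TEXT NOT NULL,
            avatar TEXT NOT NULL,
            canonical_alias TEXT NOT NULL
        )
        """
    )
    return conn


def search_public_rooms(conn, term: str, limit: int = 20) -> list:
    """Case-insensitive substring search over name and topic.

    Assumption: rooms with join_rules = 'public' stand in for whatever
    actually determines directory membership.
    """
    columns = ("room_id", "name", "topic", "canonical_alias")
    rows = conn.execute(
        """
        SELECT room_id, name, topic, canonical_alias
        FROM room_state
        WHERE join_rules = 'public'
          AND (instr(lower(name), lower(?)) > 0
               OR instr(lower(topic), lower(?)) > 0)
        ORDER BY name, room_id
        LIMIT ?
        """,
        (term, term, limit),
    ).fetchall()
    return [dict(zip(columns, row)) for row in rows]
```

Because everything the listing needs sits in one row per room, the lookup is a plain indexed-table read rather than a walk over each room's state.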
-- not hooked up yet, this is meant to be aggregate stats about the media repo.
CREATE TABLE media_stats (
    ts BIGINT NOT NULL,
    bucket_size INT NOT NULL,
    local_media_count INT NOT NULL,
    local_media_size INT NOT NULL,
    remote_media_count INT NOT NULL,
    remote_media_size INT NOT NULL
);
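The not-yet-written admin API for finding the biggest resource hogs could be built on a query like the one in this Python sketch (standard-library `sqlite3` and `json`). It is a rough illustration rather than a proposed implementation: the function name `top_users_by_local_events`, the choice of `local_events` as the ranking metric, the JSON shape, and the trimmed-down `user_stats` table are all hypothetical.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory DB holding a trimmed-down user_stats (subset of columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE user_stats (
            user_id TEXT NOT NULL,
            ts BIGINT NOT NULL,
            bucket_size INT NOT NULL,
            local_events INT NOT NULL
        )
        """
    )
    return conn


def top_users_by_local_events(conn, limit: int = 10) -> str:
    """Return a JSON array of users ranked by local_events, using only
    each user's most recent timeslice."""
    rows = conn.execute(
        """
        SELECT s.user_id, s.ts, s.local_events
        FROM user_stats AS s
        JOIN (
            SELECT user_id, MAX(ts) AS ts
            FROM user_stats
            GROUP BY user_id
        ) AS latest
          ON latest.user_id = s.user_id AND latest.ts = s.ts
        ORDER BY s.local_events DESC, s.user_id
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return json.dumps(
        [{"user_id": u, "ts": ts, "local_events": n} for u, ts, n in rows]
    )
```

The same latest-row-per-key join would work against `room_stats`, and comparing the returned value with a configured limit is one way the per-user or per-room quotas mentioned above might be checked.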
Hopefully this gives an explanation of what the branch is trying to do, such that someone may be able to salvage something from it without reinventing the wheel.