This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

User/Room statistics #4337

Closed
ara4n opened this issue Dec 29, 2018 · 2 comments

@ara4n
Member

ara4n commented Dec 29, 2018

Back in the summer I started a branch called https://github.com/matrix-org/synapse/tree/matthew/stats, intended to make life easier for server admins who want visibility into which users and rooms are consuming resources on their server. The original impetus for this came from disroot.org, who were complaining at the time about their server using expensive amounts of disk space with no way to restrict or even visualise resource usage per user. It would also be useful for Modular and anyone else wanting better control over how their resources are used.

The branch was written rapidly in one session to try to help Disroot before they gave up. I believe all the code is largely sane (being mainly factored out of the existing UserDirectory logic), but I can't remember if it has ever actually been run or debugged. Meanwhile, Disroot gave up despite my efforts, at which point the branch fell down the todo list; I got distracted by other things and have not had the bandwidth since to finish it.

The reason for bringing it back up now is that it came up recently in discussion with @hawkowl and @richvdh as a possible way of addressing the RoomDirectory's performance issues: this branch maintains a current snapshot of selected room state in the db, which could be used to trivially pull out the data required to populate/search the room directory without ever having to do state resolution etc.

In order to investigate further, @hawkowl requested a spec of what the branch was trying to do, which is what this issue attempts to be. So, the goals were:

Stuff which got done:

  • Track per-user and per-room resource usage (current and historical) for server admins, in order to track abuse and better manage how the server's resources (particularly disk space) are used.
  • Track a running total of common values of current room state across all rooms, as a useful resource to have available to Synapse (e.g. for speeding up RoomDirectory) given we get it for free whilst calculating per-room stats.

Stuff which hasn't yet been done:

  • Add a simple admin API to pull the current/historical stats out of the DB for users & rooms as JSON, and so show who the biggest resource hogs are.
  • Use the resulting stats data to allow servers to configure resource quotas per-user or per-room.
  • Add a basic web admin interface to visualise the resource hogs and hook up a way to remove unwanted content from the DB.
  • Track media repository usage per-user and on aggregate (looking purely at the number of uploads to the repo, given we can't correlate uploads to events given the events may be E2E encrypted)

The easiest way to visualise this is at the DB level, where the schema (from memory) is:

CREATE TABLE user_stats (
    user_id TEXT NOT NULL,
    ts BIGINT NOT NULL,  -- stats cover the timeslice from ts to ts+bucket_size (in ms)
    bucket_size INT NOT NULL, 
    sent_events INT NOT NULL, -- number of events sent by this user in this timeslice (not yet hooked up)
    local_events INT NOT NULL, -- total number of local events attributable to this user at time `ts` (i.e. how many locally stored events they can see in the rooms they're in)
    public_rooms INT NOT NULL, -- how many public rooms they were in at time `ts`
    private_rooms INT NOT NULL,  -- how many private rooms they were in at time `ts`
    sent_file_count INT NOT NULL, -- how many files they've uploaded to the media repo (not yet hooked up)
    sent_file_size INT NOT NULL -- how many bytes of files they've uploaded to the media repo (not yet hooked up)
);
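
To make the "admin API" item above more concrete, here is a minimal sketch of pulling the biggest resource hogs out of `user_stats` as JSON. It uses an in-memory SQLite database and a cut-down copy of the table; the helper name `top_users_json`, the choice of `local_events` as the ranking metric, and the sample rows are assumptions for illustration only, not code from the branch.

```python
import json
import sqlite3

# Cut-down copy of the user_stats table above (only the columns used here).
SCHEMA = """
CREATE TABLE user_stats (
    user_id TEXT NOT NULL,
    ts BIGINT NOT NULL,
    bucket_size INT NOT NULL,
    local_events INT NOT NULL
);
"""


def top_users_json(conn, limit=10):
    """Hypothetical helper: each user's most recent row, largest local_events first."""
    rows = conn.execute(
        """
        SELECT s.user_id, s.ts, s.local_events
        FROM user_stats AS s
        JOIN (
            SELECT user_id, MAX(ts) AS ts FROM user_stats GROUP BY user_id
        ) AS latest
          ON latest.user_id = s.user_id AND latest.ts = s.ts
        ORDER BY s.local_events DESC, s.user_id
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return json.dumps(
        [{"user_id": u, "ts": ts, "local_events": n} for u, ts, n in rows]
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO user_stats VALUES (?, ?, ?, ?)",
    [
        ("@alice:example.org", 0, 86400000, 100),
        ("@alice:example.org", 86400000, 86400000, 250),
        ("@bob:example.org", 86400000, 86400000, 900),
    ],
)
print(top_users_json(conn, limit=2))
```

The same shape of query would presumably work for `room_stats`, keyed on `room_id` instead of `user_id`.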

CREATE TABLE room_stats (
    room_id TEXT NOT NULL,
    ts BIGINT NOT NULL,
    bucket_size INT NOT NULL,
    current_state_events INT NOT NULL, -- number of currently applicable state events for this room at time `ts` (does not include overwritten state events)
    joined_members INT NOT NULL, -- total number of joined members in this room at time `ts`
    invited_members INT NOT NULL, -- total number of invited members in this room at time `ts`
    left_members INT NOT NULL, -- total number of parted members in this room at time `ts`
    banned_members INT NOT NULL,  -- total number of banned members in this room at time `ts`
    state_events INT NOT NULL, -- total number of state events stored for this room at time `ts` (includes overwritten state events)
    local_events INT NOT NULL, -- total number of local events stored for this room at time `ts`
    remote_events INT NOT NULL, -- total number of remote events stored for this room at time `ts`
    sent_events INT NOT NULL -- number of events sent by this server in this timeslice (not yet hooked up)
);
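
The per-timeslice tables key each row on `ts` and `bucket_size`, with a row covering `ts` to `ts+bucket_size` in ms. A minimal sketch of mapping an event timestamp onto its bucket follows. It assumes buckets are aligned to whole multiples of `bucket_size` since the epoch, which the schema does not actually say; `bucket_start` and `DAY_MS` are hypothetical names.

```python
DAY_MS = 24 * 60 * 60 * 1000  # example bucket size: one day, in milliseconds


def bucket_start(ts_ms: int, bucket_size_ms: int) -> int:
    """Return the `ts` of the bucket containing ts_ms.

    Assumes buckets start at multiples of bucket_size_ms since the epoch.
    """
    if bucket_size_ms <= 0:
        raise ValueError("bucket_size_ms must be positive")
    return ts_ms - (ts_ms % bucket_size_ms)


def in_bucket(ts_ms: int, bucket_ts: int, bucket_size_ms: int) -> bool:
    """True if ts_ms falls in the half-open range [bucket_ts, bucket_ts + size)."""
    return bucket_ts <= ts_ms < bucket_ts + bucket_size_ms
```

Treating the range as half-open means an event landing exactly on a boundary is counted in one bucket only.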

-- cache of current room state; useful for the publicRooms list
CREATE TABLE room_state (
    room_id TEXT NOT NULL,
    join_rules TEXT NOT NULL,
    history_visibility TEXT NOT NULL,
    encrypted BOOLEAN,
    name TEXT NOT NULL,
    topic TEXT NOT NULL,
    avatar TEXT NOT NULL,
    canonical_alias TEXT NOT NULL
    -- get aliases straight from the right table
);
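
As a sketch of why this table could help the room directory: with the current state cached in one row per room, a directory search becomes a plain indexed-style query with no state resolution. The example below runs against in-memory SQLite. The function name `search_public_rooms` and the sample rooms are made up, and filtering on `join_rules = 'public'` is a simplifying assumption, since whether a room is actually published in the directory is tracked separately.

```python
import sqlite3

# Copy of the room_state table above.
SCHEMA = """
CREATE TABLE room_state (
    room_id TEXT NOT NULL,
    join_rules TEXT NOT NULL,
    history_visibility TEXT NOT NULL,
    encrypted BOOLEAN,
    name TEXT NOT NULL,
    topic TEXT NOT NULL,
    avatar TEXT NOT NULL,
    canonical_alias TEXT NOT NULL
);
"""


def search_public_rooms(conn, term):
    """Hypothetical helper: public rooms whose name or topic contains `term`."""
    pattern = f"%{term}%"
    return conn.execute(
        """
        SELECT room_id, name, topic, canonical_alias
        FROM room_state
        WHERE join_rules = 'public' AND (name LIKE ? OR topic LIKE ?)
        ORDER BY name
        """,
        (pattern, pattern),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO room_state VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("!a:example.org", "public", "shared", 0,
         "Matrix HQ", "General chat", "", "#matrix:example.org"),
        ("!b:example.org", "invite", "shared", 1,
         "Private matrix room", "", "", ""),
        ("!c:example.org", "public", "world_readable", 0,
         "Synapse Admins", "Running a Matrix homeserver", "", "#synapse:example.org"),
    ],
)
for row in search_public_rooms(conn, "matrix"):
    print(row)
```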

-- not hooked up yet, this is meant to be aggregate stats about the media repo.
CREATE TABLE media_stats (
    ts BIGINT NOT NULL,
    bucket_size INT NOT NULL,
    local_media_count INT NOT NULL,
    local_media_size INT NOT NULL,
    remote_media_count INT NOT NULL,
    remote_media_size INT NOT NULL
);

Hopefully this gives an explanation of what the branch is trying to do, such that someone may be able to salvage something from it without reinventing the wheel.

@ara4n
Member Author

ara4n commented Dec 30, 2018

one thought for an extension would be to track unique senders per timeslice per room, rather than just sent_events.
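
A rough sketch of what that extension might compute, assuming events arrive as `(room_id, sender, timestamp_ms)` tuples and that buckets are aligned to multiples of the bucket size; `unique_senders` and the tuple layout are hypothetical, not anything in the branch.

```python
from collections import defaultdict


def unique_senders(events, bucket_size_ms):
    """Count distinct senders per (room_id, bucket start) pair.

    `events` is an iterable of (room_id, sender, ts_ms) tuples.
    """
    senders = defaultdict(set)
    for room_id, sender, ts_ms in events:
        bucket = ts_ms - (ts_ms % bucket_size_ms)
        senders[(room_id, bucket)].add(sender)
    return {key: len(users) for key, users in senders.items()}


HOUR_MS = 60 * 60 * 1000
events = [
    ("!a:example.org", "@alice:example.org", 1_000),
    ("!a:example.org", "@alice:example.org", 2_000),
    ("!a:example.org", "@bob:example.org", 3_000),
    ("!a:example.org", "@bob:example.org", HOUR_MS + 5),
    ("!b:example.org", "@alice:example.org", 10),
]
counts = unique_senders(events, HOUR_MS)
```

Keeping a set per bucket is fine for a sketch; a real implementation would more likely do the `COUNT(DISTINCT sender)` in the database.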

@ara4n
Member Author

ara4n commented Jan 30, 2019

said branch now lives on at #4338

@hawkowl hawkowl self-assigned this Jan 31, 2019
@hawkowl hawkowl added the v1.0 label Jan 31, 2019
@neilisfragile neilisfragile reopened this Mar 14, 2019
@richvdh richvdh changed the title from "A spec for the matthew/stats branch" to "User/Room statistics" Apr 3, 2019