This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

db on-disk space usage increased v1.58+ #12975

Closed
herb opened this issue Jun 7, 2022 · 3 comments

Comments

@herb

herb commented Jun 7, 2022

Description

Since upgrading past v1.58, the size of the database on disk has ballooned for my usage, specifically the device_lists_changes_in_room table.

Steps to reproduce

I suspect the problem is my specific usage of Synapse: I use an automated script that creates rooms and adds or removes users from those rooms as needed. The script runs asynchronously and in parallel across hosts and processes.

The script logs in as a single administrative user (marked as 'admin' and in every room it creates) with a random device ID (to avoid collisions) and logs out after every use. This user is a member of each and every room on the system. In each environment I deploy there are 100k+ rooms.

As I understand #12321, a record of a user's rooms is kept after each 'change'. If a change affects my administrative user, its membership in every room is recorded. I assume a 'change' is a login/logout?

Tactical question: since I'm not federating, can I safely purge rows from device_lists_changes_in_room, or is there some short-term, destructive approach I can use to avoid running out of disk space?

Higher level question: is the bug my usage? Is there a better way to accomplish what I want with synapse?

Version information

  • Homeserver: private, non-federated homeserver
  • Platform: debian 10/11, aws ec2 r5a.2xlarge
@reivilibre
Contributor

I think you've basically found the right conclusions.

device_lists_changes_in_room is a new table that tracks device list changes for each room.
I don't notice any clean-up code for this table in the PR in which it was introduced: #12321, so I'm taking it that it doesn't have any.

I'm not sure whether this table could be periodically cleaned up. That depends on whether all users in the room need to sync past that point to receive the update, or whether the table is just serving as a faster cache for data that could be found out otherwise. I've asked.

As for whether you are 'holding it wrong' so to speak: I don't think you're doing anything too offensive, but it's not working out well compared to the optimisations put in place for more 'typical' usage patterns.
I think the real cost is that you log in and out each time the script runs. If you can make the script re-use the same device each time, I think that would mitigate your problems.

From /login:

ID of the client device. If this does not correspond to a known client device, a new device will be created.

I think based on what that's saying that you can configure your script with a device ID and use the same one each time.
This also seems to be implied by https://spec.matrix.org/v1.2/client-server-api/#relationship-between-access-tokens-and-devices.
A quick glance at the code is giving me conflicting ideas on whether this will work, though, as it looks like it might complain if you re-use a device ID (in short: I wouldn't be surprised if Synapse is out of spec here).

If you log in with the same device ID, it sounds like it should re-use that device (and therefore it shouldn't produce a change).
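As a rough illustration of the suggestion above, here is a minimal sketch of a password login that always sends the same device_id. The endpoint and body fields follow the client-server spec's /login; the homeserver URL, user name and the device ID value "PROVISIONER" are placeholders, and whether Synapse actually re-uses the device without recording a change is the open question raised above, not something this sketch proves.

```python
import json
import urllib.request


def build_login_request(base_url, user, password, device_id):
    """Build a POST /login request that pins the device ID.

    All argument values are supplied by the caller; nothing here is
    Synapse-specific beyond the standard client-server endpoint.
    """
    body = {
        "type": "m.login.password",
        "identifier": {"type": "m.id.user", "user": user},
        "password": password,
        # Re-sending the same ID should address the existing device
        # instead of asking the server to create a new one.
        "device_id": device_id,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_matrix/client/v3/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(base_url, user, password, device_id):
    """Send the request and return the decoded JSON response."""
    request = build_login_request(base_url, user, password, device_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The response contains access_token and device_id, so the script can confirm that the server kept the ID it was given.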

with a random device ID (to avoid collisions)

(BTW if you don't specify a device ID, the server will generate one for you, so you don't need to worry about that part yourself.)

I assume a 'change' is a login/logout?

Logins and logouts count as changes, yes. I think there may be some other reasons device lists can change, usually to do with encryption.

Is there a better way to accomplish what I want with synapse?

Maybe. The admin API can let you get an access token for a user without creating an associated device, I think: https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html#login-as-a-user
I'd say that counts as a 'workaround' at best, though.
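For reference, a minimal sketch of calling the 'login as a user' admin endpoint linked above. The path and the optional valid_until_ms field are taken from that documentation page; the base URL, token and user ID are placeholders. The sketch assumes the admin token belongs to a different account than the target user, since it is worth checking whether Synapse lets an admin use this endpoint on their own account.

```python
import json
import urllib.parse
import urllib.request


def build_login_as_user_request(base_url, admin_token, user_id, valid_until_ms=None):
    """Build POST /_synapse/admin/v1/users/<user_id>/login.

    valid_until_ms is optional; when omitted an empty JSON object is sent.
    """
    # User IDs contain '@' and ':' and must be percent-encoded in the path.
    quoted_user = urllib.parse.quote(user_id, safe="")
    body = {} if valid_until_ms is None else {"valid_until_ms": valid_until_ms}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_synapse/admin/v1/users/" + quoted_user + "/login",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + admin_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def login_as_user(base_url, admin_token, user_id, valid_until_ms=None):
    """Send the request and return the access token from the response."""
    request = build_login_as_user_request(base_url, admin_token, user_id, valid_until_ms)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```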

The admin API has some other functionality which may or may not do some of what you want, so if you weren't aware of it, it may be worth a look, but I don't think that makes this issue less valid.

@herb
Author

herb commented Jun 8, 2022

Thanks for the quick and thorough reply!

I think the real cost is that you log in and out each time the script runs. If you can make the script re-use the same device each time, I think that would mitigate your problems.

I had tried this a while back by making a pool of device IDs for my async workers to share. I used a pool because each new login to a device ID would invalidate any concurrent session. This had more edge cases than I anticipated and I scrapped it for a simpler implementation.

But I didn't think to re-use the 'session' (device id + access token) everywhere.

The only concern with reusing the session is dealing with a soft_logout=true. I suppose I could always fall back to my existing approach and alert for manual intervention. Will give this a try.
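A minimal sketch of that fallback decision, assuming the script has the HTTP status and decoded JSON body of a failed request in hand. The M_UNKNOWN_TOKEN error code and the soft_logout flag come from the client-server spec; the function name and the returned labels are hypothetical.

```python
def classify_auth_failure(status, body):
    """Decide how a shared session should react to a failed request.

    Returns one of:
      "soft_logout": token expired but the device still exists, so logging
                     in again with the same device_id should recover it
      "hard_logout": token and device are gone, fall back and alert
      "other":       not a token problem, handle elsewhere
    """
    if status != 401 or body.get("errcode") != "M_UNKNOWN_TOKEN":
        return "other"
    # The spec treats a missing soft_logout field as false.
    if body.get("soft_logout", False):
        return "soft_logout"
    return "hard_logout"
```

For example, a worker that gets "soft_logout" could log in again with the stored device ID and replace the shared access token, while "hard_logout" would trigger the existing login-per-run path plus an alert.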

The admin API has some other functionality which may or may not do some of what you want, so if you weren't aware of it, it may be worth a look, but I don't think that makes this issue less valid.

Thanks for pointing out 'login as user'. I had forgotten that it existed. Will give that a try if sharing a session doesn't work. I've had other problems with the admin api but it might be setup related (random 404s).

@erikjohnston
Member

I've created #13043 to track clearing out that table periodically. I don't have any great ideas of how else to make this better for you really, so I'm going to close this for now. Shout if there's anything else.

@erikjohnston closed this as not planned on Jun 14, 2022