This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Summary of performance impact of running on resource constrained devices such as SBCs #8428

Closed
emorrp1 opened this issue Sep 30, 2020 · 11 comments
Labels
A-Docs things relating to the documentation A-Performance Performance, both client-facing and admin-facing

Comments

@emorrp1

emorrp1 commented Sep 30, 2020

Description

I've been running my homeserver on a cubietruck at home for some time now, and I often reply to statements like "you need loads of RAM to join large rooms" with "it works fine for me". I thought it might be useful to curate a summary of the issues you're likely to run into, to serve as a scaling-down guide. It could also highlight areas for development work or end up as documentation.

Performance Issues

Presence

This is the main reason people have a poor Matrix experience on resource-constrained homeservers. Element web will frequently report that the server is offline while the Python process is pegged at 100% CPU. This feature is used to tell when other users are active (have a client app in the foreground) and therefore more likely to respond, but it requires a lot of network activity to maintain, even when nobody is talking in a room.

[Screenshot_2020-10-01_19-29-46]

While Synapse does have some performance issues with presence #3971, the fundamental problem is that this feature is easy to implement for a centralised service at nearly no overhead, but federation makes it combinatorial #8055. There is also a client-side config option, enable_presence_by_hs_url, which disables the UI and idle tracking for the listed homeservers (for example, to blacklist the largest instances). I didn't notice much difference from it, so I recommend disabling the feature entirely at the server level as well.
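To make the "combinatorial" point concrete, here is a deliberately simplified model. It is an assumption for illustration, not how Synapse actually schedules presence. The model assumes that each local presence change is sent to every remote server sharing a room, and that each remote user's change arrives at this server once. The function name and the `changes_per_hour` rate are hypothetical.

```python
def presence_updates_per_hour(
    local_users: int,
    remote_users: int,
    remote_servers: int,
    changes_per_hour: float,
) -> dict[str, float]:
    """Rough presence traffic estimate under a simplified, hypothetical model.

    Assumes every user changes presence `changes_per_hour` times and that every
    local user shares a room with every counted remote user and server.
    """
    # Each remote user's change is delivered to this homeserver once.
    inbound = remote_users * changes_per_hour
    # Each local change is sent to every remote server sharing a room.
    outbound = local_users * remote_servers * changes_per_hour
    return {"inbound": inbound, "outbound": outbound, "total": inbound + outbound}


# A single-user homeserver: traffic scales with the membership of joined rooms,
# not with how much anyone is actually talking.
print(presence_updates_per_hour(local_users=1, remote_users=5000,
                                remote_servers=300, changes_per_hour=2))
# {'inbound': 10000, 'outbound': 600, 'total': 10600}
```

Even with one local user, the inbound side grows with every member of every joined room. This is consistent with the advice to turn presence off at the server rather than only hiding it in the client.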

Joining

Joining a "large", federated room will initially fail with the message below in Element web. However, waiting a while (10-60 minutes) and trying again will succeed without any issue. What counts as "large" is not message history, user count, connections to homeservers or even a simple count of the state events; it is how long the state resolution algorithm takes. Each of those numbers is still a reasonable proxy. User count is one of the few things you can see before joining, so it makes a convenient estimate.

[Screenshot_2020-10-02_17-15-06: the error shown by Element web when the initial join fails]

This is #1211. It will hopefully also be mitigated by peeking (matrix-org/matrix-spec-proposals#2753), so that at least you don't need to wait for a join to complete before finding out whether it's the kind of room you want. Note that you should disable presence first, otherwise it'll just make the situation worse #3120. There is a lot of database interaction too, so make sure you've migrated your data from the default SQLite to PostgreSQL. Personally, I recommend patience: once the initial join is complete there are rarely any issues with actually interacting with the room. If you prefer, you can block "large" rooms entirely.

Sessions

Anything that requires modifying the device list #7721 will take a while to propagate, again showing the client as "Offline" until it's complete. This includes signing in and out, editing a session's public name and E2EE verification. The main mitigation I recommend is to keep long-running sessions open, e.g. by using Firefox SSB "Use this site in App mode" or Chromium PWA "Install Element".
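A device-list change has to reach every remote homeserver that shares a room with the user, which is why it is slow on a constrained server. The sketch below computes that destination set from room memberships. The function name and the sample user IDs are hypothetical, and the sketch assumes user IDs of the form `@localpart:server_name`.

```python
def device_list_destinations(user_id: str, rooms: list[set[str]]) -> list[str]:
    """Remote server names that would need to hear about user_id's device change.

    Hypothetical helper: `rooms` is a list of sets of member user IDs.
    """
    # Matrix user IDs are "@localpart:server_name"; the server name may carry a port.
    local_server = user_id.split(":", 1)[1]
    destinations: set[str] = set()
    for members in rooms:
        if user_id not in members:
            continue  # only rooms the user is actually in matter
        for member in members:
            server = member.split(":", 1)[1]
            if server != local_server:
                destinations.add(server)
    return sorted(destinations)


rooms = [
    {"@me:home.example", "@alice:matrix.org", "@bob:example.com:8448"},
    {"@me:home.example", "@carol:matrix.org"},
    {"@dave:elsewhere.example"},  # the user is not in this room
]
print(device_list_destinations("@me:home.example", rooms))
# ['example.com:8448', 'matrix.org']
```

Every sign-in, sign-out, rename or verification repeats this fan-out. Keeping one long-lived session avoids paying for it repeatedly.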

Recommended configuration

Put the below in a new file at /etc/matrix-synapse/conf.d/sbc.yaml to override the defaults in homeserver.yaml.

# Set to false to disable presence tracking on this homeserver.
use_presence: false

# When this is enabled, the room "complexity" will be checked before a user
# joins a new remote room. If it is above the complexity limit, the server will
# disallow joining, or will instantly leave.
limit_remote_rooms:
  # Uncomment to enable room complexity checking.
  #enabled: true
  complexity: 3.0

# Database configuration
database:
  name: psycopg2
  args:
    user: matrix-synapse
    # Generate a long, secure one with a password manager
    password: hunter2
    database: matrix-synapse
    host: localhost
    cp_min: 5
    cp_max: 10
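Files in conf.d override homeserver.yaml one top-level key at a time. As far as I understand Synapse's config loading, later files are merged with a shallow `dict.update`; this is an assumption worth checking against your version. Under that assumption, the `limit_remote_rooms` and `database` sections in sbc.yaml replace the whole corresponding section rather than merging into it. The sketch below mimics this with plain dicts instead of parsed YAML. `merge_config_files` is a hypothetical name and the values are illustrative.

```python
from typing import Any


def merge_config_files(*configs: dict[str, Any]) -> dict[str, Any]:
    """Merge parsed config files in order; later files win per top-level key."""
    merged: dict[str, Any] = {}
    for config in configs:
        merged.update(config)  # shallow: nested sections are replaced, not merged
    return merged


# Illustrative stand-ins for the parsed YAML files (values are hypothetical).
homeserver_yaml = {
    "use_presence": True,
    "database": {"name": "sqlite3", "args": {"database": "homeserver.db"}},
}
sbc_yaml = {
    "use_presence": False,
    # "#enabled: true" is a YAML comment, so only `complexity` is present.
    "limit_remote_rooms": {"complexity": 3.0},
    "database": {"name": "psycopg2", "args": {"user": "matrix-synapse", "host": "localhost"}},
}

effective = merge_config_files(homeserver_yaml, sbc_yaml)
print(effective["use_presence"])                               # False
print(effective["database"]["name"])                           # psycopg2
print(effective["limit_remote_rooms"].get("enabled", False))   # False: check still off
```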

Currently the complexity is measured by current_state_events / 500. You can find join times and your most complex rooms like this:

admin@freedombox:~$ zgrep '/client/r0/join/' /var/log/matrix-synapse/homeserver.log* | awk '{print $18, $25}' | sort --human-numeric-sort
182.088sec/0.003sec /_matrix/client/r0/join/%23decentralizedweb-general%3Amatrix.org
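If you'd rather post-process these lines in Python, the sketch below parses output in the shape printed by the awk command above (a `<time>sec/<time>sec` pair followed by the request path). It URL-decodes the room alias and sorts by the first timing figure. Two things are assumptions: exactly which Synapse timing that first figure represents, and whether fields 18 and 25 line up in your log format. The helper name is hypothetical.

```python
from urllib.parse import unquote

JOIN_PREFIX = "/_matrix/client/r0/join/"


def parse_join_line(line: str) -> tuple[float, str]:
    """Turn '<t1>sec/<t2>sec /_matrix/client/r0/join/<alias>' into (t1, decoded alias)."""
    timings, path = line.split()
    first = timings.split("/")[0]              # e.g. '182.088sec'
    seconds = float(first.removesuffix("sec"))
    if path.startswith(JOIN_PREFIX):
        path = unquote(path[len(JOIN_PREFIX):])  # '%23room%3Aserver' -> '#room:server'
    return seconds, path


lines = [
    "182.088sec/0.003sec /_matrix/client/r0/join/%23decentralizedweb-general%3Amatrix.org",
]
# Slowest joins first.
for seconds, alias in sorted(map(parse_join_line, lines), reverse=True):
    print(f"{seconds:8.1f}s  {alias}")
```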

admin@freedombox:~$ sudo --user postgres psql matrix-synapse --command 'select canonical_alias, joined_members, current_state_events from room_stats_state natural join room_stats_current where canonical_alias is not null order by current_state_events desc fetch first 5 rows only'
           canonical_alias            | joined_members | current_state_events
--------------------------------------+----------------+----------------------
 #_oftc_#debian:matrix.org            |            871 |                52355
 #matrix:matrix.org                   |           6379 |                10684
 #irc:matrix.org                      |            461 |                 3751
 #decentralizedweb-general:matrix.org |            997 |                 1509
 #whatsapp:maunium.net                |            554 |                  854
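Applying the `current_state_events / 500` rule to the table above gives a feel for what `complexity: 3.0` would block. This sketch covers only the arithmetic. It assumes a strict "above the limit" comparison, as described in the config comment. It also uses the state counts your own server sees, which may differ from what the remote server reports at join time.

```python
LIMIT = 3.0  # the `complexity` value from sbc.yaml

# (canonical_alias, current_state_events) from the query output above
rooms = [
    ("#_oftc_#debian:matrix.org", 52355),
    ("#matrix:matrix.org", 10684),
    ("#irc:matrix.org", 3751),
    ("#decentralizedweb-general:matrix.org", 1509),
    ("#whatsapp:maunium.net", 854),
]


def complexity(current_state_events: int) -> float:
    """Room complexity as described above: current state events / 500."""
    return current_state_events / 500


for alias, events in rooms:
    c = complexity(events)
    verdict = "blocked" if c > LIMIT else "allowed"
    print(f"{alias:40} {c:7.2f}  {verdict}")
```

With a limit of 3.0, even #decentralizedweb-general (1509 / 500 = 3.018) would be refused, while #whatsapp:maunium.net (1.708) would still be allowed.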

Version information

  • Homeserver: freedombox.emorrp1.name
  • Version: 1.19.1
  • Install method: debian buster-backports via freedombox with postgresql and ldap
  • Platform: Single Board Computer with a 2x1GHz armhf CPU, 2GiB RAM and an SSD. It seems that once you get up to about 4x1.5GHz arm64 with 4GiB, these issues are no longer a problem.
@ptman
Contributor

ptman commented Oct 1, 2020

Note the limit_remote_rooms config section, which prevents small servers from joining resource-hungry rooms.

@anoadragon453 anoadragon453 added the A-Docs things relating to the documentation label Oct 1, 2020
@anoadragon453
Member

Once this is complete, it may be nice to place a formatted version into the repo's wiki.

@emorrp1
Author

emorrp1 commented Oct 1, 2020

I've now added the section about presence, since that's the high-profile one as far as I'm concerned.

@ptman thanks, I'm not familiar with that one, could you expand on its impact for me, or give an example room that would be blocked? I've joined several of the common so-called "large" rooms with 5000+ users and a long history without issue (after the initial join and disabling presence).

@anoadragon453 whatever you think is best. Could you also tag this with the "performance" label so it shows up in the issue searches?

@ptman
Contributor

ptman commented Oct 1, 2020

@emorrp1 it's not about user count, more about participating homeserver count. Check out

async def get_room_complexity(self, room_id):

@emorrp1
Copy link
Author

emorrp1 commented Oct 1, 2020

So if I understand you correctly, you're saying that limit_remote_rooms is a mitigation for #7671? Do you have a suggested complexity value, since 1.0 seems small? Here are the top 5 most complex rooms I've joined. The 3rd is one I participate in regularly without issue. I've just sent you a message in Matrix HQ and can see that it maxed out my CPU, but not for long.

matrix-synapse=# select canonical_alias, current_state_events from room_stats_state natural join room_stats_current where canonical_alias is not null order by current_state_events desc fetch first 5 rows only;
        canonical_alias         | current_state_events 
--------------------------------+----------------------
 #_oftc_#debian:matrix.org      |                52338
 #matrix:matrix.org             |                10651
 #freedombox:matrix.org         |                10189
 #gamingonlinux:matrix.org      |                 9800
 #freenode_#lobsters:matrix.org |                 8799

@ptman
Copy link
Contributor

ptman commented Oct 2, 2020

It was created by New Vector to limit problems on their smallest hosted instances: #5783

@anoadragon453 anoadragon453 added the A-Performance Performance, both client-facing and admin-facing label Oct 2, 2020
@auscompgeek
Copy link
Contributor

There is also a client-side config option enable_presence_by_hs_url to blacklist the largest instances but I didn't notice much difference, so I recommend disabling the feature entirely at the server level.

Note that this only disables the UI and idle tracking that element-web does. It mostly only makes sense to add to that config if the homeserver also has presence disabled.
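For reference, enable_presence_by_hs_url in Element's config.json maps homeserver base URLs to booleans. Presence UI is switched off for URLs mapped to false and stays on otherwise. The sketch below builds such an entry for your own homeserver and shows the lookup. The URL is a hypothetical placeholder, and the lookup function only approximates Element's behaviour; it is not Element's code.

```python
import json


def presence_enabled(config: dict, hs_url: str) -> bool:
    """Approximate Element's check: presence UI is on unless the URL maps to false."""
    return config.get("enable_presence_by_hs_url", {}).get(hs_url, True)


# Hypothetical base URL; use the one your clients actually connect to.
my_hs = "https://matrix.example.org"
config = {"enable_presence_by_hs_url": {my_hs: False}}

print(json.dumps(config, indent=2))                       # fragment to merge into config.json
print(presence_enabled(config, my_hs))                    # False
print(presence_enabled(config, "https://other.example"))  # True
```

In line with the comment above, this entry is mainly useful when the same homeserver also has `use_presence: false`.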

@emorrp1
Author

emorrp1 commented Oct 24, 2020

Hi all, I think I've now covered all the issues I've seen. Please tell me if you experience any others or have any wording improvements.

@emorrp1
Author

emorrp1 commented Nov 8, 2020

It's been a couple of weeks without comment, so I have converted the issue to a wiki page as suggested by @anoadragon453. Please make improvements there.

https://github.com/matrix-org/synapse/wiki/Running-synapse-on-Single-board-computers

@emorrp1 emorrp1 closed this as completed Nov 8, 2020
@youphyun

youphyun commented Feb 7, 2022

The latest url is: https://matrix-org.github.io/synapse/latest/other/running_synapse_on_single_board_computers.html
I believe "#enabled: true" needs to be "enabled: true" in the homeserver.yaml? Or is defining the complexity value enough to enable the room complexity check?

@anoadragon453
Copy link
Member

I believe "#enabled: true" needs to be "enabled: true" in the homeserver.yaml?

Yes, enabled: true is required to enable complexity checking; just defining the complexity is not enough.

Relevant bit of source code:

class LimitRemoteRoomsConfig:
    # Defaults to False, so complexity checking stays off unless "enabled: true" is set.
    enabled: bool = attr.ib(validator=attr.validators.instance_of(bool), default=False)
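To see why the commented-out "#enabled: true" matters, here is a stand-in for that config class using `dataclasses` instead of `attrs`. It is a hypothetical sketch, not Synapse's code. Only the `enabled` default of `False` comes from the excerpt above. The `complexity` default of 1.0 is assumed from the earlier "1.0 seems small" remark in this thread, and `from_dict`/`should_refuse` are made-up names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitRemoteRoomsSketch:
    enabled: bool = False      # matches the default in the excerpt above
    complexity: float = 1.0    # assumed default (see lead-in)

    @classmethod
    def from_dict(cls, section: dict) -> "LimitRemoteRoomsSketch":
        return cls(**section)

    def should_refuse(self, room_complexity: float) -> bool:
        # Nothing is ever refused unless checking is explicitly enabled.
        return self.enabled and room_complexity > self.complexity


commented_out = LimitRemoteRoomsSketch.from_dict({"complexity": 3.0})
uncommented = LimitRemoteRoomsSketch.from_dict({"enabled": True, "complexity": 3.0})

print(commented_out.should_refuse(104.71))  # False: only complexity was set
print(uncommented.should_refuse(104.71))    # True
```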
