
Database performance is abysmal #405

Closed
Nutomic opened this issue Jan 10, 2020 · 12 comments
Labels: bug, enhancement

Comments

Nutomic (Member) commented Jan 10, 2020:

I ran a quick benchmark against the feeds endpoint, and Lemmy only manages to handle 1.19 requests/second (peertube.social does 34 r/s, and is written in TypeScript). I don't know if this is a problem with feeds in particular or with Lemmy in general, as the other APIs are only available over websocket and are much more complicated to test.

I ran the following using ab.

ab -c 10 -t 10 "https://communism.lemmy.ml/feeds/all.xml?sort=Hot"
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking communism.lemmy.ml (be patient)
Finished 12 requests


Server Software:        nginx
Server Hostname:        communism.lemmy.ml
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        communism.lemmy.ml

Document Path:          /feeds/all.xml?sort=Hot
Document Length:        10427 bytes

Concurrency Level:      10
Time taken for tests:   10.084 seconds
Complete requests:      12
Failed requests:        0
Total transferred:      128868 bytes
HTML transferred:       125124 bytes
Requests per second:    1.19 [#/sec] (mean)
Time per request:       8403.400 [ms] (mean)
Time per request:       840.340 [ms] (mean, across all concurrent requests)
Transfer rate:          12.48 [Kbytes/sec] received

Connection Times (ms)
min  mean[+/-sd] median   max
Connect:      179  209  22.3    201     251
Processing:   854 4385 2596.8   4812    7926
Waiting:      854 4384 2597.0   4812    7926
Total:       1033 4594 2606.0   5008    8160

Percentage of the requests served within a certain time (ms)
50%   5008
66%   5799
75%   7382
80%   7382
90%   7912
95%   8160
98%   8160
99%   8160
100%   8160 (longest request)

I took this screenshot on the server at the same time. It is obvious that the database is doing too much work. Maybe we need to add more indexes in the database, or reduce the number of queries.

[Screenshot_20200110_234959: server resource usage during the benchmark]

Edit: Worth noting that I can't reproduce this problem locally, so it is probably related to the database size.

Edit 2: This also means that any Lemmy instance can effectively be DoSed with only about one request per second.
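
One way to pin down which queries are doing the work would be the pg_stat_statements extension; the following is a generic Postgres sketch (nothing Lemmy-specific, and not something assumed to already be enabled on the instance):

-- requires shared_preload_libraries = 'pg_stat_statements' and a server restart
create extension if not exists pg_stat_statements;

-- top queries by total time spent; the column is total_time on Postgres <= 12
-- and total_exec_time on 13+
select calls, total_exec_time, mean_exec_time, left(query, 120) as query
from pg_stat_statements
order by total_exec_time desc
limit 10;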

Nutomic changed the title from "Performance is abysmal" to "Database performance is abysmal" on Jan 10, 2020
Nutomic added the bug label on Jan 10, 2020
Nutomic (Member, Author) commented Jan 10, 2020:

Here is an analysis of the Postgres logs using pgBadger. One thing I noticed is that every query selects a lot of columns and includes a lot of filtering and sorting parameters.

http://nextcloud.nutomic.com/index.php/s/m3j8yKcG6zTR9mC

Edit: Query durations seem to be missing in the log analysis for some reason. The most frequent query (actually the second most frequent, after select 0;) has an execution time of 142 ms on its own. Others are probably similar, and a single request presumably needs far more than one query.
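
If the durations are missing because the server isn't logging them, the standard Postgres settings below could be enabled temporarily so pgBadger picks them up (a generic sketch; whether the instance is configured this way is an assumption):

-- log every statement together with its duration (threshold 0 ms = log everything);
-- takes effect after a configuration reload, no restart needed
alter system set log_min_duration_statement = 0;
select pg_reload_conf();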

dessalines added the enhancement label on Jan 10, 2020
dessalines (Member) commented:

Okay, I've got this big DB imported locally, and I'm getting some similarly bad results.

I'm gonna make a script that generates a bunch of explain files for the most common queries and their output, and puts them in a folder, so they can be analyzed by http://tatiyants.com/pev/

This is the general command it needs:

echo "explain (analyze, format json) select * from user_view" > explain.sql
psql -qAt -U lemmy -f explain.sql > analyze.json

Ideally we'd diff the JSON output before and after creating the indexes.

dessalines (Member) commented:

Okay, yeah, this is pretty substantial. I'm getting roughly 10x better results just by adding some indexes on creator_id for a few of the tables.
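
For illustration, the kind of index involved would look roughly like the following; the table, column, and index names here are assumptions, not the actual migration:

-- illustrative only: plain b-tree indexes on the foreign keys the views join on
create index idx_post_creator_id on post (creator_id);
create index idx_comment_creator_id on comment (creator_id);
create index idx_community_creator_id on community (creator_id);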

dessalines (Member) commented Jan 11, 2020:

Okay I've added some indexes with a commit I'll post here in a second, but here's what just adding a few indexes did:

To test this, run

git fetch
git checkout dessalines/db_indexes
cd server/query_testing
./generate_explain_reports.sh
# check the json files in that folder
cd ..
diesel migration run
cd query_testing
./generate_explain_reports.sh
# check the json files in that folder
# if you want to revert, run: diesel migration revert
query                                                                        before (ms)   after (ms)
select * from user_view                                                             1017          182
select * from post_view where user_id is null order by hot_rank desc                 651          317
select * from comment_view where user_id is null                                     114          114
select * from community_view where user_id is null order by hot_rank desc             94           51

The main thing that needs to be optimized is the post_view (which is the front page), and unfortunately it's still pretty slow because it has to do lots of joins. Before I commit my indexes, I'm gonna see if I can speed that view up at all.

Someone could take a look at the expensive cross joins I'm doing for the views here: https://github.com/dessalines/lemmy/blob/master/server/migrations/2019-12-29-164820_add_avatar/up.sql#L23

They're the only way I could come up with to make everything a single fetch, where you can optionally provide user_id = X (or user_id is null) and get user-specific data like your vote, whether you've saved that post, etc. A rough sketch of the pattern is below.
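
This is a simplified illustration of that pattern, with table and column names guessed rather than taken from the linked migration:

-- simplified sketch: cross join every post with every user id plus a null
-- "logged out" row, then left join the per-user data onto each combination
create view post_view as
select
  p.*,
  u.id as user_id,
  coalesce(pl.score, 0) as my_vote,
  (ps.id is not null) as saved
from post p
cross join (select id from user_ union all select null) u
left join post_like pl on pl.post_id = p.id and pl.user_id = u.id
left join post_saved ps on ps.post_id = p.id and ps.user_id = u.id;

-- callers then filter with either "where user_id = X" or "where user_id is null"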

dessalines (Member) commented:

So I tried to do a LOT of optimizing today, and could only get the post query down to ~200 ms. The issue is basically that the hot_sort function needs both a score (which comes from sum(post_likes)) and a published time, and I can't create an index for that combination, since the scores come from a join of two different tables.

The other options would be to make post_view a materialized view, which would mean the whole caching mess of having to periodically refresh that view, or to move the hot sorting out of SQL, which would be my last resort.
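
For reference, the materialized-view route would look something like this in plain Postgres (names are placeholders, and it assumes (id, user_id) uniquely identifies a row of the view):

-- snapshot the expensive view so reads become a plain indexed scan
create materialized view post_view_fast as
select * from post_view;

-- a unique index is required to refresh concurrently (i.e. without blocking reads)
create unique index idx_post_view_fast on post_view_fast (id, user_id);

-- has to be re-run periodically (e.g. from a scheduled job); this is the
-- caching/refresh burden mentioned above
refresh materialized view concurrently post_view_fast;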

dessalines (Member) commented:
Something weird is going on, maybe with the nginx configs on the servers or something.

I'm testing out adding connection pooling locally, and I'm finding that both before and after adding the proper r2d2 code, I get the same speed with ab: about 800 reqs/s.

When I push to dev.lemmy.ml and test there, I get the same before-and-after results, but it's only 3 reqs/s:

Requests per second:    3.16 [#/sec] (mean)
Time per request:       3169.220 [ms] (mean)
Time per request:       316.922 [ms] (mean, across all concurrent requests)

The only difference is possibly the nginx config.

Nutomic (Member, Author) commented Jan 12, 2020:

Is Postgres still using 90% of the CPU or more while running the benchmark? If so, the queries are still too complex, and we need to cache them or something.

dessalines (Member) commented Jan 12, 2020:

I tried this on my dev server, and Postgres only goes up to 10% usage, but it's still at only 3 reqs per sec.

I'm running the exact same postgres docker image locally too.

dessalines (Member) commented:

I put most of the comments in #411. But we might be able to just close this ticket, since the DB performance is now hundreds of times better, and every major query (front page posts, user searching, comments, and communities) is now < 30 ms.

dessalines (Member) commented:
I'll close now but we can re-open if it becomes an issue. At least we have the performance tests in server/query_testing now.

dessalines (Member) commented:

Re-opening this issue, as the DB is currently the CPU bottleneck. The best way to figure this out is to analyze the longer-running queries as they actually run, and optimize them individually.
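
A starting point for that would be watching what is actually running, using the built-in pg_stat_activity view (generic Postgres, nothing Lemmy-specific):

-- currently running statements, longest first
select now() - query_start as duration, state, left(query, 120) as query
from pg_stat_activity
where state <> 'idle'
order by duration desc
limit 10;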

Nutomic (Member, Author) commented Mar 24, 2022:

Better to make a new issue; this one is already quite long.
