
Server overload with big request #115

Closed

vifonne opened this issue Apr 25, 2019 · 2 comments

Comments

@vifonne

vifonne commented Apr 25, 2019

Hello,

When I try to post 100,000 observations or more (through multiple HTTP requests, each containing a dataArray of 50 or 100 observations), the FROST Server (running on AWS with Docker) becomes unavailable for other requests, so the client website (using Grafana with the right plugin for STA) cannot retrieve any data. 100,000 observations is not very much for the future infrastructure we're going to have at my work.
I've tried multiple combinations of parameters but nothing is really better; you'll find a benchmark table in the attachments (the Req GET Time columns show how long the client has to wait between sending a request and receiving the response). When I monitor the server with htop, it says PostgreSQL takes 100% of the CPU. (The server has a 4-core CPU and 8 GB of RAM.)
[Screenshot: benchmark table]

So is there an option or configuration that I've forgotten, or do PostgreSQL inserts just take a long time?

Best regards!

@hylkevds
Member

Database inserts can be really slow, especially on cloud infrastructure. There are several points to look at:

  1. Disk speed. Inserts into the database are almost always limited by the disk write speed. If you need fast inserts, you need fast disks, and the write speed of cloud disks can be really abysmal. I personally don't have experience with AWS, but on Azure you can select how fast you want the disks that are added to your VMs to be (normal disks, SSDs, etc.). On Azure it also makes a difference whether you reserve a large or a small amount of disk space: small virtual disks get grouped on the same physical disk with other users, while a large virtual disk gets physical disks all to yourself. The amount of RAM is almost irrelevant for insert speeds.
  2. Foreign keys. To make an insert into a table with foreign keys, PG has to lock the specific foreign key values that it uses, to make sure they are not changed while it is doing the insert. The Observation table has two foreign keys (Datastream_id and FeatureOfInterest_id), so it can only do one insert at a time for each value of those. This means you can't get more speed with parallel inserts on the same Datastream and FeatureOfInterest, but you can get more speed by inserting into different Datastreams and FeaturesOfInterest. Depending on hardware, you can easily go up to 80 or more parallel inserts this way.
  3. Triggers. FROST updates Datastream.phenomenonTime, Datastream.resultTime and Datastream.observedArea with a trigger. If you need faster inserts, especially when importing data, you can gain a lot of speed by disabling this trigger (see the sketch after this list). If you still need those Datastream properties, you could update them every so often with a cron job.
  4. To see what the DB is actually doing, you can run a query like `SELECT * FROM pg_stat_activity a WHERE state != 'idle';` (an expanded version follows below). That will also show you what a query is waiting on.
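
As a minimal sketch for point 3 (the table name here is an assumption; the actual table and trigger names vary between FROST-Server versions, so check your own schema first):

```sql
-- List the user triggers on the observations table first; the exact
-- trigger name depends on the FROST-Server version (table name assumed):
SELECT tgname
FROM pg_trigger
WHERE tgrelid = '"OBSERVATIONS"'::regclass
  AND NOT tgisinternal;

-- Disable all user triggers on the table before a bulk import...
ALTER TABLE "OBSERVATIONS" DISABLE TRIGGER USER;

-- ...run the import, then re-enable them. Afterwards, update the
-- Datastream phenomenonTime/resultTime/observedArea once, e.g. from
-- a cron job, instead of once per inserted Observation.
ALTER TABLE "OBSERVATIONS" ENABLE TRIGGER USER;
```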

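And a slightly expanded version of the query from point 4; the wait_event columns have been part of pg_stat_activity since PostgreSQL 9.6 and show whether a backend is blocked on a lock, on I/O, or not waiting at all:

```sql
-- Show every non-idle backend, how long its current query has been
-- running, and what (if anything) it is waiting on.
SELECT pid,
       now() - query_start AS runtime,
       wait_event_type,  -- e.g. 'Lock' or 'IO'; NULL means running on CPU
       wait_event,       -- the specific lock or event being waited on
       state,
       query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY runtime DESC;
```
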
@vifonne
Author

vifonne commented Apr 27, 2019

Thanks for your help, but the problem came from the NodeJS client sending the requests: I just added a 20 ms delay between requests and everything is fine now.

@vifonne vifonne closed this as completed Apr 27, 2019