
Server overload with big request #115

Closed

vifonne opened this issue Apr 25, 2019 · 2 comments

Comments

@vifonne

vifonne commented Apr 25, 2019

Hello,

When I try to post 100,000 observations or more (through multiple HTTP requests, each containing a dataArray of 50 or 100 observations), the FROST Server (running on AWS with Docker) becomes unavailable for other requests, so the client website (using Grafana with the right plugin for STA) cannot retrieve any data. 100,000 observations is not very much for the future infrastructure we're going to have at my work.
I've tried multiple combinations of parameters but nothing is really better; you'll find a benchmark table in the attachments (the Req GET Time columns show how long the client has to wait between sending a request and receiving the response). When I monitor the server with htop, it says PostgreSQL takes 100% of the CPU. (The server has a 4-core CPU and 8 GB of RAM.)
[Screenshot: benchmark table]

So is there an option or configuration that I've forgotten, or do PostgreSQL inserts just take a long time?

Best regards!

@hylkevds
Member

Database inserts can be really slow, especially on cloud infrastructure. There are several points to look at:

  1. Disk speed. Inserts into the database are almost always limited by the disk write speed. If you need fast inserts, you need fast disks, and the write speed of cloud disks can be really abysmal. I personally don't have experience with AWS, but on Azure you can select how fast you want the disks that are added to your VMs to be (normal disks, SSDs, etc.). On Azure it also makes a difference whether you reserve a large or a small amount of disk space: small virtual disks get grouped on the same physical disk with other users, while a large virtual disk gets physical disks all to yourself. The amount of RAM is almost irrelevant for insert speeds.
  2. Foreign keys. To make an insert into a table with foreign keys, PG has to lock the specific foreign key values that it uses, to make sure they are not changed while it is doing the insert. The Observation table has two foreign keys (Datastream_id and FeatureOfInterest_id), so it can only do one insert at a time for each value of those. This means you can't get more speed with parallel inserts on the same Datastream and FeatureOfInterest, but you can get more speed by inserting into different Datastreams and FeaturesOfInterest. Depending on hardware, you can easily go up to 80 or more parallel inserts this way.
  3. Triggers. FROST updates Datastream.phenomenonTime, Datastream.resultTime and Datastream.observedArea with a trigger. If you need faster inserts, especially when importing data, you can gain a lot of speed by disabling this trigger (see the sketch after this list). If you still need those Datastream properties, you could update them every so often with a cron job.
  4. To see what the DB is actually doing, you can run a query like `SELECT * FROM pg_stat_activity a WHERE state != 'idle';` (an expanded version follows below). That will also show you what a query is waiting on.
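
As a minimal sketch for point 3 (the table name here is an assumption; the actual table and trigger names vary between FROST-Server versions, so check your own schema first):

```sql
-- List the user triggers on the observations table first; the exact
-- trigger name depends on the FROST-Server version (table name assumed):
SELECT tgname
FROM pg_trigger
WHERE tgrelid = '"OBSERVATIONS"'::regclass
  AND NOT tgisinternal;

-- Disable all user triggers on the table before a bulk import...
ALTER TABLE "OBSERVATIONS" DISABLE TRIGGER USER;

-- ...run the import, then re-enable them. Afterwards, update the
-- Datastream phenomenonTime/resultTime/observedArea once, e.g. from
-- a cron job, instead of once per inserted Observation.
ALTER TABLE "OBSERVATIONS" ENABLE TRIGGER USER;
```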

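And a slightly expanded version of the query from point 4; the wait_event columns have been part of pg_stat_activity since PostgreSQL 9.6 and show whether a backend is blocked on a lock, on I/O, or not waiting at all:

```sql
-- Show every non-idle backend, how long its current query has been
-- running, and what (if anything) it is waiting on.
SELECT pid,
       now() - query_start AS runtime,
       wait_event_type,  -- e.g. 'Lock' or 'IO'; NULL means running on CPU
       wait_event,       -- the specific lock or event being waited on
       state,
       query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY runtime DESC;
```
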
@vifonne
Author

vifonne commented Apr 27, 2019

Thanks for your help, but the problem came from the NodeJS client sending the requests: I just added a 20 ms delay between requests and everything is fine now.

@vifonne vifonne closed this as completed Apr 27, 2019