
Documentation of JSON config loads extremely slow (> 10s) #178

Closed
SimonBackx opened this issue Jun 30, 2021 · 20 comments
Labels
bug Something isn't working

Comments

@SimonBackx

When you go to the documentation on the website and open 'JSON config structure', every page loads very slowly. I've had this problem for over a month; I thought it would be a temporary issue, so I never reported it.

Example:
Visit https://caddyserver.com/docs/json/apps/http/ in any browser

I have tried on macOS 11.4, with Safari 14.1.1 and the latest Chrome / Firefox.

If you check the network tab in the developer tools, you can see that the following request is blocked for almost 10 seconds:
https://caddyserver.com/api/docs/config/apps/http/

So this is probably a server- or config-related issue.

@francislavoie added the bug label Jun 30, 2021
@francislavoie
Member

This is a known issue.

The backend that collects the module documentation recently went through a major rewrite, and there were some performance oversights.

@mholt is busy now with other work, but will get back to this at some point.

@mholt
Member

mholt commented Jun 30, 2021

On the bright side, removing that caching fixed a lot of problems.

@douglasg14b

I would suggest a loading indicator. Long loads happen and may take some time to resolve, but on the UX side you can reduce user frustration with some indication that the page is loading, as opposed to broken or missing.

This message top-center makes it seem like I went to the wrong place :)

[screenshot of the message shown at the top of the docs page]

@mholt
Member

mholt commented Aug 19, 2021

Yeah, I agree. I haven't had time to get around to a loading indicator, but the website is open source (obviously), so anyone is welcome to contribute one. I'm just a bit picky: it has to look good, work correctly, and not be too complicated.

@polarathene

I also had a bad experience with the JSON docs a few times in the past week or so. As someone who hadn't seen the docs before, I initially assumed I was viewing an undocumented/empty page, and the screenshot above also made me wonder if I was looking in the wrong place.

It was only by chance, when I left one of the doc pages open while I went to get a coffee, that I came back and realized it was just extremely slow to load the docs content.

You don't need a fancy loading indicator, just something like a "Loading docs..." placeholder text that gets replaced. At least then the user knows there is actual content on the way, or that something may be broken with loading it.


Is the retrieved content not static?

I'm curious what change on the server resulted in this behavior (I assume this problematic behaviour isn't in the open source part of the website?). For me it is a 15 second TTFB (via browser or curl).

There are no client caching headers in the response either? (A short cache duration surely wouldn't hurt?)

@francislavoie
Member

Is the retrieved content not static?

No, it's the result of some complex SQL queries. The database is built up from some currently closed-source tooling that @mholt wrote to collect information on Caddy modules. When the rewrite was done, there were some performance oversights.

Again, it's a time issue at this point, in more ways than one. This will get fixed when Matt has time to dedicate to that tooling again.

@polarathene

No, it's the result of some complex SQL queries.

Are those run for every request to the same resource? No server-side cache? The Caddy cache-handler module or Souin should both be able to cache the JSON responses as a quick/easy workaround, no?

Setting at least a Cache-Control header or similar for the JSON responses, so that clients can cache the results on page reloads, would also be a quick and easy addition. Right now it's 15 seconds to visit the same doc page regardless of when I last visited it, even if that was immediately after the content loaded.

Again, it's a time issue at this point, in more ways than one. This will get fixed when Matt has time to dedicate to that tooling again.

I understand this, hence the two approaches suggested above. These could perhaps sit in the current Caddyfile of this repo? At the very least the Cache-Control header, if building with xcaddy for a server-side caching module is not desirable.

@francislavoie
Member

francislavoie commented Aug 22, 2021

No server-side cache?

Currently no. Because time.

Please have patience.

We're very well aware of the issue and the possible mitigations, but Matt's a very busy guy and he's not had time to spend on this.

@mholt
Member

mholt commented Aug 22, 2021

@polarathene

I'm curious what change on the server resulted in this behavior (I assume this problematic behaviour isn't in the open source part of the website?).

  • Implemented strongly versioned docs, making the queries more computationally expensive (this also fixed some bugs)
  • Removed caching (also fixed some bugs)

These can be optimized, but it's not a huge priority right now. I could bump it up if some significant sponsors or enterprise customers needed it, but I haven't heard anything from them. I'm mostly busy preparing my presentation for the API Platform Conference in a couple of weeks.

@polarathene

polarathene commented Aug 22, 2021

Currently no. Because time.

The project Caddyfile could be modified to include one of the mentioned cache modules. I could create a PR for that, but I'm not familiar with your deployment process, or with whether adding xcaddy to include the module would be a blocker on your end.

I don't see this as a costly fix time-wise, especially if someone can provide a PR for it. The actual problem with the SQL queries further down can always be addressed at another time if appropriate.

If you're interested in this approach, I'm happy to contribute some time towards it.


  • Removed caching (also fixed some bugs)

reverse_proxy /api/* localhost:4444

Can't we just place header Cache-Control "max-age=3600" or similar here? I've tried to apply it only to JSON responses via handle_response, but I think I ran into a bug.

I know it's not server-side caching, so it won't help with first page visits, but for subsequent visits (for as long as the cache duration is set) it will provide a better UX to those revisiting pages.

max-age=300, stale-while-revalidate=604800 would allow a lower max-age if you prefer, and would let the browser reuse the previous cache entry for 1 week while refreshing the cached response in the background (if more than 5 minutes have passed since the prior request) for the next visit. That'd be preferable in the meantime.
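
Something along these lines in this repo's Caddyfile might do it (an untested sketch on my part; the exact matcher and directive placement may need adjusting):

  header /api/* Cache-Control "max-age=300, stale-while-revalidate=604800"
  reverse_proxy /api/* localhost:4444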

Would you like a PR for that?

I'm mostly busy preparing my presentation for the API Platform Conference in a couple weeks.

No problem, no rush :)

@xiruizhao

A website that takes more than 5s to load should be considered offline. This should be a priority right now. Outdated documentation is much better than no documentation.

@mohammed90
Member

A website that takes more than 5s to load should be considered offline. This should be a priority right now. Outdated documentation is much better than no documentation.

We're doing our best within our capacity. We're a limited team with only so much time and energy to spare. The focus has been on the main repo, in addition to the back-office portal for module devs.

Contributions are welcome if you're willing to take a shot at it!

@polarathene

polarathene commented Oct 11, 2021

TL;DR: Not sure how we're meant to take a shot at it when the current proposals have not received actionable responses, and the code at issue is, AFAIK, closed source.

Contributions are welcome if you're willing to take a shot at it!

I thought the issue was due to an API request to a backend that queries a DB, which we don't have access to? How are we meant to take a shot at it?

I already raised a request/discussion regarding client cache headers so that we at least only suffer the initial delay. The other suggestion was to add a Caddy HTTP cache module via xcaddy. Again, no response on that.


I mean, I guess we could implement a crawler that collects the JSON responses for all the pages and run it as a scheduled job to refresh whenever changes happen, e.g. once a day. Then static JSON could be served when the backend takes too long to respond.

I don't see that approach being accepted by the maintainers here, though. It'd presumably also be easier on their end to just query each page at deployment time, plus some recurring refresh job. I honestly don't see the docs changing so frequently that this wouldn't be a valid approach.

Better yet, have the backend store the response (in the DB if preferable) and send it for the relevant query received, then run the DB query that's currently done (presumably redundantly most of the time) and update the response that will be served the next time it's requested. Again, fast responses, with the only drawback being that the first person to request that data after it's updated gets stale data.

@mholt
Member

mholt commented Oct 11, 2021

I have been working on this, but I'm still stuck trying to export the production DB to my local machine. Recent changes in Cockroach removed the dump command, and the BACKUP SQL statement dumps a whole directory onto the server's file system (instead of where the client is running). Sigh. I've been spending hours on it today already.

So please wait while I migrate to a new DB with proper export capabilities.

@mholt
Member

mholt commented Oct 13, 2021

I finally got a copy of the database set up with Postgres locally and took some baseline timings for loading the JSON docs. Today I got the loading time down from 4.9s to 2.9s by adding some indices to a table. An easy win, but most of the day was spent trying to figure out how to set up Postgres and migrate the data over.
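
Roughly speaking (the table and column names here are just placeholders to illustrate), the kind of index that helps looks like:

  -- placeholder names, illustrative only
  CREATE INDEX IF NOT EXISTS module_docs_module_version_idx
      ON module_docs (module_id, version DESC);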

There are more queries being made than necessary, so I'll try to tackle those next.

Won't be touching the production DB until the optimizations are complete though.

@mholt
Member

mholt commented Oct 13, 2021

Got the average worst case down to < 900ms by eliminating unnecessary queries during recursion (i.e. amortizing). I also rewrote some queries that used subqueries to instead use SELECT DISTINCT ON with an ORDER BY clause.
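
The DISTINCT ON pattern looks roughly like this (placeholder table/column names, just to show the shape of the rewrite):

  -- placeholder names, illustrative only: keep the newest row per module
  -- without a correlated subquery
  SELECT DISTINCT ON (module_id) module_id, version, doc_json
  FROM module_docs
  ORDER BY module_id, version DESC;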

(As for "worst case": the case gets worse the deeper into the JSON structure you traverse. There is a limit but it's kinda arbitrary so it's hard to say what the actual "worst case" is.) Most page loads happen for me in < 200ms. (Starting load times for same page was 4.9s.)

Of course this will be slower on the production server, which doesn't have the specs my workstation does, but the speed improvements should be rather drastic anyway. I'm guessing about 75% faster in the "worst case" and about 96% faster on average.

There is room for more improvement, but I think these are good enough to push once I get the production DB migrated over to Postgres. I hope the speed changes carry over to the production system.

In summary:

  • Switched to Postgres
  • Added more indices
  • Rewrote some queries, simplifying them
  • Amortized some recursive code

Probably introduced some new bugs while I was at it, but at a glance I didn't notice anything glaring.

@douglasg14b

douglasg14b commented Oct 13, 2021

Wow! Good work!

Here are some quick wins that could be achieved, augmenting your success, without slogging through much more SQL:

  1. Loading indicator
    • Effort: Easy AF
  2. Cache header
    • Effort: Easy AF
  3. Server-side Request Caching
    • Effort: Easy AF -> Easy (Depending on framework)
  4. Server Side Data Caching
    • Effort: Easy AF -> Medium (Depending on framework)
  5. Generate static HTML/JSON for the docs
    • Effort: Medium -> Hard (Depending on framework)
  6. Materialized View (Probably a dumb suggestion)?
  7. Postgres Query Cache?

Using just 3 or 4 means that response times could be a few ms across the board for the docs, without much cost aside from a small memory footprint. Augment that with cache headers and some 304 Not Modified results, and even the cache will get bored :p. Unfortunately I don't know much about the hosting setup; in ASP.NET Core, caching this sort of stuff is just a couple of lines, but it might be more difficult with whatever language & framework is in use here.

The gist is that this appears to be fairly static data being generated on the fly at significant cost, so why not cache it?

@mholt
Member

mholt commented Oct 13, 2021

Yeah, good ideas @douglasg14b .

  1. Yep, this should be easy; I will see if I can whip one up. This is also something the community can contribute if they have better front-end skillz.
  2. We should probably do this; I'll experiment with setting it up.

3-5 I'm not really interested in doing, at least not now.

  6. This would just slow down the UI more. I've never interacted with a Material UI that was faster than a "normal" UI.
  7. Maybe. I don't know about this; I'm only on day 3 of using Postgres, so I'd have to learn about it.

@douglasg14b

douglasg14b commented Oct 13, 2021

To be clear, number 6 is a materialized view in Postgres, not a change in site CSS, which wouldn't have any bearing on this problem.
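
Roughly, a materialized view precomputes and stores the result of an expensive query so reads are cheap, and you refresh it whenever the underlying data changes (placeholder names again, just to illustrate):

  -- placeholder names, illustrative only
  CREATE MATERIALIZED VIEW module_docs_latest AS
  SELECT DISTINCT ON (module_id) module_id, version, doc_json
  FROM module_docs
  ORDER BY module_id, version DESC;

  -- re-run after the underlying data changes
  REFRESH MATERIALIZED VIEW module_docs_latest;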

Numbers 3 or 4 would be best tackled by someone familiar with the API environment, but would probably provide the biggest improvement. I'm on my phone so this is kind of difficult to ascertain, but do you know what language/framework is used for the backend?

Congrats on making such progress on day 3 of postgres 😮

@mholt
Member

mholt commented Oct 13, 2021

Oh, I see, I hadn't heard of materialized views before.

I've finished migrating the production DB to Postgres and have deployed the code optimizations. I'm observing about a 98% speedup over the worst offenders, which had load times of 10s or more. For example, this loads in about 400ms for me: https://caddyserver.com/docs/json/apps/http/servers/routes/handle/reverse_proxy/

I'll see about adding a Cache-Control header next, as that's a really easy win as well. (Update: done)

Still room for improvement, but for now that will have to do, so I can work on other things. The community is welcome to contribute a loading indicator.

@mholt mholt closed this as completed Oct 13, 2021