feat(gatsby-source-drupal): Use Got instead of Axios for retries & caching support + restrict to 20 concurrent requests #31514
Conversation
…ching support + restrict to 10 concurrent requests

Got is a much more robust HTTP library, so let's use that instead of Axios. For a very small local test site, this drops sourcing time from ~4s to ~0.4s. Increasing the concurrency from 10 to 20 regresses back to 4s, so somewhere between 10 and 20 concurrent requests, my local setup gets grumpy.
@KyleAMathews I think Drupal is bottlenecking at the database. There are only so many database interactions it can do before the locking mechanism creates serious overhead. It would be good to understand the correlation so we can provide good guidelines for concurrency settings, but 10 seems like a sane standard.
Makes sense! Yeah, 10 is clearly much faster than uncapped, so we can ship that as a default and do more experimentation to see if there are scenarios where more concurrent downloads are faster, e.g. a heavily cached Drupal site on Pantheon.
@KyleAMathews As you mentioned, with a warm cache these changes help the source plugin go way faster. Any performance / reliability improvement for sourcing is a big win, so this is worth getting in even if it's just 10% faster with a cold cache. Also, with the standardization of the request library to Got, I would like to test this with a Pantheon Drupal + Varnish setup with high concurrency, as we had some issues in the past with requests getting dropped and running into errors.
Got does retries automatically for failed requests. On concurrency: it seems like there's a large sweet spot for HTTP requests & concurrency. I got roughly the same throughput anywhere between 3-4 concurrent requests and ~30. I was testing with Pantheon, so I assume it's using Varnish by default?
Unless my testing setup was incorrect, it seems like a single Node process + Varnish hits some fundamental limit at 5 concurrent requests. I could just be hitting e.g. the network capacity of my local network. I'm tethering off my phone right now and it's sourcing about 40% slower than it was on my work network, which suggests network capacity is the most common constraint (good news for CI servers, which generally have very fat network pipes).
I think it will depend mostly on the size of your Pantheon (or other cloud provider) plan. You have access to more PHP processes with higher plans, which would directly impact the upper limit on concurrency. Caching will also determine whether concurrency has an impact on performance, since throughput will be bottlenecked on the Drupal side with a cold cache, but Varnish usually manages to handle a high number of warm requests. I'd only recommend we make it configurable so we can support sites of different sizes.
…' testing, bump default concurrency to 20 (it was 2x faster for his site vs. 5)
@Auspicus tested concurrency and did find a big (2x) improvement to sourcing speed by upping concurrency to 20, so I changed that to the default and added an option so people can adjust concurrency easily.
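For reference, overriding the concurrency from a site's gatsby-config.js would look roughly like this. The option name `concurrentAPIRequests` is an assumption for illustration only; check the plugin README for the actual option name.

```js
// gatsby-config.js — a minimal sketch. `concurrentAPIRequests` is a
// hypothetical option name, shown only to illustrate overriding the default of 20.
module.exports = {
  plugins: [
    {
      resolve: `gatsby-source-drupal`,
      options: {
        baseUrl: `https://drupal.example.com/`,
        concurrentAPIRequests: 20,
      },
    },
  ],
}
```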
Got is a much more robust HTTP library, so let's use that instead of Axios.
Got's big benefit here is that it automatically retries failed requests so one failed API request doesn't stop the build.
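As a minimal sketch (not the plugin's exact code), Got's built-in retry behavior can be tuned via its `retry` option; the endpoint and retry limit below are illustrative assumptions.

```js
const got = require(`got`)

// Got automatically retries failed requests (network errors, 5xx responses, etc.).
// The URL and retry limit here are placeholders, not the plugin's actual values.
async function fetchArticles() {
  const { body } = await got(`https://drupal.example.com/jsonapi/node/article`, {
    responseType: `json`,
    retry: { limit: 5 },
  })
  return body
}
```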
This PR also adds keep-alive support, which speeds up requests since we only set up the HTTPS connection once.
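A rough sketch of what keep-alive looks like with Got, assuming a standard Node https.Agent; the exact wiring in the plugin may differ.

```js
const { Agent: HttpsAgent } = require(`https`)
const got = require(`got`)

// Reuse one HTTPS connection across requests instead of paying the
// TLS handshake cost on every request. Illustrative only.
const keepAliveAgent = new HttpsAgent({ keepAlive: true })

const client = got.extend({
  agent: { https: keepAliveAgent },
})
```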
To avoid overloading the Drupal server, we limit request concurrency by pushing requests into a fastq queue. From testing, 20 concurrent requests seems fastest.
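A sketch of the queueing idea using fastq, assuming a simple Got-based worker; the concurrency of 20 matches the default described above, but the worker itself is illustrative.

```js
const fastq = require(`fastq`)
const got = require(`got`)

// Worker that fetches one JSON:API URL; fastq ensures at most 20 run at once.
const worker = async url => got(url, { responseType: `json` })
const queue = fastq.promise(worker, 20)

// Push every URL onto the queue; each push resolves when its request completes.
async function fetchAll(urls) {
  return Promise.all(urls.map(url => queue.push(url)))
}
```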
When sourcing from a Drupal site with a cleared cache, this PR is around 10% faster than master, e.g. 30s down to 27s. With a warm Drupal cache (but nothing in the source plugin cache), it's around 20% faster. If both Drupal's and the source plugin's caches are warm, it's now 75% faster 🎉
I hadn't realized before this PR just how big of a difference Drupal's cache makes to JSON API request speed. There's literally a 10x performance cliff between the two: with a warm cache, Gatsby can source around 1000-1250 entities / second; with a cold cache, it's only 125 entities / second. @Auspicus pointed out this Drupal issue https://www.drupal.org/project/drupal/issues/3090131 which would make the JSON API cache much more robust; right now it's deleted every time any entity is changed, so when doing full sources from Drupal, we'll often be in slow mode.