feat(gatsby-source-drupal): Use Got instead of Axios for retries & caching support + restrict to 20 concurrent requests #31514
Conversation
…ching support + restrict to 10 concurrent requests

Got is a much more robust HTTP library, so let's use that instead of Axios. For a very small local test site, this drops sourcing time from ~4s to ~0.4s. Increasing the concurrency from 10 to 20 regresses back to 4s, so somewhere between 10 and 20 concurrent requests, my local setup gets grumpy.
@KyleAMathews I think Drupal is bottlenecking at the database. There are only so many database interactions it can do before the locking mechanism creates serious overhead. It would be good to understand the correlation so we can provide good guidelines for concurrency settings, but 10 seems like a sane standard.
Makes sense! Yeah, 10 is clearly much faster than uncapped, so we can ship that as a default and do more experimentation to see if there are scenarios where more concurrent downloads are faster, e.g. a heavily cached Drupal site on Pantheon.
@KyleAMathews As you mentioned, with a warm cache these changes help the source plugin go way faster. Any performance / reliability improvement for sourcing is a big win, so this is worth getting in even if it's just 10% faster with a cold cache. Also, with the standardization of the request library to Got, I would like to test this with a Pantheon Drupal + Varnish setup with high concurrency, as we had some issues in the past with requests getting dropped and running into errors.
Got does retries automatically for failed requests. On concurrency: it seems like there's a large sweet spot for HTTP requests & concurrency. I got roughly the same throughput anywhere between 3-4 concurrent requests and ~30. I was testing with Pantheon, so I assume it's using Varnish by default?
Unless my testing setup was incorrect, it seems like a single Node process + Varnish hits some fundamental limit at 5 concurrent requests. I could just be hitting e.g. the network capacity of my local network. I'm tethering off my phone right now and it's sourcing about 40% slower than it was on my work network, which suggests network capacity is the most common constraint (good news for CI servers, which generally have very fat network pipes).
I think it will depend mostly on the size of your Pantheon (or other cloud provider) plan. You have access to more PHP processes with higher plans, which would directly impact the upper limit on concurrency. Caching will also determine whether concurrency has an impact on performance, since throughput will be bottlenecked on the Drupal side with a cold cache, but Varnish usually manages to handle a high number of warm requests. I'd only recommend we make it configurable so we can support sites of different sizes.
…' testing, bump default concurrency to 20 (it was 2x faster for his site vs. 5)
@Auspicus tested concurrency and did find a big (2x) improvement to sourcing speed by upping concurrency to 20, so I changed that to the default and added an option so people can adjust concurrency easily.
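For reference, overriding the concurrency from a site's gatsby-config.js would look roughly like this. The option name `concurrentAPIRequests` is an assumption for illustration only; check the plugin README for the actual option name.

```js
// gatsby-config.js — a minimal sketch. `concurrentAPIRequests` is a
// hypothetical option name, shown only to illustrate overriding the default of 20.
module.exports = {
  plugins: [
    {
      resolve: `gatsby-source-drupal`,
      options: {
        baseUrl: `https://drupal.example.com/`,
        concurrentAPIRequests: 20,
      },
    },
  ],
}
```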
Got is a much more robust HTTP library, so let's use that instead of Axios.
Got's big benefit here is that it automatically retries failed requests so one failed API request doesn't stop the build.
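As a minimal sketch (not the plugin's exact code), Got's built-in retry behavior can be tuned via its `retry` option; the endpoint and retry limit below are illustrative assumptions.

```js
const got = require(`got`)

// Got automatically retries failed requests (network errors, 5xx responses, etc.).
// The URL and retry limit here are placeholders, not the plugin's actual values.
async function fetchArticles() {
  const { body } = await got(`https://drupal.example.com/jsonapi/node/article`, {
    responseType: `json`,
    retry: { limit: 5 },
  })
  return body
}
```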
This PR also adds keep-alive support, which speeds up requests since we only set up the HTTPS connection once.
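A rough sketch of what keep-alive looks like with Got, assuming a standard Node https.Agent; the exact wiring in the plugin may differ.

```js
const { Agent: HttpsAgent } = require(`https`)
const got = require(`got`)

// Reuse one HTTPS connection across requests instead of paying the
// TLS handshake cost on every request. Illustrative only.
const keepAliveAgent = new HttpsAgent({ keepAlive: true })

const client = got.extend({
  agent: { https: keepAliveAgent },
})
```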
To avoid overloading the Drupal server, we limit request concurrency by pushing requests into a fastq queue. From testing, 20 concurrent requests seems fastest.
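A sketch of the queueing idea using fastq, assuming a simple Got-based worker; the concurrency of 20 matches the default described above, but the worker itself is illustrative.

```js
const fastq = require(`fastq`)
const got = require(`got`)

// Worker that fetches one JSON:API URL; fastq ensures at most 20 run at once.
const worker = async url => got(url, { responseType: `json` })
const queue = fastq.promise(worker, 20)

// Push every URL onto the queue; each push resolves when its request completes.
async function fetchAll(urls) {
  return Promise.all(urls.map(url => queue.push(url)))
}
```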
When sourcing from a Drupal site with a cleared cache, this PR is around 10% faster than master, e.g. 30s down to 27s. With a warm Drupal cache (but nothing in the source plugin cache), it's around 20% faster. If both Drupal's and the source plugin's caches are warm, it's now 75% faster 🎉
I hadn't realized before this PR just how big of a difference Drupal's cache makes to JSON API request speed. There's literally a 10x performance cliff between the two: with a warm cache, Gatsby can source around 1000-1250 entities / second; with a cold cache, it's only 125 entities / second. @Auspicus pointed out this Drupal issue https://www.drupal.org/project/drupal/issues/3090131 which would make the JSON API cache much more robust; right now it's deleted every time any entity is changed, so when doing full sources from Drupal, we'll often be in slow mode.