
feat(#575): retry failed HTTP requests #583

Merged
merged 22 commits into main from 575-retry-on-failure
Dec 6, 2023

Conversation

m5r
Member

@m5r m5r commented Oct 25, 2023

Description

Leverage async-retry to retry failed HTTP requests

#575

Code review items

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on cht-docs
  • Tested: Unit and/or integration tests where appropriate
  • Backwards compatible: Works with existing data and configuration. Any breaking changes documented in the release notes.

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

@m5r m5r force-pushed the 575-retry-on-failure branch 6 times, most recently from 2e543f9 to 0866938 Compare November 28, 2023 14:48
@m5r m5r marked this pull request as ready for review November 29, 2023 22:47
@m5r m5r changed the title Retry failed requests feat(#575): retry failed HTTP requests Nov 29, 2023
Contributor

@jkuester jkuester left a comment

I concur with Gareth's comment that this is probably a more viable approach than #590.

The implementation here looks great! I had a few questions regarding the tests, particularly around trying to avoid having to increase the timeout on so many tests. Once those are resolved this is good to go!

@@ -7,6 +8,13 @@ const url = require('url');

const cache = new Map();

const _request = (method) => (...args) => retry(() => rpn[method](...args), { retries: 5, randomize: false, factor: 1.5 });
Contributor

Just curious, but what was the reason for factor: 1.5 instead of just using the default 2? I guess this will make it retry faster?

Member Author

Correct. The default config multiplies each retry delay by a random factor between 1 and 2. Worst case, that meant making the user wait more than 60 seconds when all 6 requests failed, so making it retry faster, and consistently across failures with no randomization, made more sense.
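A rough sketch of the backoff arithmetic behind that choice (hypothetical helpers, not the PR's code; assumes async-retry's documented behavior of delay = minTimeout * factor^attempt, with a random multiplier in [1, 2) when randomize is true — worst case 2):

```javascript
// delays: worst-case wait (ms) before each retry, given async-retry-style options.
const delays = (retries, factor, minTimeout = 1000, randomize = false) =>
  Array.from({ length: retries }, (_, i) =>
    minTimeout * Math.pow(factor, i) * (randomize ? 2 : 1)); // 2 = worst-case jitter

const sum = (xs) => xs.reduce((a, b) => a + b, 0);

// Default-ish config (factor 2, randomized): worst case adds up to >60s.
console.log(sum(delays(5, 2, 1000, true)));    // 62000
// Chosen config (factor 1.5, no randomization): ~13s worst case.
console.log(sum(delays(5, 1.5, 1000, false))); // 13187.5
```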

mockRequest.onCall(0).resolves([]);
mockRequest.onCall(1).rejects({ error: 'random' });
mockRequest.get.onCall(0).resolves([]);
mockRequest.put.onCall(0).resolves({ ok: true });
Contributor

Should this actually reject with an error? (That is what the test was originally doing....)

I think this test might have unintentionally been broken since there is no expect.fail at the end of the try-block to make sure an exception was actually thrown...

Comment on lines +33 to +35
api = rewire('../../src/lib/api');
warnUploadOverwrite = rewire('../../src/lib/warn-upload-overwrite');
Contributor

Why do we need to rewire these again here in the afterEach?

Member Author

That's the fun part of my latest problems with cht-conf tests...

In https://github.com/medic/cht-conf/blob/575-retry-on-failure/src/lib/api.js#L9 you will notice, on line 9, const cache = new Map();.
This map caches some common responses from the API that can be reused between cht-conf commands, like how some things in couch are configured. It took me a while to figure out, but this cache variable wouldn't get reset across tests: the map was still populated with the API response the first test had cached, and that was causing tests in this file to fail.
They were not failing before because the API response mocks were incomplete, and it didn't matter much since the command could keep going without that information. But with the retry mechanism, a missing API response gets retried a couple of times before moving on, and that pushed a lot of tests over the timeout.

Back to your question, we need to rewire these because we need to reset the cache variable. I tried with just api.__set__('cache', new Map()); in the beforeEach but it wasn't enough to reset it. So we rewire api and then we rewire warnUploadOverwrite to give it the newly rewired api.

Comment on lines 144 to 172
api.giveResponses(
{
status: 200,
body: { version: '3.5.0' },
},
);
Contributor

Why did you add this? The tests run fine for me locally without it....

Member Author

It times out locally without this mocked response, my guess is it's because it calls const version = await getValidApiVersion(); on line 171 https://github.com/medic/cht-conf/blob/575-retry-on-failure/src/fn/upload-custom-translations.js#L171

@@ -12,7 +12,9 @@ const userPrompt = rewire('../../src/lib/user-prompt');
const readLine = require('readline-sync');
const mockTestDir = testDir => sinon.stub(environment, 'pathToProject').get(() => testDir);

describe('create-users', () => {
describe('create-users', function () {
this.timeout(15000);
Contributor

So, I understand that these tests are kind of all over the place in terms of how they are a weird mix of unit and integration tests (hitting the api stub). However, having all these various tests trip the async-retry logic is super inefficient, and that time is wasted on every test run.

To me, it seems that the 'proper' fix would be to actually separate the tests out so that we have a few integration tests that call through to the api stub, but the rest are unit tests that just hit a mock of the api.js (and never actually make any HTTP calls at all...). But, I don't know that we want to try and take a big refactor of the tests on at this point. So, here is my compromise suggestion:

Do you think it makes sense in these tests to override the request property in the api.js (similar to what you did in api.spec.js), but for these tests just have it directly call rpn instead of wrapping it in the retry? That should allow these tests to continue functioning as they did originally, but without needing the extra time/setup for a bunch of retries....

Member Author

I agree with you here. I was a bit disappointed to cause the tests to take so much longer (going from ~1 minute to ~7 minutes because of the repeated timeouts), but I think I've found a reasonable compromise.
I went ahead and overrode request to directly call rpn in the related tests, and I added a few tests to cover the retry mechanism in the api tests that will confidently tell us if we happen to break this feature.

…lations > medic-3.x > 3.0.0` conflicting with `request` mock in `warn-upload-overwrite > prompts when attempting to overwrite docs > shows diff when local is different from remote and the user requests a diff`
@m5r m5r merged commit bc5b0bc into main Dec 6, 2023
12 checks passed
@m5r m5r deleted the 575-retry-on-failure branch December 6, 2023 17:08
@medic-ci
Collaborator

medic-ci commented Dec 6, 2023

🎉 This PR is included in version 3.21.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
