Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate retries when using Chromium #233

Open
szmarczak opened this issue Feb 1, 2020 · 12 comments
Open

Duplicate retries when using Chromium #233

szmarczak opened this issue Feb 1, 2020 · 12 comments
Labels
bug Something isn't working external help wanted Extra attention is needed

Comments

@szmarczak
Copy link
Collaborator

szmarczak commented Feb 1, 2020

Test:

test.only('retry with body', withPage, async (t, page) => {
	let requestCount = 0;

	const server = await createTestServer();
	server.get('/', (request, response) => {
		response.end('zebra');
	});

	server.put('/test', async (request, response) => {
		console.log(Date.now());
		requestCount++;
		await pBody(request);
		response.sendStatus(408);
	});

	await page.goto(server.url);
	await page.addScriptTag({path: './umd.js'});

	await new Promise(resolve => page.on('close', resolve));
	t.is(requestCount, 1);

	await server.close();
});

with-page.js:

export default async function withPage(t, run) {
	const browser = await puppeteer.launch({
		headless: false
	});
	const page = await browser.newPage();
	try {
		await run(t, page);
	} finally {
		await page.close().catch(() => {});
		await browser.close().catch(() => {});
	}
}

When you run the following code, you will notice that it will receive three requests (3 different timestamps, there is a ~5ms difference between them). Seems like Chromium (including Google Chrome) is doing retries too.

globalThis.fetch(`${location.href}test`, {
	method: 'PUT',
	body: 'a'
});

Related:

@szmarczak szmarczak added bug Something isn't working help wanted Extra attention is needed labels Feb 1, 2020
@szmarczak
Copy link
Collaborator Author

szmarczak commented Feb 1, 2020

The "proper" solution would be to check all the HTTP error codes for both Google Chrome and Firefox and see when it retries.

Not only that, we need to check if the requestCount is right when getting 413:

ky/test/retry.js

Lines 86 to 148 in 78814e7

test('respect 413 Retry-After', async t => {
let requestCount = 0;
const server = await createTestServer();
server.get('/', (request, response) => {
requestCount++;
if (requestCount === defaultRetryCount) {
response.end((Date.now() - lastTried413access).toString());
} else {
response.writeHead(413, {
'Retry-After': retryAfterOn413
});
response.end('');
}
});
const result = await ky(server.url).text();
t.true(Number(result) >= retryAfterOn413 * 1000);
await server.close();
});
test('respect 413 Retry-After with timestamp', async t => {
let requestCount = 0;
const server = await createTestServer();
server.get('/', (request, response) => {
requestCount++;
if (requestCount === defaultRetryCount) {
response.end((Date.now() - lastTried413access).toString());
} else {
const date = (new Date(Date.now() + (retryAfterOn413 * 1000))).toUTCString();
response.writeHead(413, {
'Retry-After': date
});
response.end('');
}
});
const result = await ky(server.url).text();
t.true(Number(result) >= retryAfterOn413 * 1000);
t.is(requestCount, 2);
await server.close();
});
test('doesn\'t retry on 413 without Retry-After header', async t => {
let requestCount = 0;
const server = await createTestServer();
server.get('/', (request, response) => {
requestCount++;
response.sendStatus(413);
});
await t.throwsAsync(ky(server.url).text(), /Payload Too Large/);
t.is(requestCount, 1);
await ky(server.url, {throwHttpErrors: false}).text();
t.is(requestCount, 2);
await server.close();
});

Honestly, I'd consider this a Google Chrome Chromium (including Google Chrome) bug since fetch is called only once.

@sholladay
Copy link
Collaborator

sholladay commented Feb 2, 2020

Note that I was only able to reproduce this with Puppeteer driving Chromium. In my testing, normal Chrome without Puppeteer didn't display this behavior. I did not try Chromium without Puppeteer, or Puppeteer driving Chrome. We should try those next to to rule out it being a difference between Chromium and Chrome. If Chromium on its own does not display this behavior, then it likely has something to do with Puppeteer, which would be a relief.

@szmarczak szmarczak changed the title Duplicate retries when using Google Chrome Duplicate retries when using Chromium Feb 3, 2020
@szmarczak
Copy link
Collaborator Author

szmarczak commented Feb 3, 2020

@sholladay Yup, you're right. Doesn't happen for Google Chrome. I'll try to figure out why.

@szmarczak
Copy link
Collaborator Author

It also happens for Google Chrome too. If you run fetch once, you will notice that there is only 1 requests in total. If you run fetch again, you will notice that there are 3 (!) requests in total.

@szmarczak
Copy link
Collaborator Author

szmarczak commented Feb 3, 2020

@szmarczak
Copy link
Collaborator Author

As you can see, it's natively implemented: https://chromium.googlesource.com/chromium/src/+/refs/tags/78.0.3904.130/net/http/http_network_transaction.cc#1096

That means, we can't disable it :(

@szmarczak
Copy link
Collaborator Author

Search Chromium for "resend"
ShouldResendRequest: so it seems it does only so for keep-alive requests

@szmarczak
Copy link
Collaborator Author

I say let's stay with the current behavior, but just do

console.warn('Ky: Duplicate retries may be present as Chromium has implemented retries natively for `408 Request Timeout` errors')

when retrying on 408 on Chromium.

@sholladay
Copy link
Collaborator

@bthallion do you know anyone at Google who could look at this? Chromium is implicitly retrying network requests, which makes it hard for fetch libraries to coordinate retries. And IIRC from my testing it's even doing so for POST requests, which are non-idempotent (!)

In my opinion, it would be pretty awesome for the browser to handle retries, so long as it's baked into the spec and can be customized via fetch options. Doing so implicitly and only in certain cases is just plain awful, though.

@sindresorhus
Copy link
Owner

Open a Chrome bug (https://bugs.chromium.org/p/chromium/issues/list) and I'll try to get it routed to the right person.

It might be worth opening an issue on Fetch about this too: https://github.com/whatwg/fetch To get the spec to clarify retry behavior. Either to explicitly allow it, make implementation defined, or to disallow it. I would first check that it's not already mentioned in the spec.

@szmarczak
Copy link
Collaborator Author

I would first check that it's not already mentioned in the spec.

Have done that already. No mention unfortunately :(

@szmarczak
Copy link
Collaborator Author

The workaround would be to set the keepalive option to false.

This is worth looking at: whatwg/fetch#679

FrankHassanabad added a commit to elastic/kibana that referenced this issue Mar 15, 2021
…REST backend (#94531)

## Summary

Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout.

Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests. 

See this issue talked about below:
sindresorhus/ky#233
https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0
https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20

**Manual testing**
You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once. 

```
Trace: 
    at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11)
    at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27
    at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30)
    at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11)
    at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9)
```

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Mar 15, 2021
…REST backend (elastic#94531)

## Summary

Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout.

Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests. 

See this issue talked about below:
sindresorhus/ky#233
https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0
https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20

**Manual testing**
You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once. 

```
Trace: 
    at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11)
    at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27
    at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30)
    at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11)
    at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9)
```

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
FrankHassanabad added a commit to FrankHassanabad/kibana that referenced this issue Mar 15, 2021
…REST backend (elastic#94531)

Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout.

Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests.

See this issue talked about below:
sindresorhus/ky#233
https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0
https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20

**Manual testing**
You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once.

```
Trace:
    at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11)
    at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27
    at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30)
    at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11)
    at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9)
```

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
FrankHassanabad added a commit to elastic/kibana that referenced this issue Mar 15, 2021
…of the REST backend (#94531) (#94607)

* Updated to allow chunked queries and to increase the timeouts of the REST backend (#94531)

Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout.

Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests.

See this issue talked about below:
sindresorhus/ky#233
https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0
https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20

**Manual testing**
You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once.

```
Trace:
    at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11)
    at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27
    at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30)
    at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11)
    at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9)
```

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

* Wrong import as alerting is now called alerts. bad merge
kibanamachine added a commit to elastic/kibana that referenced this issue Mar 15, 2021
…REST backend (#94531) (#94587)

## Summary

Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout.

Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests. 

See this issue talked about below:
sindresorhus/ky#233
https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0
https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20

**Manual testing**
You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once. 

```
Trace: 
    at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11)
    at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27
    at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30)
    at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11)
    at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9)
```

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Frank Hassanabad <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working external help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants