
keepalive: Do we need to restrict the number of requests at a time? #662

Closed
yutakahirano opened this issue Jan 15, 2018 · 35 comments
Labels
needs tests Moving the issue forward requires someone to write tests

Comments

@yutakahirano
Member

We are planning to implement and ship the keepalive flag. At the intent-to-ship thread, some people want to restrict the number of requests at a time, as well as the total number of inflight bytes. Do you think it's reasonable?

@yoavweiss @annevk @igrigorik
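For context, the keepalive flag lets a fetch() outlive its document, much like navigator.sendBeacon(). A minimal sketch of the shape (the endpoint URL, payload, and helper name here are illustrative, not from the spec):

```javascript
// Sketch of the keepalive flag under discussion. The URL and payload are
// made up; the point is that the request may continue after page unload.
function buildKeepaliveInit(payload) {
  return {
    method: "POST",
    body: payload,
    keepalive: true, // allow the request to outlive the document
  };
}

// Typical use, e.g. from a pagehide handler:
// fetch("/log", buildKeepaliveInit(JSON.stringify({ event: "unload" })));
```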

@annevk
Member

annevk commented Jan 15, 2018

We could, I suppose. Did sendBeacon() originally have that?

Note that we already have a restriction on the total number of bytes.

@annevk annevk added the needs tests Moving the issue forward requires someone to write tests label Jan 15, 2018
@yoavweiss
Collaborator

sendBeacon() doesn't have a limitation on the number of requests in the spec (nor in implementations, as far as I know).

@annevk
Member

annevk commented Jan 15, 2018

I wonder if @davidben has suggestions for what the request limit should be.

I suppose we can also somewhat more aggressively send H2 RST_STREAM/H1 close connection once there is a response?

@igrigorik
Member

We are planning to implement and ship the keepalive flag. At the intent-to-ship thread, some people want to restrict the number of requests at a time, as well as the total number of inflight bytes.

To clarify, we already spec 64KB limit for inflight bytes.

sendBeacon() doesn't have a limitation on the number of requests in the spec (nor in implementations, as far as I know).

Indeed. Nor, afaik, has this ever come up as a problem based on current sendBeacon deployments. FWIW, while it's tempting to be really aggressive here, it's also important to recognize that if we bend this too far, developers will simply revert to their existing tricks — sync XHRs, spinning in while loops in click handlers, and so on.

@yutakahirano @davidben what's the motivation for limiting number of requests, in addition to existing restrictions?
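(For reference, the existing restriction amounts to bookkeeping like the following sketch. This is only an illustration of the 64 KiB inflight-body quota mentioned above, not the spec algorithm, and the class name is invented:)

```javascript
// Illustrative UA-side bookkeeping for the inflight keepalive body quota.
const QUOTA = 64 * 1024; // 64 KiB, the limit discussed in this thread

class KeepaliveQuota {
  constructor() {
    this.inflight = 0; // bytes currently reserved by pending keepalive bodies
  }
  // Reserve `bodyBytes` if the body fits in the remaining quota.
  // Returns true on success; false means the fetch should be rejected.
  tryReserve(bodyBytes) {
    if (this.inflight + bodyBytes > QUOTA) return false;
    this.inflight += bodyBytes;
    return true;
  }
  // Called when a keepalive request completes or is aborted.
  release(bodyBytes) {
    this.inflight -= bodyBytes;
  }
}
```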

@sleevi

sleevi commented Jan 15, 2018

@igrigorik I chimed in on the blink-dev@ thread, but one obvious implication is that these connections can keep resources consumed that the user cannot otherwise free (short of restarting the browser), and that consumption is not attributable to the site (instead, it just looks like memory is being leaked).

Much in the way we limit the number of connections per origin (6) or the total number of sockets (256), there's going to be an inherent limit. One of the risks/implications for sendBeacon() is that a slow sendBeacon() that doesn't time out can otherwise interfere with the user's ability to browse, by keeping a connection active and engaged. Closing the renderer no longer serves as a way to free those connections (by freeing the associated request contexts, which are terminated at the end of the renderer lifecycle).

@annevk
Member

annevk commented Jan 15, 2018

To be clear, and as I tried to say in the thread, I'm fine with adding a limit. I was just raising the point that you might not have enforced one thus far. Is 64 requests reasonable? 16? Something that's not a power of 2?

@sleevi

sleevi commented Jan 15, 2018 via email

@annevk
Member

annevk commented Jan 15, 2018

Note that they can stuff information in headers and URLs too, so you might be looking at much more than 1 mebibyte.

@sleevi

sleevi commented Jan 15, 2018 via email

@igrigorik
Member

Ryan, thanks for the additional context — makes sense.

Right, I think the non-enforcement was an oversight we should try to correct. In the best/worst case, we would limit to 256 parallel requests - but I'm not sure if we'd limit total pending.

I think the limit may vary by device, and thus be implementation defined with advisory (SHOULD) guidance.

Seems like a good addition. My hunch is that we could get away with a lower limit. On that note, as an option, we could consider a higher hard ('must') limit, with a "UAs may lower this limit in appropriate circumstances" or some such?

@annevk
Member

annevk commented Jan 16, 2018

I'd prefer to have a per-document/worker lower bound (or maybe per origin is better), if possible, so developers know what they can normally rely on.

@davidben
Collaborator

Users don't observe workers, just tabs or sites, and documents may spawn lots of workers, so that suggests not giving all workers a separate limit.

@yutakahirano
Member Author

yutakahirano commented Jan 16, 2018

SharedWorker and ServiceWorker can be associated with multiple documents so they'd be tricky. By the way, ServiceWorker is much stronger than fetch API + keepalive. Do we have any similar limitations on it?

@sleevi

sleevi commented Jan 16, 2018

Part of it may be implementation specific, but the memory overhead of ServiceWorker is partially bounded by the out-of-process nature. I also thought we handled ServiceWorkers' (implementation-defined) limits in w3c/ServiceWorker#527 ?

@yutakahirano
Member Author

A per-document/worker limit is avoidable, as @davidben says. A per-origin limit is avoidable by creating many third-party iframes (www1.example.com/, www2.example.com/, ...). A per-tab limit may be closest to users' expectations, but it doesn't work well with SharedWorker / ServiceWorker. I can think of two policies that sound somewhat sane to me.

  1. Solely rely on timeout. When a user closes a tab, all network resources for the tab will be released after N seconds. There's no limit on the number of inflight keepalive requests.
  2. Have a per-tab limit (i.e., M keepalive requests per tab, including iframes / dedicated workers). keepalive requests are not supported in ServiceWorker and SharedWorker.

What do you think?

@annevk
Member

annevk commented Jan 18, 2018

By per-tab, do you actually mean a unit of related browsing contexts? Otherwise window.open() / <a target=x> would be escapes (albeit somewhat guarded in most situations). It doesn't seem acceptable to not support them in service workers though. How would you pass such a request through otherwise?

@yutakahirano
Member Author

Then what do you think about timeout? Users expect all network resources for a tab to be released when it's closed. Is it reasonable to modify that expectation a bit - say, all network resources for a tab will be released once N seconds have passed since it was closed?

@annevk
Member

annevk commented Jan 18, 2018

I think that's reasonable, given that we already embraced that for service workers.

@yutakahirano
Member Author

@igrigorik @davidben @sleevi What do you think about the idea? Do you think it's good to add timeout language to the spec instead of restricting the number of requests? As stated above, web developers can escape a restriction on the number of requests if they are determined, so a timeout is a simpler and no less solid solution, I think.

@sleevi

sleevi commented Jan 23, 2018

@yutakahirano I don't think that resolves the interoperability concern. I do not believe a timeout would be sufficient in Chrome, for example - we would, and should, as a matter of both security (browser DoS) and stability (the aforementioned reasons), impose limits on how much we will accept as pending. We may not choose to specify it here, but it will still be web observable, so I don't think a timeout solution helps address the fundamental interoperability concerns being raised here.

I do think a timeout helps somewhat with predictability though, and I think it is a good additional check, and I think helps set some bound on user expectations being violated - and I think those are all good things.

@igrigorik
Member

@sleevi @yutakahirano with your Chrome hat on, what values would you enforce (or propose to enforce) today, both for timeout and pending?

@sleevi

sleevi commented Jan 23, 2018

@igrigorik

So, with timeout, the primary concern I think is aligning with user expectations: the expectation that, by navigating away from a page (or closing it), any resources or effects it had are terminated within some defined bounds. Users conditioned to close tabs to make browsing faster (by freeing resources) are an example of that. So I think the discussion of timeout is largely one of how predictable it is for users, and I suspect values there will be somewhat subjective, but let's throw out 15-30 seconds as a possible anchor for discussion (which ignores things like size of data versus the user's bandwidth, which could differ for mobile versus desktop).

For pending, I think it's equal parts implementation concern and user expectations, which makes it trickier. Let's presume, for the sake of discussion, a Chrome-like multi-process model, and an implementation in which keepalive (and sendBeacon) fully live within the renderer process. In this approach, pending is not really an implementation concern per se (the renderer's memory will be bounded, and in an OOM situation it'll be first to die - much like attempting to load an overly large SVG file). It is, however, a user-expectation concern: if the renderer process cannot be killed until after processing keepalive/sendBeacon requests, then zombie renderers (or service workers) can create negative user experiences - which is why timeout exists to reap them.

However, if an implementation strategy instead placed keepalive and sendBeacon in a browser process, allowing the renderer to shut down, then pending is much more important - the ability for content to keep browser-process memory pegged becomes much more problematic, and it's an implementation concern to bound the ability of 'hostile' content to affect other navigations/interactions and overall stability. Further, the amount you can dedicate to this sort of processing will itself be bounded by the platform - 512KB might seem a drop in the bucket for a 2GB desktop user, but be unconscionably large on a 512MB device like an Android phone or Apple Watch.

I realize this is a mixture of constituencies - user expectations like David highlighted on blink-dev@ and implementation concerns - and developers fit right in the middle of those two, but it may help explain some of the tension.

@yutakahirano
Member Author

@igrigorik We (Chrome) already have a 30-sec timeout. It's implemented in multiple places for various reasons, but see content::KeepAliveHandleFactory for example. We also have, in a way, a restriction on the number of requests, in content::ResourceDispatcherHostImpl::HasSufficientResourcesForRequest.

As for the pending point in @sleevi's comment, it sounds like it is important regardless of the keepalive flag. Chrome already has some protection. I'm open to having such protection in the spec, and to changing Chrome's implementation if needed, but I don't think it's keepalive specific. Does this make sense?
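A sketch of the timeout behaviour being discussed (the 30-second value mirrors the Chrome behaviour mentioned above; the function and parameter names are invented for illustration):

```javascript
// Illustration of the reaping policy: once a tab closes, its inflight
// keepalive requests get at most a fixed grace period before the UA
// aborts them. Only the 30 s value comes from this thread; the names
// are made up.
const KEEPALIVE_GRACE_MS = 30_000;

function reapKeepalivesOnClose(controllers, schedule = setTimeout) {
  // `controllers` holds one AbortController-like object per inflight
  // keepalive request that belonged to the closed tab.
  return schedule(() => {
    for (const c of controllers) c.abort();
  }, KEEPALIVE_GRACE_MS);
}
```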

@sleevi

sleevi commented Jan 24, 2018

@yutakahirano I'm not sure I understand why you suggest it's not keepalive specific. Can you think of another place in which an origin can continue to consume resources on the user's device, despite there being no navigation context to that domain, and without any ability for the user agent to intervene on the user's behalf?

@annevk
Member

annevk commented Jan 24, 2018

@sleevi the user agent can always intervene, but service workers are a feature that meets those characteristics, as mentioned upthread.

@sleevi

sleevi commented Jan 24, 2018

@annevk Except Service Workers have an implementation-defined escape hatch (as noted in w3c/ServiceWorker#527 ), whereas this provides no such implementation escape hatch that I can see. The original suggestion on blink-dev had been to ensure such escape hatches existed in keepalive/sendBeacon, but from #662 (comment) onwards the focus seems to have shifted to an interoperable limit, rather than an implementation-defined escape hatch for interventions.

@annevk
Member

annevk commented Jan 24, 2018

I see. I think I'd be okay with leaving this implementation-defined for now if that helps (not sure if this is blocking on me at this point). I prefer interoperable limits, interoperable OOM even, but you can't have it all.

@igrigorik
Member

@sleevi thanks, I think we're on the same page with respect to motivation + tension.

@igrigorik We (Chrome) already have a 30-sec timeout. It's implemented in multiple places for various reasons, but see content::KeepAliveHandleFactory for example. We also have, in a way, a restriction on the number of requests, in content::ResourceDispatcherHostImpl::HasSufficientResourcesForRequest.

@sleevi given the above, what's the implementation delta (if any) between what you're asking for and what's implemented in Chrome? Or is the remaining AI here on the spec side, to document these mechanisms?

@sleevi

sleevi commented Jan 25, 2018 via email

@yutakahirano
Member Author

Closing this bug as we decided to have a Chrome-specific limit.

@annevk
Member

annevk commented Feb 16, 2018

At the very least the specification needs to be changed to allow for that, no?

@annevk annevk reopened this Feb 16, 2018
@yutakahirano
Member Author

I think cancelling requests when the UA doesn't have sufficient resources is already allowed, right?

@annevk
Member

annevk commented Feb 19, 2018

OOM sure, but this is canceling for tracking purposes. I think that's worth calling out on its own so implementers consider it.

@sleevi

sleevi commented Feb 19, 2018 via email

@annevk
Member

annevk commented Feb 19, 2018

Sorry, got things mixed up. Maybe this is fine as-is then. Let's close it for now.
