
keepalive: Do we need to restrict the number of requests at a time? #662

Closed
yutakahirano opened this issue Jan 15, 2018 · 35 comments
Labels
needs tests Moving the issue forward requires someone to write tests

Comments

@yutakahirano
Member

We are planning to implement and ship the keepalive flag. At the intent-to-ship thread, some people want to restrict the number of requests at a time, as well as the total number of inflight bytes. Do you think it's reasonable?

@yoavweiss @annevk @igrigorik
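For context, the keepalive flag lets a fetch() outlive its document, much like navigator.sendBeacon(). A minimal sketch of the shape (the endpoint URL, payload, and helper name here are illustrative, not from the spec):

```javascript
// Sketch of the keepalive flag under discussion. The URL and payload are
// made up; the point is that the request may continue after page unload.
function buildKeepaliveInit(payload) {
  return {
    method: "POST",
    body: payload,
    keepalive: true, // allow the request to outlive the document
  };
}

// Typical use, e.g. from a pagehide handler:
// fetch("/log", buildKeepaliveInit(JSON.stringify({ event: "unload" })));
```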

@annevk
Member

annevk commented Jan 15, 2018

We could, I suppose. Did sendBeacon() originally have that?

Note that we already have a restriction on the total number of bytes.

@annevk annevk added the needs tests Moving the issue forward requires someone to write tests label Jan 15, 2018
@yoavweiss
Collaborator

sendBeacon() doesn't have a limitation on the number of requests in the spec (nor in implementations, as far as I know).

@annevk
Member

annevk commented Jan 15, 2018

I wonder if @davidben has suggestions for what the request limit should be.

I suppose we can also somewhat more aggressively send H2 RST_STREAM/H1 close connection once there is a response?

@igrigorik
Member

We are planning to implement and ship the keepalive flag. At the intent-to-ship thread, some people want to restrict the number of requests at a time, as well as the total number of inflight bytes.

To clarify, we already spec 64KB limit for inflight bytes.

sendBeacon() doesn't have a limitation on the number of requests in the spec (nor in implementations, as far as I know).

Indeed. Nor, afaik, has this ever come up as a problem based on current sendBeacon deployments. FWIW, while it's tempting to be really aggressive here, it's also important to recognize that if we bend this too far, developers will simply revert to their existing tricks — sync XHRs, spinning in while loops in click handlers, and so on.

@yutakahirano @davidben what's the motivation for limiting number of requests, in addition to existing restrictions?
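(For reference, the existing restriction amounts to bookkeeping like the following sketch. This is only an illustration of the 64 KiB inflight-body quota mentioned above, not the spec algorithm, and the class name is invented:)

```javascript
// Illustrative UA-side bookkeeping for the inflight keepalive body quota.
const QUOTA = 64 * 1024; // 64 KiB, the limit discussed in this thread

class KeepaliveQuota {
  constructor() {
    this.inflight = 0; // bytes currently reserved by pending keepalive bodies
  }
  // Reserve `bodyBytes` if the body fits in the remaining quota.
  // Returns true on success; false means the fetch should be rejected.
  tryReserve(bodyBytes) {
    if (this.inflight + bodyBytes > QUOTA) return false;
    this.inflight += bodyBytes;
    return true;
  }
  // Called when a keepalive request completes or is aborted.
  release(bodyBytes) {
    this.inflight -= bodyBytes;
  }
}
```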

@sleevi

sleevi commented Jan 15, 2018

@igrigorik I chimed in on the blink-dev@ thread, but one obvious implication is that these connections can keep resources consumed that the user cannot otherwise free (short of restarting the browser), and that consumption is not attributable to the site (instead, it just looks like memory is being leaked).

Much in the way we limit the number of connections per origin (6) or the total number of sockets (256), there's going to be an inherent limit. One of the risks/implications for sendBeacon() is that a slow sendBeacon() that doesn't time out can otherwise interfere with the user's ability to browse, by keeping a connection active and engaged. Closing the renderer no longer serves as a way to free those connections (by freeing the associated request contexts, which are terminated at the end of the renderer lifecycle).

@annevk
Member

annevk commented Jan 15, 2018

To be clear, and as I tried to say in the thread, I'm fine with adding a limit. I was just raising the point that you might not have enforced one thus far. Is 64 requests reasonable? 16? Something that's not a power of 2?

@sleevi

sleevi commented Jan 15, 2018 via email

@annevk
Member

annevk commented Jan 15, 2018

Note that they can stuff information in headers and URLs too, so you might be looking at much more than 1 mebibyte.

@sleevi

sleevi commented Jan 15, 2018 via email

@igrigorik
Member

Ryan, thanks for the additional context — makes sense.

Right, I think the non-enforcement was an oversight we should try to correct. In the best/worst case, we would limit to 256 parallel requests - but I'm not sure if we'd limit total pending.

I think the limit may vary by device, and thus be implementation defined with advisory (SHOULD) guidance.

Seems like a good addition. My hunch is that we could get away with a lower limit. On that note, as an option, we could consider a higher hard ('must') limit, with a "UAs may lower this limit in appropriate circumstances" or some such?

@annevk
Member

annevk commented Jan 16, 2018

I'd prefer to have a per-document/worker lower bound (or maybe per origin is better), if possible, so developers know what they can normally rely on.

@davidben
Collaborator

Users don't observe workers, just tabs or sites, and documents may spawn lots of workers, so that suggests not giving all workers a separate limit.

@yutakahirano
Member Author

yutakahirano commented Jan 16, 2018

SharedWorker and ServiceWorker can be associated with multiple documents so they'd be tricky. By the way, ServiceWorker is much stronger than fetch API + keepalive. Do we have any similar limitations on it?

@sleevi

sleevi commented Jan 16, 2018

Part of it may be implementation specific, but the memory overhead of ServiceWorker is partially bounded by the out-of-process nature. I also thought we handled ServiceWorkers' (implementation-defined) limits in w3c/ServiceWorker#527 ?

@yutakahirano
Member Author

A per-document/worker limit is avoidable, as @davidben says. A per-origin limit is avoidable by creating many third-party iframes (www1.example.com/, www2.example.com/, ...). A per-tab limit may be closest to users' expectations, but it doesn't work well with SharedWorker / ServiceWorker. I can think of two policies that sound somewhat sane to me.

  1. Solely rely on timeout. When a user closes a tab, all network resources for the tab will be released after N seconds. There's no limit on the number of inflight keepalive requests.
  2. Have a per-tab limit (i.e., M keepalive requests per tab, including iframes / dedicated workers). keepalive requests are not supported in ServiceWorker and SharedWorker.

What do you think?

@annevk
Member

annevk commented Jan 18, 2018

By per-tab, do you actually mean a unit of related browsing contexts? Otherwise window.open() / <a target=x> would be escapes (albeit somewhat guarded in most situations). It doesn't seem acceptable to not support them in service workers though. How would you pass such a request through otherwise?

@yutakahirano
Member Author

Then what do you think about timeout? Users expect all network resources for a tab to be released when it's closed. Is it reasonable to modify that expectation a bit - say, all network resources for a tab will be released once N seconds have passed since it was closed?

@annevk
Member

annevk commented Jan 18, 2018

I think that's reasonable, given that we already embraced that for service workers.

@yutakahirano
Member Author

@igrigorik @davidben @sleevi What do you think about the idea? Do you think it's good to add timeout language to the spec instead of restricting the number of requests? As stated above, web developers can escape a restriction on the number of requests if they are determined, so a timeout is a simpler and no less solid solution, I think.

@sleevi

sleevi commented Jan 23, 2018

@yutakahirano I don't think that resolves the interoperability concern. I do not believe a timeout would be sufficient in Chrome, for example - we would, and should, as a matter of both security (browser DoS) and stability (the aforementioned reasons), impose limits on how much we will accept as pending. We may not choose to specify it here, but it will still be web observable, so I don't think a timeout solution helps address the fundamental interoperability concerns being raised here.

I do think a timeout helps somewhat with predictability though, and I think it is a good additional check, and I think helps set some bound on user expectations being violated - and I think those are all good things.

@igrigorik
Member

@sleevi @yutakahirano with your Chrome hat on, what values would you enforce (or propose to enforce) today, both for timeout and pending?

@sleevi

sleevi commented Jan 23, 2018

@igrigorik

So, with timeout, the primary concern I think is aligning with user expectations: the expectation that, by navigating away from a page (or closing it), any resources or effects it had are terminated within some defined bounds. Users conditioned to close tabs to make browsing faster (by freeing resources) are an example of that. So I think the discussion of timeout is largely one of how predictable it is for users, and I suspect values there will be somewhat subjective, but let's throw out 15-30 seconds as a possible anchor for discussion (which ignores things like size of data versus the user's bandwidth, which could differ for mobile versus desktop).

For pending, I think it's equal parts implementation concern and user expectations, which makes it trickier. Let's presume, for the sake of discussion, a Chrome-like multi-process model, and an implementation in which keepalive (and sendBeacon) fully live within the renderer process. In this approach, pending is not really an implementation concern per se (the renderer's memory will be bounded, and in an OOM situation it'll be first to die - much like attempting to load an overly large SVG file). It is, however, a user-expectation concern: if the renderer process cannot be killed until after processing keepalive/sendBeacon requests, then zombie renderers (or service workers) can create negative user experiences - which is why timeout exists to reap them.

However, if an implementation strategy instead placed keepalive and sendBeacon in a browser process, allowing the renderer to shut down, then pending is much more important - the ability for content to keep browser-process memory pegged becomes much more problematic, and it's an implementation concern to bound the ability of 'hostile' content to affect other navigations/interactions and overall stability. Further, the amount you can dedicate to this sort of processing will itself be bounded by the platform - 512KB might seem a drop in the bucket for a 2GB desktop user, but be unconscionably large on a 512MB device like an Android phone or Apple Watch.

I realize this is a mixture of constituencies - user expectations like David highlighted on blink-dev@ and implementation concerns - and developers fit right in the middle of those two, but it may help explain some of the tension.

@yutakahirano
Member Author

@igrigorik We (Chrome) already have a 30-sec timeout. It's implemented in multiple places for various reasons, but see content::KeepAliveHandleFactory for example. We also have, in a way, a restriction on the number of requests, in content::ResourceDispatcherHostImpl::HasSufficientResourcesForRequest.

As for the pending point in @sleevi's comment, it sounds like it is important regardless of the keepalive flag. Chrome already has some protection. I'm open to having such protection in the spec, and to changing Chrome's implementation if needed, but I don't think it's keepalive specific. Does this make sense?
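A sketch of the timeout behaviour being discussed (the 30-second value mirrors the Chrome behaviour mentioned above; the function and parameter names are invented for illustration):

```javascript
// Illustration of the reaping policy: once a tab closes, its inflight
// keepalive requests get at most a fixed grace period before the UA
// aborts them. Only the 30 s value comes from this thread; the names
// are made up.
const KEEPALIVE_GRACE_MS = 30_000;

function reapKeepalivesOnClose(controllers, schedule = setTimeout) {
  // `controllers` holds one AbortController-like object per inflight
  // keepalive request that belonged to the closed tab.
  return schedule(() => {
    for (const c of controllers) c.abort();
  }, KEEPALIVE_GRACE_MS);
}
```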

@sleevi

sleevi commented Jan 24, 2018

@yutakahirano I'm not sure I understand why you suggest it's not keepalive specific. Can you think of another place in which an origin can continue to consume resources on the user's device, despite there being no navigation context to that domain, and without any ability for the user agent to intervene on the user's behalf?

@annevk
Member

annevk commented Jan 24, 2018

@sleevi the user agent can always intervene, but service workers are a feature that meets those characteristics, as mentioned upthread.

@sleevi

sleevi commented Jan 24, 2018

@annevk Except Service Workers have an implementation-defined escape hatch (as noted in w3c/ServiceWorker#527 ), whereas this provides no such implementation escape hatch that I can see. The original suggestion on blink-dev had been to ensure such escape hatches existed in keepalive/sendBeacon, but from #662 (comment) onwards the focus seems to have shifted to an interoperable limit, rather than an implementation-defined escape hatch for interventions.

@annevk
Member

annevk commented Jan 24, 2018

I see. I think I'd be okay with leaving this implementation-defined for now if that helps (not sure if this is blocking on me at this point). I prefer interoperable limits, interoperable OOM even, but you can't have it all.

@igrigorik
Member

@sleevi thanks, I think we're on the same page with respect to motivation + tension.

@igrigorik We (Chrome) already have a 30-sec timeout. It's implemented in multiple places for various reasons, but see content::KeepAliveHandleFactory for example. We also have, in a way, a restriction on the number of requests, in content::ResourceDispatcherHostImpl::HasSufficientResourcesForRequest.

@sleevi given the above, what's the implementation delta (if any) between what you're asking for and what's implemented in Chrome? Or is the remaining AI here on the spec side, to document these mechanisms?

@sleevi

sleevi commented Jan 25, 2018 via email

@yutakahirano
Member Author

Closing this bug as we decided to have a Chrome-specific limit.

@annevk
Member

annevk commented Feb 16, 2018

At the very least the specification needs to be changed to allow for that, no?

@annevk annevk reopened this Feb 16, 2018
@yutakahirano
Member Author

I think cancelling requests when the UA doesn't have sufficient resources is already allowed, right?

@annevk
Member

annevk commented Feb 19, 2018

OOM sure, but this is canceling for tracking purposes. I think that's worth calling out on its own so implementers consider it.

@sleevi

sleevi commented Feb 19, 2018 via email

@annevk
Member

annevk commented Feb 19, 2018

Sorry, got things mixed up. Maybe this is fine as-is then. Let's close it for now.
