How does a list of available images used when parsing document with multiple img nodes with same src? #2465

Open
LeonidVasilyev opened this issue Mar 24, 2017 · 18 comments

Comments

@LeonidVasilyev

The standard states the following regarding the list of available images:

It is not used to avoid re-downloading the same image while the previous image is still loading.

However, if you open an HTML document that contains multiple img nodes with the same src in current versions of Chrome, Firefox or IE 11, you will notice that these browsers make only a single network request for that image.

I checked this test case in several browsers:

| Test case | Chrome 56 | Firefox 50.0.1 | Firefox 31 | IE 11 |
| --- | --- | --- | --- | --- |
| HTML document with multiple sequential img nodes with same src URI | Single request | Single request | Single request | Single request |

Is this a deviation from the standard, or a side effect of some behavior specified by the HTML standard?

@LeonidVasilyev LeonidVasilyev changed the title How list of available images is used when parsing document with multiple img nodes with same src? How does a list of available images used when parsing document with multiple img nodes with same src? Mar 24, 2017
@jdm
Member

jdm commented Mar 24, 2017

Those browsers are presumably relying on the HTTP cache which contains an in-progress response.

@LeonidVasilyev
Author

Interestingly, this does not happen if you simultaneously send multiple GET requests to the same URI using XMLHttpRequest. Besides, in my experience, requests that are answered from the cache still appear in the Network tab of the developer tools.

@LeonidVasilyev
Author

I added Cache-Control: no-store, no-cache to the responses. The browser still performs only a single request.

@domenic
Member

domenic commented Mar 25, 2017

This is specified: https://html.spec.whatwg.org/#the-list-of-available-images

Closing but happy to continue discussing in the closed thread, and reopen if we missed something in the spec.

@domenic domenic closed this as completed Mar 25, 2017
@LeonidVasilyev
Author

@domenic isn't it correct that, according to the standard, parsing the following piece of HTML should end up performing two network requests, given that the browser sees the URI of the src for the first time?

<img src="foo/bar.png" />
<img src="foo/bar.png" />

@annevk
Member

annevk commented Mar 26, 2017

@LeonidVasilyev it's not correct, for images in particular, due to the map @domenic referenced.

@LeonidVasilyev
Author

@annevk my reasoning is based on two pieces of the HTML standard. Step 14 of 4.8.4.3.4 Updating the image data states that an image is added to the list of available images after it is fetched:

Furthermore, the last task that is queued by the networking task source once the resource has been fetched must additionally run these steps:
...
2. Set image request to the completely available state.
3. Add the image to the list of available images using the key key, with the ignore higher-layer caching flag set.

The first note in 4.8.4.3.3 The list of available images states that the list of available images is not used to avoid re-downloading an image while it is still loading:

It is not used to avoid re-downloading the same image while the previous image is still loading.

In my example, when the parser sees the second img tag, the list of available images doesn't contain the first image because it is still downloading (in the general case). If there is no image for the second img tag in the list of available images, the browser should perform a second request.

Please correct me if I am wrong or have missed something.
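
To make the reading above concrete, here is a minimal sketch of a loader that follows only the two quoted passages. It is a simplified, hypothetical model, not code from any browser: the class name `StrictImageLoader` and its methods are invented for illustration, and the key is reduced to the URL string, whereas the standard's key also includes the CORS mode and the origin.

```javascript
// Hypothetical model of a literal reading of the quoted spec text.
// Assumption: the key is just the URL string.
class StrictImageLoader {
  constructor() {
    this.availableImages = new Set(); // filled only when a fetch completes
    this.fetchesStarted = 0;          // fetches this model would start
  }

  // Called when the parser inserts <img src="...">.
  insertImg(src) {
    if (this.availableImages.has(src)) return 'reused';
    this.fetchesStarted += 1;
    return 'fetch started';
  }

  // Called once the fetch for src has completed successfully.
  completeFetch(src) {
    this.availableImages.add(src);
  }
}

// Case 1: both <img> elements are parsed before the first download finishes.
const strict = new StrictImageLoader();
strict.insertImg('foo/bar.png');
strict.insertImg('foo/bar.png');
console.log(strict.fetchesStarted); // 2

// Case 2: the second <img> is inserted after the first download finished.
const late = new StrictImageLoader();
late.insertImg('foo/bar.png');
late.completeFetch('foo/bar.png');
late.insertImg('foo/bar.png');
console.log(late.fetchesStarted); // 1
```

Under this model only case 2 yields a single request, which is why the single request observed in case 1 needs some other explanation.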

@annevk
Member

annevk commented Mar 26, 2017

That's probably an error of sorts, or maybe the difference between Chrome/WebKit's memory cache and this HTML feature.

@annevk annevk reopened this Mar 26, 2017
@domenic
Member

domenic commented Mar 26, 2017

I think what happens here is that Chrome decides to not make a second image request to that URL while the first one is in progress. That isn't governed by the spec I guess, and might technically be against spec depending on how you read things.

Then when it comes time to make the second request, it goes through the logic to check the list of available images, and the spec takes over.

I believe @surma was doing some research on this?

@surma
Contributor

surma commented Mar 26, 2017

I was doing some research on behavior discrepancies between browsers when it comes to fetch() requests, Worker instantiation and iframes. Images have a somewhat special handling, but I assume similar patterns apply:

If you request resource A, Chrome does indeed block a 2nd request for resource A until the first request is resolved and re-uses the response if the caching headers in the first response allow it. If the headers turn out to disallow reuse, a second request is dispatched to the network.

In the context of fetch, setting {cache: 'no-store'} should make the 2nd request go to the network immediately and not wait for the first request to return, but as of now Chrome doesn’t support the cache option at all.

This behavior differs wildly across browsers – but none of them violate the HTTP spec for caching, they are just suboptimal at times.

Not sure this is necessarily helpful for this discussion, but I do not see stalling a second request to wait for the first one as a violation of the HTML spec.
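
The stalling behavior described above can be sketched roughly as follows. This is a hypothetical model, not Chrome's implementation: `CoalescingFetcher` and `allowsReuse` are invented names, the network is simulated by calling `respond` by hand, and treating both `no-store` and `no-cache` as "do not reuse" is a simplifying assumption.

```javascript
// Assumption: a response may be shared unless Cache-Control contains
// no-store or no-cache. A real HTTP cache applies many more rules.
function allowsReuse(cacheControl = '') {
  const directives = cacheControl.toLowerCase().split(',').map((d) => d.trim());
  return !directives.includes('no-store') && !directives.includes('no-cache');
}

class CoalescingFetcher {
  constructor() {
    this.networkRequests = 0; // requests that actually go to the network
    this.pending = new Map(); // url -> callbacks waiting on the in-flight request
  }

  // Ask for url; onResponse is called once a response is available.
  request(url, onResponse) {
    const waiters = this.pending.get(url);
    if (waiters) {
      waiters.push(onResponse); // stall behind the request already in flight
      return;
    }
    this.pending.set(url, [onResponse]);
    this.networkRequests += 1;
  }

  // Simulate the network answering the in-flight request for url.
  respond(url, response) {
    const waiters = this.pending.get(url);
    if (!waiters) throw new Error(`no request in flight for ${url}`);
    this.pending.delete(url);
    const [first, ...rest] = waiters;
    const reuse = allowsReuse(response.cacheControl);
    if (!reuse && rest.length > 0) {
      // Headers forbid reuse: the next waiter goes to the network itself.
      this.pending.set(url, rest);
      this.networkRequests += 1;
    }
    first(response);
    if (reuse) rest.forEach((callback) => callback(response));
  }
}

// Two requests for the same URL, first response is reusable.
const fetcher = new CoalescingFetcher();
const bodies = [];
fetcher.request('/a.png', (r) => bodies.push(r.body));
fetcher.request('/a.png', (r) => bodies.push(r.body));
fetcher.respond('/a.png', { cacheControl: 'max-age=3600', body: 'v1' });
console.log(fetcher.networkRequests, bodies); // 1 [ 'v1', 'v1' ]
```

With a `no-store` response the same sequence ends with two network requests, the second one starting only after the first has returned.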

@zcorpan
Member

zcorpan commented Mar 27, 2017

Isn't this behavior the same for other things, like fonts, stylesheets, scripts?

The intent as far as the spec for img goes is that the logic for reusing an ongoing fetch is the responsibility of the Fetch spec, and the "list of available images" is layered on top and only populated for completed fetches with decodable images. (There is an open bug about extending it to cover unsuccessful image fetches as well, to avoid retrying over and over.)

@annevk
Member

annevk commented Mar 27, 2017

Isn't this behavior the same for other things, like fonts, stylesheets, scripts?

Only WebKit/Chrome have this so-called "memory cache" as I understand it. I've asked some folks to describe it and get it standardized, but not much activity thus far.

@LeonidVasilyev
Author

Couple more test cases:

| Test case | Chrome | Firefox | IE 11 |
| --- | --- | --- | --- |
| HTML document with multiple sequential stylesheet link nodes with same href URI | Single request | Single request | Single request |
| HTML document with multiple sequential script nodes with same src URI | Single request | One request for each node | Single request |
| HTML document with multiple sequential async script nodes with same src URI | Single request | One request for each node | Single request |

Although Firefox has a flag named browser.cache.memory.enable, it doesn't seem to affect browser behavior in the described scenarios.

@surma
Contributor

surma commented Mar 27, 2017

@LeonidVasilyev: What are the caching headers on those resources? According to my research, they do have an impact on how the browser behaves:
[Image: screenshot 2017-03-26 15 01 32 (not preserved)]

serialize = last request has to finish before next request is kicked off
parallelize = all requests are sent out at the same time and have a unique response
wait 1st + reuse = the response for the first request is reused for all remaining requests
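
As a rough illustration of how the three labels above could be told apart from the server side, here is a small sketch. `classifyStrategy` is a hypothetical helper, not part of the research mentioned above; it assumes the server logs a `[startMs, endMs]` pair for every request that reaches it, and that the page is known to have asked for the same URL `requestsIssued` times.

```javascript
// Hypothetical helper: map server-side timings onto the three labels.
// intervals: one [startMs, endMs] pair per request the server received.
function classifyStrategy(requestsIssued, intervals) {
  if (requestsIssued < 2) return 'not enough requests';
  // The page asked several times but the server saw one request.
  if (intervals.length === 1) return 'wait 1st + reuse';
  if (intervals.length !== requestsIssued) return 'unclassified';

  // parallelize: every request started before any of them finished.
  const maxStart = Math.max(...intervals.map(([start]) => start));
  const minEnd = Math.min(...intervals.map(([, end]) => end));
  if (maxStart < minEnd) return 'parallelize';

  // serialize: each request started only after the previous one finished.
  const sorted = [...intervals].sort((a, b) => a[0] - b[0]);
  const serial = sorted.every(
    ([start], i) => i === 0 || start >= sorted[i - 1][1]
  );
  return serial ? 'serialize' : 'unclassified';
}

console.log(classifyStrategy(3, [[0, 10]]));                     // wait 1st + reuse
console.log(classifyStrategy(3, [[0, 10], [1, 11], [2, 12]]));   // parallelize
console.log(classifyStrategy(3, [[0, 10], [10, 20], [20, 30]])); // serialize
```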

@LeonidVasilyev
Author

@surma, I got the same result for Chrome, Firefox and IE 11 with both Cache-Control: no-cache and Cache-Control: max-age=3600 response headers.

@surma
Contributor

surma commented Mar 27, 2017

Wow, that’s giving me a headache. I’ll take a closer look at this in the coming weeks. Whatever the underlying mechanism, this is not good in terms of developer ergonomics.

@LeonidVasilyev
Author

The Stack Overflow discussion "Refresh image with a new one at the same url" contains interesting information about the behavior of the in-memory cache or the list of available images. According to Aya's answer, you should use both the Cache-Control: no-store response header and a cache buster based on a random URI fragment in order to prevent image requests from hitting the cache.
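
The client-side half of that advice could look roughly like this. `withCacheBustingFragment` is a hypothetical helper name, and the random token format is an arbitrary choice. Note that the fragment is never sent to the server, so the server still has to send Cache-Control: no-store itself.

```javascript
// Hypothetical helper: return src with a fragment that differs on each call.
// base is needed only when src is relative (in a page: document.baseURI).
function withCacheBustingFragment(
  src,
  base,
  token = Math.random().toString(36).slice(2)
) {
  const url = new URL(src, base);
  url.hash = token; // replaces any existing fragment
  return url.href;
}

console.log(withCacheBustingFragment('foo/bar.png', 'https://example.com/', 'abc'));
// https://example.com/foo/bar.png#abc

// In a page, assuming img is an existing <img> element:
//   img.src = withCacheBustingFragment(img.src, document.baseURI);
```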

@processprocess

processprocess commented Nov 24, 2020

I'm testing this in Chrome using a local express server.
I'm noticing some interesting behavior.

I have 1000 img elements in an html doc.

  • When src is set with javascript, there is 1 network request.
  • When src is set in html, there are exactly 3 network requests every time I refresh.
  • When src is set in html and I put a setTimeout on res.download in the server I get 1 network request.

Weird that a delay on the res.download from the server ensures 1 request.
Realistically a network request wouldn't resolve as fast as a local server and the timeout could simulate a typical network request, so this might be good news for caching. Still I wonder what is going on here.
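
For anyone who wants to repeat this experiment, here is a minimal sketch of a counting server. It swaps Express and res.download for Node's built-in http module and serves a tiny SVG instead of a file, so it is not the setup used above. The names `createCountingHandler` and `startServer`, the `/pixel.svg` path and the option names are all assumptions made for illustration.

```javascript
const http = require('node:http');

const SVG = '<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>';

// Build a request handler that counts requests per URL.
// delayMs imitates the setTimeout placed before the response is sent.
function createCountingHandler({ delayMs = 0, cacheControl = 'no-store', imgCount = 1000 } = {}) {
  const counts = new Map(); // url -> number of requests received

  function handler(req, res) {
    if (req.url === '/') {
      // Test page: imgCount <img> elements that all share one src.
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end('<!doctype html>' + '<img src="/pixel.svg">'.repeat(imgCount));
      return;
    }
    counts.set(req.url, (counts.get(req.url) || 0) + 1);
    const send = () => {
      res.writeHead(200, {
        'Content-Type': 'image/svg+xml',
        'Cache-Control': cacheControl,
      });
      res.end(SVG);
    };
    if (delayMs > 0) setTimeout(send, delayMs);
    else send();
  }

  return { handler, counts };
}

// Start listening; inspect counts after loading http://localhost:<port>/.
function startServer(port, options) {
  const { handler, counts } = createCountingHandler(options);
  const server = http.createServer(handler).listen(port);
  return { server, counts };
}
```

Calling `startServer(8080, { delayMs: 200 })`, loading the page in a browser and then reading `counts.get('/pixel.svg')` shows how many image requests actually reached the server.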
