consider exposing size for cache entries #587
The only criticism I have is that this needs to go through the cache to work. My instinct is that I'd get a response, figure out its size, then check the current fill of the cache in terms of file size and make a decision to include it or not.
But this all sounds like it would live on the request rather than the cache API, and would shift the onus onto the developer to track memory usage (which is good and bad).
I think the main problem with getting the size on the Request or Response is that you don't really know it until the body is drained. Fetch resolves when just the headers are available, and the body may still be coming in off the network. Unfortunately, content-length is often a lie. Either you need to read the whole stream into memory (bad for mobile) or write it to disk and check. I think you would need to add to the cache, then delete any excess over the threshold:
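The code block for that pattern didn't survive extraction. A minimal sketch of the add-then-trim idea might look like this (the `trimCache` helper and its oldest-first eviction policy are assumptions for illustration, not anything spec'd):

```javascript
// Keep a cache under maxBytes by evicting the oldest entries.
// Sizes are measured by draining each stored body, since the
// content-length header can be missing or wrong.
async function trimCache(cacheName, maxBytes) {
  const cache = await caches.open(cacheName);
  const requests = await cache.keys();

  let total = 0;
  const entries = [];
  for (const request of requests) {
    const response = await cache.match(request);
    // clone() so the stored response body remains readable later.
    const buffer = await response.clone().arrayBuffer();
    entries.push({ request, size: buffer.byteLength });
    total += buffer.byteLength;
  }

  // keys() yields entries in insertion order in practice, so this
  // evicts oldest-first until we are back under the threshold.
  for (const { request, size } of entries) {
    if (total <= maxBytes) break;
    await cache.delete(request);
    total -= size;
  }
}
```

Note this is exactly the cost the comment describes: every body has to be read in full just to learn its size.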
Alternatively, if the origin can ensure its server provides accurate content-length headers, then it can simply use those.
You could then store the total length in IDB. This would not be exactly the size stored on disk in the cache, though. The actual size would depend on how the Cache implementation dealt with content encoding, additional compression, de-duplication, etc. For example, the Gecko cache will (unfortunately) remove the content-encoding and then recompress with Snappy. Anyway, maybe the content-length headers are enough and we don't need a new API here. What do people think?
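A sketch of that header-based approach, assuming the server's content-length headers can be trusted (the helper name is invented for illustration):

```javascript
// Estimate a cache's total size by summing content-length headers.
// This is only an estimate: it trusts the server, and it ignores
// whatever encoding, compression, or de-duplication the Cache
// implementation applies on disk.
async function estimateCacheSize(cacheName) {
  const cache = await caches.open(cacheName);
  const requests = await cache.keys();

  let total = 0;
  for (const request of requests) {
    const response = await cache.match(request);
    const length = response && response.headers.get('content-length');
    // Entries without the header simply contribute nothing.
    total += length ? parseInt(length, 10) : 0;
  }
  return total;
}
```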
Twitter asked for this too. We should have some kind of method on the cache object that provides an object of meta information, which includes size on disk.
(Since we've heard similar requests for the Quota API in the past, I filed a related issue; we probably have to agree on where these APIs should live: kinu/quota-api#10)
One more datapoint: this came up as a question on StackOverflow. |
tl;dr I'd like to see the ability to query not only the size of individual items in the cache, but also the total size of the cache itself. Hopefully this can be done in a performant way, without having to spin through all the items adding up sizes. Each cache could keep a tally of the total size of the items it contains and update it as items are added and removed.

Imagine a use case where you might want to take an app offline that deals with displaying large documents. You also want to enable the user to take an entire document offline. The document consists of multiple requests, say 1 upfront and 1 per page. When the user clicks to take a document offline, let's say we create a service worker cache for that document, request all the assets, and add them to that cache.

Now the user has taken a few documents offline. There is a cache per doc, and one for the app assets. The user may want to see what documents are available offline, how much space each uses on their device, and remove a document from the offline cache. If each cache knew the size of the items it contained, then segmenting data into caches in this fashion makes it easy to query the size of a single document.

In a use case like this, the user would want to see the space actually consumed on disk, including request overhead and taking into account compression, etc.
@robrbecker your tl;dr is longer than the OP! But the extra detail is really valuable, thanks for the use-case and clarification.
The UA may dedupe across caches to save disk space. I don't think this is a huge issue, but we should nod to it in the spec.
In fact, a tracked total cache size is going to be more accurate than a sum of individual item sizes. @robrbecker what's more important, the size of individual items or the size of the whole cache? @wanderview would it be simpler to have a single total-size value per cache?
Why not store all media offline? Similar to @robrbecker, specifics on the amount of space used, and more importantly the amount available, are very important to our use case: caching large binary assets (video, images, audio) for physically installed media (digital signage, kiosks, interactive video walls, etc.) that load via the web but may have an intermittent connection. Specs regarding clearing of data from the cache are also critical. We need to understand exactly how much, and for how long, data will be cached before we can fully rely on it (100% offline-enabled app).
My 2 TB dropbox account will probably not fit on my mobile device for some years... That was the case I was referring to.
Yes, these cases are important too. I believe the current leading proposal for handling guaranteed persistent storage is here: https://wiki.whatwg.org/wiki/Storage It sounds like the v1 bits there would help with your use case.
I'm keen on looking at this stuff for v2. We need to make sure that by exposing size we don't hint at content of opaque responses. |
@jakearchibald Finally rounding back on this issue... I think total size of a cache is more important than individual size. That may be a way to get around security concerns of giving out the exact size of each response. (Except in the degenerate case of 1 to 1 cache <-> response) |
That's what I'm worried about. Actively thinking about this. |
This same problem exists for exposing size estimates even on the same origin with the Storage spec. I guess we could just exclude opaque bodies from the script-exposed size estimates, but that is kind of annoying to implement and reduces the utility here.
The total cache size of opaque items could be rounded up to the nearest, e.g., 100k bytes, so it would be useful for the total-size use case while not hinting too much at the actual size of an opaque response in the 1-to-1 cache <-> response case.
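That padding idea reduces to a pure function; the bucket size and function name below are illustrative only:

```javascript
// Round an opaque response's size up to the nearest bucket (e.g. 100 KB)
// before including it in any script-visible total. The total stays useful
// for quota decisions, while an individual opaque response's size is only
// revealed to the nearest bucket boundary.
const OPAQUE_BUCKET = 100 * 1024;

function paddedOpaqueSize(actualBytes) {
  if (actualBytes === 0) return 0;
  return Math.ceil(actualBytes / OPAQUE_BUCKET) * OPAQUE_BUCKET;
}
```

In the degenerate one-entry-per-cache case mentioned above, the attacker still only learns the bucket, not the exact byte count.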
(I think this is a known feature request, but I didn't see an issue for it.)
Consider how you would build a media application like a music player or photo gallery. While you probably couldn't store all media offline, it would be nice to save some amount of the most frequently used files.
Currently the Cache API lets us build an LRU cache based on count:
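The original code block here didn't survive extraction. The count-based pattern being referred to usually looks roughly like this (the helper name is assumed, not from the original):

```javascript
// Cap a cache at maxEntries by deleting the oldest entries.
// cache.keys() yields requests in insertion order in practice,
// which is what makes this count-based LRU approximation work.
async function trimToCount(cacheName, maxEntries) {
  const cache = await caches.open(cacheName);
  const keys = await cache.keys();
  const excess = keys.length - maxEntries;
  for (const request of keys.slice(0, Math.max(0, excess))) {
    await cache.delete(request);
  }
}
```

Counting entries is cheap, but it says nothing about bytes, which is exactly the gap this issue is about.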
It would be nice, however, to be able to store items up until a certain size limit is reached. I was thinking an API like:
These would function like match() and matchAll(), but return a size value instead of the responses.
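The proposed shape could be mocked up as follows. The method names are invented for illustration; the issue only specifies the match()/matchAll() analogy, so this is a toy stand-in over a plain Map rather than a real Cache:

```javascript
// A toy stand-in for the proposed API: like match()/matchAll(),
// but resolving to byte sizes instead of Response objects.
class SizedCacheSketch {
  constructor() {
    this.sizes = new Map(); // url -> byte size
  }
  record(url, bytes) {
    this.sizes.set(url, bytes);
  }
  // Analogue of cache.match(request): the matching entry's size.
  async matchSize(url) {
    return this.sizes.get(url);
  }
  // Analogue of cache.matchAll(): sizes of all entries.
  async matchAllSizes() {
    return [...this.sizes.values()];
  }
}
```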
It would then be necessary to define what "size" means; this I am less sure of.
Thoughts?