consider exposing size for cache entries #587

wanderview · 2014-12-10T21:51:47Z

(I think this is a known feature request, but didn't see an issue for it.)

Consider how you would build a media application like a music player or photo gallery. While you probably couldn't store all media offline, it would be nice to save some amount of the most frequently used files.

Currently the Cache API lets us build an LRU cache based on count:

var cache;
cache.open('foo').then(function(foo) {
  cache = foo;
  return cache.match(request);
}).then(function(response) {
  if (response) {
    // update order of entries in keys()
    cache.put(request, response.clone());
    return response;
  }

  var maxItems = 100;
  return addToLRU(cache, maxItems, request);
});

function addToLRU(cache, maxItems, request) {
  return cache.keys().then(function(keys) {
    if (keys.length < maxItems) {
      return cache.add(request);
    }

    return cache.delete(keys[0]).then(function() {
      return cache.add(request);
    });
  });
}

It would be nice, however, to be able to store items up until a certain size limit is reached. I was thinking an API like:

  Promise<unsigned long> sizeOf(Request or USVString request, optional QueryParams params);
  Promise<unsigned long> sizeOfAll(Request or USVString request, optional QueryParams params);

These would function like match() and matchAll(), but return a size value instead of the responses.

It would then be necessary to define what "size" means. This I am less sure of:

Size of on-disk or in-memory? I assume on-disk would be preferable.
Size of just the body or including all fields? The body will typically dominate for most Responses.
Require approximate or exact measurements from the browser? Given compression, databases, and de-duplication, I think approximate would be better.

Thoughts?

The text was updated successfully, but these errors were encountered:

gauntface · 2014-12-10T22:24:41Z

The only criticism I have is that this needs to go through the cache to work, my instinct is that I'd get a response, figure out it's size and then check what the current fill of the cache is in terms of file size and make a decision to include or not.

This allows decisions per file basis (i.e. this file is huge, don't ever cache it)
This file is good to cache, do I have enough space?
This file is fresh, is there a big file I should drop from the cache for this response and possibly others?

But this all sounds like it would live on the request rather than the cache API and would shift the onus onto the developer to track memory usage (which is good and bad).

wanderview · 2014-12-10T22:36:35Z

I think the main problem with getting the size on the Request or Response is that you don't really know until the body is drained. Fetch resolves when just the headers are available and the body may still be coming in off the network. Unfortunately content-length is often a lie. Either you need to read the whole stream into memory (bad for mobile) or write it to disk and check.

I think you would need to add to the cache, then delete any excess over the threshold:

fetch(request.clone()).then(function(response) {
  cache.put(request, response.clone()).then(function() {
    deleteExcess(cache, 1024*1024);
  });
  return response;
});

function deleteExcess(cache, maxSize) {
  cache.sizeOfAll('./avatars/', { prefixMatch: true }).then(function(size) {
    if (size < maxSize) {
      return;
    }
    cache.keys().then(function(keys) {
      cache.delete(keys[0]);
      deleteExcess(cache, maxSize);
    });
  });
}

wanderview · 2014-12-10T23:57:19Z

Alternatively, if the origin can ensure its server provides accurate content-length headers, then it can simply use those.

var length = response.headers.get('content-length');

You could then store the total length in IDB.

This would not be exactly the size stored on disk in the cache, though. The actual size would depend on how the Cache implementation dealt with content encoding, additional compression, de-duplication, etc. For example, the gecko cache will (unfortunately) remove the content-encoding and then recompress with snappy.

Anyway, maybe the content-length headers are enough and we don't need a new API here. What do people think?

jakearchibald · 2015-01-14T18:08:21Z

Twitter asked for this too. We should have some kind of method on the cache object that provides an object of meta information, which includes size on disk.

kinu · 2015-01-21T08:12:36Z

(Since we've heard similar requests in Quota API in the past I filed a related issue, we probably have to agree on where these APIs should live on: kinu/quota-api#10)

KenjiBaheux · 2015-02-05T14:45:58Z

One more datapoint: this came up as a question on StackOverflow.

robrbecker · 2015-02-05T16:05:37Z

tl;dr I'd like to see the ability to query not only the size of individual items in the cache, but also the total size of the cache itself. Hopefully this can be done in a performant way, without having to spin through all the items adding up the size. Each cache could keep a tally on the total size of the items it contains and update it as items are added and removed.

Imagine a use case where you might want to take an app offline that deals with displaying large documents. You also want to enable the user to take an entire document offline. The document is comprised of multiple requests, say 1 upfront and 1 per page. When the user clicks to take a document offline, let's say we create a service worker cache for that document, request all the assets and add them to that cache.

Now the user has taken a few documents offline. There is a cache per doc, and one for the app assets. The user may want to see what documents are available offline, how much space each uses on their device, and remove a document from the offline cache. If each cache knew the size of the items it contained, then segmenting data into caches in this fashion makes it easy to query the size of a single document.

In a use case like this, the user would want to see the space actually consumed on disk, including request overhead and taking into account compression, etc.

jakearchibald · 2015-02-09T10:33:09Z

@robrbecker your tl;dr is longer than the OP! But the extra detail is really valuable, thanks for the use-case and clarification.

the user would want to see the space actually consumed on disk, including request overhead and taking into account compression, etc

The UA may dedupe across caches to save disk space. I don't think this is a huge issue, but we should nod to it in the spec.

not only the size of individual items in the cache

In fact, the cache size is going to be more accurate than the content-size header which may be absent or completely false. An accurate way would be response.arrayBuffer().then(a => a.length), but that would be pretty horrific for performance.

@robrbecker what's more important, the size of individual items or the size of the whole cache?

@wanderview would it be simpler to have a single cache.size() promise-returning method? If we need a way to query the size of multiple cache entries maybe we should change the resolve-type of .matchAll to be something that extends Array, so it would be cache.matchAll(request).then(rs => rs.totalSize()) or something.

matt-cook · 2015-04-28T16:43:58Z

While you probably couldn't store all media offline, it would be nice to save some amount of the most frequently used files.

Why not store all media offline?

Similar to @robrbecker, specifics on amount of space used.. and more importantly: amount available is very important to our use case: caching large binary assets (video, images, audio) for physically installed media (digital signage,kiosks, interactive video walls, etc.) that load via web.. but may have intermittent connection.

Specs. regarding clearing of data from cache also critical. Need to understand exactly how much, and how long data will be cached before we can fully rely on it (100% offline-enabled app).

wanderview · 2015-04-28T16:48:18Z

Why not store all media offline?

My 2 TB dropbox account will probably not fit on my mobile device for some years... That was the case I was referring to.

Similar to @robrbecker, specifics on amount of space used.. and more importantly: amount available is very important to our use case: caching large binary assets (video, images, audio) for physically installed media (digital signage,kiosks, interactive video walls, etc.) that load via web.. but may have intermittent connection.

Specs. regarding clearing of data from cache also critical. Need to understand exactly how much, and how long data will be cached before we can fully rely on it (100% offline-enabled app).

Yes, these cases are important too. I believe the current leading proposal for handling guaranteed persistent storage is in here:

https://wiki.whatwg.org/wiki/Storage

It sounds like the v1 bits there would help with your use case.

jakearchibald · 2015-10-28T08:14:55Z

I'm keen on looking at this stuff for v2. We need to make sure that by exposing size we don't hint at content of opaque responses.

robrbecker · 2015-10-28T16:15:38Z

@jakearchibald Finally rounding back on this issue... I think total size of a cache is more important than individual size. That may be a way to get around security concerns of giving out the exact size of each response. (Except in the degenerate case of 1 to 1 cache <-> response)

jakearchibald · 2015-10-28T16:17:59Z

That's what I'm worried about. Actively thinking about this.

wanderview · 2015-10-28T16:57:55Z

This same problem exists for exposing size estimates even on the origin with storage spec. I guess we could just exclude opaque bodies in the script exposed size estimates, but that is kind of annoying to implement and reduces the utility here.

petkaantonov · 2015-12-15T14:43:36Z

The total cache size of opaque items could be ceiled up to closest e.g. 100k bytes so it would be useful for total size use case but not hinting too much of the actual size of an opaque response in the 1 to 1 cache <-> response case.

kinu mentioned this issue Jan 21, 2015

Consider adding API to estimate how much space is taking up for various storage artifacts kinu/quota-api#10

Closed

wanderview mentioned this issue Jan 30, 2015

Hints in beforeevicted to let the SW of a web app pick among several house cleaning options? #611

Open

wanderview added cache needs spec question labels Feb 7, 2015

jakearchibald added this to the Version 2 milestone Oct 28, 2015

jakearchibald mentioned this issue Nov 3, 2017

Create F2F agenda - 7 November 2017 #1206

Open

tomayac mentioned this issue Dec 5, 2017

Allow caches to opt-in to granular cleanup #863

Open

jatindersmann mentioned this issue Apr 23, 2018

Create F2F agenda - 25 October 2018 #1303

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consider exposing size for cache entries #587

consider exposing size for cache entries #587

wanderview commented Dec 10, 2014

gauntface commented Dec 10, 2014

wanderview commented Dec 10, 2014

wanderview commented Dec 10, 2014

jakearchibald commented Jan 14, 2015

kinu commented Jan 21, 2015

KenjiBaheux commented Feb 5, 2015

robrbecker commented Feb 5, 2015

jakearchibald commented Feb 9, 2015

matt-cook commented Apr 28, 2015

wanderview commented Apr 28, 2015

jakearchibald commented Oct 28, 2015

robrbecker commented Oct 28, 2015

jakearchibald commented Oct 28, 2015

wanderview commented Oct 28, 2015

petkaantonov commented Dec 15, 2015

consider exposing size for cache entries #587

consider exposing size for cache entries #587

Comments

wanderview commented Dec 10, 2014

gauntface commented Dec 10, 2014

wanderview commented Dec 10, 2014

wanderview commented Dec 10, 2014

jakearchibald commented Jan 14, 2015

kinu commented Jan 21, 2015

KenjiBaheux commented Feb 5, 2015

robrbecker commented Feb 5, 2015

jakearchibald commented Feb 9, 2015

matt-cook commented Apr 28, 2015

wanderview commented Apr 28, 2015

jakearchibald commented Oct 28, 2015

robrbecker commented Oct 28, 2015

jakearchibald commented Oct 28, 2015

wanderview commented Oct 28, 2015

petkaantonov commented Dec 15, 2015