Excessive Storage Utilisation - Tiles are cached at duplicate locations, never get pruned #2298

Closed
westnordost opened this issue Oct 28, 2021 · 12 comments

Comments

@westnordost
Contributor

westnordost commented Oct 28, 2021

Spawned from streetcomplete/StreetComplete#3417 .

In a nutshell

It looks like the map tiles are cached both at the location the user of this library specified and at another location at the same time. The issues are:

  • that the tiles get cached at that other location at all, which doubles the storage space the cache requires
  • that the tiles in that other location seemingly never get pruned

My cache configuration

// = sdcard/Android/data/de.westnordost.streetcomplete/cache/tile_cache
val cacheDir = File(context.externalCacheDir, "tile_cache")

val cache = Cache(cacheDir, 50 * 1000L * 1000L) // prune after reaching 50 MB

val cacheControl = CacheControl.Builder()
    .maxAge(12, TimeUnit.HOURS) // do not re-download tiles younger than 12 hours
    .maxStale(14, TimeUnit.DAYS) // accept cached tiles up to 14 days past their freshness lifetime
    .build()

val httpHandler = object : DefaultHttpHandler(OkHttpClient.Builder().cache(cache)) {
    override fun configureRequest(url: HttpUrl, builder: Request.Builder) {
        builder.cacheControl(cacheControl)
    }
}

tangramMapView.initMap(httpHandler)

The OkHttpCache actually prunes tiles in sdcard/Android/data/de.westnordost.streetcomplete/cache/tile_cache correctly when this directory exceeds a size of 50MB. I tested this.

What is wrong? Observations

  1. Reported cache size:
    image

  2. Actual size of external cache directory:
    image

  3. There are no other cache files in either of

    • /sdcard/Android/data/de.westnordost.streetcomplete/cache/
    • /data/data/de.westnordost.streetcomplete/cache/
  4. When panning the map and thus downloading new tiles, the reported cache size grows at pretty much 2x the rate of the size of the external cache directory

  5. When comparing the total free storage space on the phone before and after clearing the cache for the app, it is clear that the cache size reported by Android is not a display error: in this case, 57 MB of storage space was indeed freed.
    However, the total size of the directories mentioned in point 3 was reduced by only 22MB.

Thus, I conclude that there must be a third directory somewhere, outside the directories mentioned in point 3, that tangram-es uses to store duplicates of the downloaded tiles and that is never pruned.
Android is able to attribute this directory to the app and to clear it, but I have not found where on the sdcard it is supposed to be.

Used versions

  • tangram-es 0.16.2
  • Android (different versions, different devices)
@matteblair
Member

Interesting report! I did a brief test in the Tangram ES Android demo app to see if I could observe this. The demo app uses a similar caching configuration to yours, but limited to 16MB (https://github.com/tangrams/tangram-es/blob/main/platforms/android/demo/src/main/java/com/mapzen/tangram/android/MainActivity.java#L209-L225). What I observed is that the "Cache" amount in the Application manager corresponds almost exactly to the size of the /sdcard/Android/data/com.mapzen.tangram.android/cache/ folder on my device, as reported by du.

Just to make sure that we're measuring the same things, can you check the size of your tile cache directory by using du on your device via adb shell? The command should be something like:

adb shell du -sh /sdcard/Android/data/de.westnordost.streetcomplete/cache/

Check the output of that and tell me whether it still differs from the "Cache" amount reported by Android.

@westnordost
Contributor Author

Okay, this is curious.

adb shell du -sh /sdcard/Android/data/de.westnordost.streetcomplete/cache/ returns 56 MB
adb shell ls /sdcard/Android/data/de.westnordost.streetcomplete/cache| wc -l returns 5213

Copying over all files in that directory yields a directory whose size is 23 MB. The number of files contained is 5213.
Copying back that directory onto the phone yields a directory whose size is again 56 MB.

So it looks like this is just the size on disk, whereas the OkHttp cache counts the logical size. This explains why the reported cache size grows at 2x the rate.

This also explains why another user who set his cache size limit to 250MB reported in streetcomplete/StreetComplete#3417 that the cache size was reported to be about 500MB.


The output of adb shell mount confused me: the data is mounted as either ext4 or sdcardfs. ext4 has a block size of 1KB; for sdcardfs, I don't know.

Only one third of the cache files are below 2KB, so I find it really surprising that the size on disk is about +100% of the logical size. On an NTFS file system, it is +50%.
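
To illustrate why many small files inflate the size on disk, here is a minimal shell sketch. It assumes the file system allocates space in whole blocks and ignores metadata and journal overhead. The 4096-byte block size and the five tile sizes are hypothetical values chosen for illustration, not measurements from this device, and allocated_size and total_allocated are made-up helper names.

```shell
#!/bin/sh
# Round one file size up to a whole number of blocks.
# $1 = logical size in bytes, $2 = block size in bytes (an assumption, not measured)
allocated_size() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Sum the block-rounded sizes of several files.
# $1 = block size in bytes, remaining arguments = logical file sizes in bytes
total_allocated() {
    block=$1
    shift
    total=0
    for size in "$@"; do
        total=$(( total + $(allocated_size "$size" "$block") ))
    done
    echo "$total"
}

# Hypothetical tile sizes, for illustration only.
total_allocated 4096 500 1500 3000 5000 9000
```

For these five files the logical total is 19000 bytes, while the block-rounded total is 32768 bytes, about 72% more. The smaller the files are relative to the block size, the larger the gap becomes.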

Anyway, I think this issue can then be closed, all has been answered. Thank you for your time!

@westnordost
Contributor Author

In any case, this is a finding that may be useful for providers of map tiles: it may be better to serve fewer, larger map tiles than many very small ones.

@mnalis

mnalis commented Oct 29, 2021

For anybody interested in comparing results themselves: the first command gives how much real space is used (number of blocks * block size), and the second command gives the sum of the file sizes. This example is from my Android 10 device with a 512-byte sector size.


% adb shell '(find /sdcard/Android/data/de.westnordost.streetcomplete/cache  -execdir stat -c "%b*%B+\\" {} \; ; echo 0) | bc'
6671872

% adb shell '(find /sdcard/Android/data/de.westnordost.streetcomplete/cache -type f -execdir stat -c "%s+\\" {} \; ; echo 0) | bc'
4305549

Also, stat -f /mountpoint should reveal what the block size is on that filesystem.
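
The two commands above can also be combined into a single pass. The following is only a sketch: dir_sizes is a hypothetical helper name, and it assumes a stat that supports -c with the %s, %b and %B format codes (as GNU coreutils does) plus a standard awk. It counts regular files only, so directory entries are not included in either total.

```shell
#!/bin/sh
# Print the file count, the logical size and the allocated size
# of all regular files below the directory given as $1.
# Assumes stat supports -c with %s (size), %b (blocks) and %B (bytes per block).
dir_sizes() {
    find "$1" -type f -exec stat -c '%s %b %B' {} + |
        awk '{ n++; logical += $1; allocated += $2 * $3 }
             END { printf "%d files, %.0f logical bytes, %.0f allocated bytes\n", n, logical, allocated }'
}

# Example (path is a placeholder): dir_sizes ./tile_cache_copy
```

Run against a copy of the cache directory, or pasted into an adb shell session if the device's tools support these options, this shows both figures side by side for the same set of files.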

@smichel17

FYI @Joxit @ianthetechie (anyone else I missed who might appreciate a heads-up?)

@matteblair
Member

Good information to know! This makes me think that it would be possible to implement a more space-efficient cache by combining tile responses into a single file - like a sqlite database. I don't expect to do this in Tangram ES but it might be possible for a client application.

@Joxit

Joxit commented Oct 29, 2021

Hi there. Yes, a file system is not well suited to storing tiles; you should use SQLite, based on the MBTiles specification.

@matteblair
Member

@Joxit indeed MBTiles is what I had in mind when I mentioned an sqlite database. It's a tempting idea, but I think it would be pretty complicated for a couple of reasons:

  1. In addition to storing tile data, we also need to respect cache control headers in the client request and in the server response. Currently this is handled by OkHttp. Implementing this again would be non-trivial work and would probably not be as good as OkHttp's implementation!
  2. Tangram ES doesn't just request tiles, it can also request scene files, scene bundles, or image resources over HTTP. These other resources don't fit in the MBTiles schema, so we would need multiple caches for different types of resources.

This doesn't mean it can't be done! Tangram ES allows a client application to replace the entire HTTP implementation, so you could certainly create a tile cache based on MBTiles if you wanted to.

@westnordost
Contributor Author

westnordost commented Oct 30, 2021

Hmm...

  1. Creating a tile cache based on MBTiles just to store the cache more efficiently seems like something no user of a map rendering library would do. So it seems this should be done by the rendering library.
  2. However, for web maps (tangram, maplibre-gl-js), the rendering library obviously does not manage the cache itself; the browser does. So why should the library manage the cache itself on native platforms but not on the web?

I am not sure if browsers put all the cached files in some kind of giant indexed table so they don't have this problem. I'd also be curious how MapLibre handles this. After all, MapLibre is pretty much the reference implementation that all the map tile providers cater for.

@smichel17

Seems like the ideal setup would be an independent library which handles this. That way it could be shared between renderers, and only included when needed.

@tallytalwar
Member

Last I remember, we did add some code to do offline rendering via mbtiles. If needed, could we use that tile source class generically as the cache for all tile sources?

@matteblair
Member

Yep, there is code in Tangram ES currently for caching vector tile sources in a local mbtiles database. However in its current form it isn't a substitute for HTTP caching. The existing mbtiles cache will naively store all tiles forever, regardless of the cache-control headers that accompanied the tile response.

I looked into maplibre to see what approach is used there. It does seem to use a local sqlite database for tile caching, complete with parsing and logic for response headers. The maplibre caching code is in the shared native library, for reuse across platforms.

A similar approach would make sense for Tangram ES. Tile caching could be implemented in the shared core library using sqlite, with options like the maximum cache size exposed to the client SDKs. I would support this approach; I think it could lead to a better user experience for Tangram ES. But the scope of this work is beyond my current availability (for the time being, at least).

It would be nice if the caching code from maplibre could be shared in an independent library, but unfortunately it seems very tightly coupled with the internals of their renderer.
