Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache attributes #77

Merged
merged 156 commits into from
Jun 27, 2023
Merged

Cache attributes #77

merged 156 commits into from
Jun 27, 2023

Conversation

axtimwalde
Copy link
Collaborator

Filesystem abstraction and meta data caching

axtimwalde and others added 30 commits April 12, 2021 22:28
Avoids frequent reading and parsing of JSON encoded attributes,
this is most interesting for high latency backends. Changes of
attributes by an independent writer will not be tracked, so some
consideration required.
writer are consistent in cache and on disc
sensitive ops twice with a caching writer)
TODO implement updates in create and remove methods
TODO shouldn't readers do that too?
TODO fails deepList test for a mix of test and implementation issues
# Conflicts:
#	README.md
#	pom.xml
abstracts further to platform dependent separation strings
(e.g. "/" vs "\"). The resulting path string is now strictly a platform
dependent path and should not be used in code where a platform
independent string is expected.  Thanks to the Windows FileSystem
implementation accepting "/" separated paths, this should not cause
pracitcal issues at this time but we may have to change the API to support
other FileSystems (e.g. passing paths along as components instead of
strings or settling on a nio Path like object).
  better on key value stores that do not really have paths.

TODO check for leftover redundant path creation, that wouldn't break
  things but makes an unnecessary exists test
@bogovicj
Copy link
Contributor

bogovicj commented Apr 5, 2023

In talking with @cmhulbert , it may be that we want to use N5URL.normalizePath in the N5Readers / Writers, and keyValueAccess.normalize (and remove leading slashes) for paths within the container that are used by the cache.

Edit: see https://github.com/saalfeldlab/n5/tree/cache-normalization

@bogovicj
Copy link
Contributor

bogovicj commented Apr 5, 2023

Currently we're seeing this behavior here

N5Writer n5 = ...
n5.setAttribute(groupName, "key", 0.1);
n5.getAttribute(groupName, "key", Integer.class )); // returns 0 (i.e. rounds rather than return null)

The issue seems to be deep in Gson
https://github.com/google/gson/blob/29c93895bbcaed02178abc9e3d47b73878aaca73/gson/src/main/java/com/google/gson/internal/LazilyParsedNumber.java#L42

@bogovicj
Copy link
Contributor

@cmhulbert will want to ask you about this when you have time 7c37cf9

@bogovicj
Copy link
Contributor

Re the discussion on zulip,

We could consider adding a protected constructor to N5KeyValueReader that skips the version check. This could let us avoid sneaking in the static initializeContainer in the N5KeyValueWriter constructor before its superclass (N5KeyValueReader) constructor is called. See here:

https://github.com/saalfeldlab/n5/tree/kv-reader-cons

@bogovicj
Copy link
Contributor

bogovicj commented Apr 14, 2023

@axtimwalde , branch of interest: https://github.com/saalfeldlab/n5/tree/wip/KeyValueInterface

  • That branch also includes use of a protected constructor for the N5KeyValueReader and no longer needs a static initializeContainer method (details above and on zulip)

@cmhulbert ,
Looking at the cache again, I think some (hopefully small) edits are needed for writing. Specifically, what happens now on on group / dataset creation (I think), is:

  • Writer creates what it has to do through the key-value-access
  • Writer triggers a cache update
  • Cache update triggers a reading of the backend
    • not necessary if the writer can update the cache directly
    • possibly unsafe if the writing failed, but better to detect that

@bogovicj
Copy link
Contributor

  • Toward making the cache work more nicely with writers: 239b3d9
  • Use an interface (N5JsonCachableContainer) instead of funcitons / consumers for the cache: 069ceda
  • remove the temporary classes: df285ee

@bogovicj
Copy link
Contributor

I ran this benchmark comparing the current master a1fcd2f to the latest development commit d14fe55 . In summary: they're comparable.

current master
1 threads.
1 : raw : 0.118000s
1 : bzip2 : 26.535000s
1 : gzip : 3.202000s
1 : lz4 : 0.211000s
1 : xz : 19.963000s
1 : tif : 0.084000s
1 : hdf5 raw : 0.116000s
1 : hdf5 gzip : 1.875000s
2 threads.
2 : raw : 0.045000s
2 : bzip2 : 15.311000s
2 : gzip : 1.798000s
2 : lz4 : 0.109000s
2 : xz : 10.856000s
2 : tif : 0.039000s
2 : hdf5 raw : 0.035000s
2 : hdf5 gzip : 1.875000s
4 threads.
4 : raw : 0.025000s
4 : bzip2 : 9.451000s
4 : gzip : 1.016000s
4 : lz4 : 0.063000s
4 : xz : 6.578000s
4 : tif : 0.019000s
4 : hdf5 raw : 0.037000s
4 : hdf5 gzip : 1.933000s
8 threads.
8 : raw : 0.032000s
8 : bzip2 : 7.119000s
8 : gzip : 0.667000s
8 : lz4 : 0.041000s
8 : xz : 4.864000s
8 : tif : 0.014000s
8 : hdf5 raw : 0.036000s
8 : hdf5 gzip : 1.950000s
16 threads.
16 : raw : 0.015000s
16 : bzip2 : 6.383000s
16 : gzip : 0.474000s
16 : lz4 : 0.026000s
16 : xz : 4.278000s
16 : tif : 0.011000s
16 : hdf5 raw : 0.041000s
16 : hdf5 gzip : 2.173000s
latest development
1 threads.
1 : raw : 0.117000s
1 : bzip2 : 26.146000s
1 : gzip : 3.463000s
1 : lz4 : 0.204000s
1 : xz : 19.513000s
1 : tif : 0.140000s
1 : hdf5 raw : 0.113000s
1 : hdf5 gzip : 1.762000s
2 threads.
2 : raw : 0.069000s
2 : bzip2 : 14.401000s
2 : gzip : 1.771000s
2 : lz4 : 0.109000s
2 : xz : 10.648000s
2 : tif : 0.042000s
2 : hdf5 raw : 0.036000s
2 : hdf5 gzip : 1.876000s
4 threads.
4 : raw : 0.027000s
4 : bzip2 : 9.021000s
4 : gzip : 1.004000s
4 : lz4 : 0.062000s
4 : xz : 6.474000s
4 : tif : 0.022000s
4 : hdf5 raw : 0.038000s
4 : hdf5 gzip : 1.898000s
8 threads.
8 : raw : 0.024000s
8 : bzip2 : 7.307000s
8 : gzip : 0.662000s
8 : lz4 : 0.041000s
8 : xz : 4.729000s
8 : tif : 0.016000s
8 : hdf5 raw : 0.040000s
8 : hdf5 gzip : 2.000000s
16 threads.
16 : raw : 0.025000s
16 : bzip2 : 6.529000s
16 : gzip : 0.480000s
16 : lz4 : 0.028000s
16 : xz : 4.268000s
16 : tif : 0.011000s
16 : hdf5 raw : 0.044000s
16 : hdf5 gzip : 2.146000s

@bogovicj bogovicj merged commit bf0a145 into master Jun 27, 2023
@cmhulbert cmhulbert deleted the cache-attributes branch December 4, 2024 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants