Dask, memmap, lazy #350
-
I am asking because I have only a vague idea of how to achieve chunked random reading of the Bruker format, but I don't see clearly "the holy grail", i.e. what that ability should ultimately serve.
Could both somehow be married?
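Purely as an illustration of how the two could be married, not an existing API: each chunk read could be wrapped in `dask.delayed` and the pieces assembled into one lazy dask array. The `read_strip` helper, the file name, the shapes and the dtype below are all hypothetical.

```python
import numpy as np
import dask
import dask.array as da

def read_strip(path, row_start, row_stop):
    # Hypothetical helper: decode only navigation rows [row_start, row_stop)
    # of the Bruker file and return a numpy array of shape
    # (row_stop - row_start, nav_x, n_channels).
    ...

nav_y, nav_x, n_channels = 512, 512, 2048
rows_per_chunk = 64

strips = [
    da.from_delayed(
        dask.delayed(read_strip)("map.bcf", start, start + rows_per_chunk),
        shape=(rows_per_chunk, nav_x, n_channels),
        dtype=np.uint16,
    )
    for start in range(0, nav_y, rows_per_chunk)
]

# One lazy array; a strip is only decoded when its chunk is actually computed.
lazy_data = da.concatenate(strips, axis=0)
```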
-
These are valid points/questions that touch on what we try to achieve with rosettasciio! Coming back to the Bruker format, it seems that there are functionalities that would be useful to others (for example, as mentioned in #36).
This is a wrong/misleading implementation of lazy loading in the case of the Bruker format. Another example of inconsistent lazy loading is the Velox emd reader, which is not fully lazy but is still useful; sometimes it is a matter of balancing needs and efforts!
Yes, the
What does the "unpacked" content mean here?
Yes, what we recommend is to convert to something like a zarr format, typically zspy. @CSSFrancis will most likely have a better understanding than me on this, but in the case of the Bruker format, I would expect that using the …
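A minimal sketch of that recommended conversion, assuming a reader that can hand back a dask-backed lazy signal (the file name below is illustrative):

```python
import hyperspy.api as hs

# Load lazily: the signal data is a dask array rather than an in-memory numpy array.
s = hs.load("map.bcf", lazy=True)

# Saving with a .zspy extension writes to the zarr-based zspy format,
# which stores the data chunk by chunk.
s.save("map.zspy")
```

For this to stay out-of-core, the reader itself has to produce a chunked dask array; if "lazy" only delays a full in-memory read, the conversion will still need the whole array in RAM, which is exactly the problem discussed here.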
-
I see some issues (#345, #198, #18, #241, #211, ...), and the Bruker reader also has a rudimentary lazy implementation using dask, but I never got a complete picture of how lazy, memmap and dask are interconnected and how to use them correctly (dask has many options). The behaviour of the "lazy" flag feels inconsistent to me. E.g. with Bruker, "lazy" only delays loading; that would be fine for a 5D hypercube (e.g. FIB-sliced EDS), but it still does not solve the problem of a single Bruker file unpacking into an array larger than RAM. Other formats seem to use other features of dask, and in some cases it looks redundant (e.g. tiff files use memmap, and that is then wrapped in dask?).
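(For context, a minimal sketch of how memmap and dask relate for a raw/uncompressed file; the file name, shape and dtype are made up. The memmap on its own already gives on-demand, page-level access, and wrapping it in a dask array adds chunking and a task graph on top, which is presumably why some readers use both.)

```python
import numpy as np
import dask.array as da

# A raw/uncompressed file exposed as a numpy memmap: the OS pages data in
# on demand, so nothing is read until a slice is actually accessed.
raw = np.memmap("frames.raw", dtype=np.uint16, mode="r", shape=(1000, 512, 512))

# Wrapping the memmap in a dask array adds chunking and lazy, parallel,
# out-of-core computation on top of that on-demand access.
lazy = da.from_array(raw, chunks=(100, 512, 512))

mean_frame = lazy.mean(axis=0)   # still lazy, nothing read yet
result = mean_frame.compute()    # reads and reduces chunk by chunk
```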
This is so complicated that I probably can't even formulate the right questions. Lazy, dask and memmap: it is all quite murky water for me. As for the lazy loading of the Bruker format implemented so far, as we see in #241, it is useless. Even conversion to another format is doomed at the moment, because the whole file has to be loaded into memory first.
So the only way to save (convert) such huge files would be to get a chunked dask array, and the writer would need to support chunked writing, right?
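A dask-level sketch of what such chunked writing could look like (the zeros array is only a stand-in for whatever a chunked reader would return; the output path is illustrative):

```python
import dask.array as da

# Stand-in for a chunked dask array produced by a lazy reader.
lazy_data = da.zeros((512, 512, 2048), dtype="uint16", chunks=(64, 512, 2048))

# A chunk-aware writer streams the array to disk one block at a time,
# so the full array never has to fit in RAM at once.
da.to_zarr(lazy_data, "map.zarr")
```

Saving a lazy HyperSpy signal to .zspy works along the same lines.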