Support partial fetches for s3 backend #252

pontus · 2023-03-29T16:15:49Z

As a service consumer/end user
I want to be able to fetch parts of an object (file) using a range
To be able to do "random I/O" in an object
To support fetching just the needed intermediate/baseline images for what I want to see in a DICOM WSI

A/C:
[ ] Server reports bytes in Accept-Ranges header
[ ] Server (as binary) only retrieves the data necessary to satisfy the request
[ ] Server correctly retrieves the correct data for a given Range header (this may be one or several ranges requested).

This likely has bearing on at least Crypt4GH.

The current main branch seems to have support for coordinates that I guess have a similar purpose, but I'm a bit unsure about that functionality, but I don't see any indication it allows for better handling on the backend. It also has the byte range in the URL rather than through standard HTTP range requests.

The text was updated successfully, but these errors were encountered:

blankdots · 2023-04-04T19:20:28Z

small comment to this crypt4gh ranges were designed to work with file formats like bam, cram as the header of those files can contain informed that can be used to identify e.g. Chr 2.
Does DICOM has this kind of header information?

pontus · 2023-04-05T05:57:08Z

It's complicated :(

It's a format that is based on bag/tag, and there is no (mandatory) simple lookup table. I don't know all the details, but have noted that one of the listed rationales for WSI DICOMs tiled pyramid approach is to support only loading the needed tile(s) for what's should be displayed.

The DICOM format is also designed to support doubling as TIFFs ("dual-personality"), I haven't checked how those will work either but they might use have more explicit references (although I doubt it).

In practice my impression is this need no be very bad although one may need to do a few touchdowns at the start.

erikogabrielsson · 2023-04-05T06:40:13Z

For all bigpicture wsi dicom files there should be a (e.g. we require) an offset table. As Pontus say, the format is based on tag--value attributes, and the offset table is either in the Extended Offset Table attribute or the first item in the Pixel Data Attribute.

In a file the attributes are ordered by tag number, and the Extended offset table and Pixel data is last (more or less), in the 74E0-group. For fast access one would want to be able to read the file up to the 7FE0-group to get all the metadata, and then use the offset tables to read the frames one needs.

erikogabrielsson · 2023-04-05T06:43:29Z

And for dual-Tiff, that is something that needs to be prepared when creating the file. This would help parsing out the positions of the tiles for applications that supports TIFF but not DICOM, but the tiff header would not contain all the metadata (unless one for example translates dicom tags to tiff tags or dumps the DICOM tags as json into a tiff tag). We have chosen not to include this in the bigpicture dicom wsi standard.

erikogabrielsson · 2023-04-05T06:44:16Z

But it would be possible to create a simple script that parses out the needed positions to a separate index file, if that would help?

pontus · 2023-04-05T06:55:59Z

Without speaking for Stefan, I believe this question was to assess how useful this support would be in practice (a backend that supports random access does not help much if the file format requires reading the entire file up to the desired data).

I don't see any need for additional tooling right now. It would be nice seeing if the planned services (e.g. Cytomine) reads file in a good way (in contrast to reading the entire file), but as that is still under wraps as far as I understand, I believe there's not much to do for now.

erikogabrielsson · 2023-04-05T07:01:15Z

Cytomine uses WsiDicom, which wants to read the file up to the 7FE0-group first, then load the offset table on first tile access, and then tiles. WsiDicom is designed to allow parsing other sources than DICOM-on-file, so it could be possible to make WsiDicom read the encrypted files directly.

blankdots · 2023-04-05T07:06:39Z

just to mention it here: http://samtools.github.io/hts-specs/crypt4gh.pdf we need to do what is mentioned in section 4.2.

meaning when we encrypt the file these packages need to be added

pontus · 2023-04-05T07:09:42Z

Cytomine uses WsiDicom

I didn't know that, and then I have questions :(

Integrating python code in Java and bridging between those is certainly possibly, but not exactly a great time, so my guess would be that such use is to convert/import into something else (which would make the utility of this feature much less valuable).

pontus · 2023-04-05T07:14:10Z

just to mention it here: http://samtools.github.io/hts-specs/crypt4gh.pdf we need to do what is mentioned in section 4.2.

meaning when we encrypt the file these packages need to be added

I don't believe I understand this - actual encryption would be done client side before uploading, and server side reencryption would only be header replacement which needs not care about edit lists.

Server-side decryption (to serve unencrypted) would need to support parsing edit lists (this is already the case I believe for the golang crypt4gh code). What I think the golang code is currently missing is supporting special handling of a ReadSeeker and not reading unnecessary contents if that's what's passed.

blankdots · 2023-04-05T07:23:29Z

What I think the golang code is currently missing is supporting special handling of a ReadSeeker and not reading unnecessary contents if that's what's passed.

yes

teemukataja changed the title ~~Support partial fetches~~ Support partial fetches for s3 backend Jul 27, 2023

blankdots linked a pull request Dec 14, 2023 that will close this issue

try to fix coordinates #347

Merged

blankdots removed a link to a pull request Dec 14, 2023

try to fix coordinates #347

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support partial fetches for s3 backend #252

Support partial fetches for s3 backend #252

pontus commented Mar 29, 2023

blankdots commented Apr 4, 2023

pontus commented Apr 5, 2023 •

edited

Loading

erikogabrielsson commented Apr 5, 2023

erikogabrielsson commented Apr 5, 2023

erikogabrielsson commented Apr 5, 2023

pontus commented Apr 5, 2023

erikogabrielsson commented Apr 5, 2023

blankdots commented Apr 5, 2023

pontus commented Apr 5, 2023

pontus commented Apr 5, 2023

blankdots commented Apr 5, 2023

Support partial fetches for s3 backend #252

Support partial fetches for s3 backend #252

Comments

pontus commented Mar 29, 2023

blankdots commented Apr 4, 2023

pontus commented Apr 5, 2023 • edited Loading

erikogabrielsson commented Apr 5, 2023

erikogabrielsson commented Apr 5, 2023

erikogabrielsson commented Apr 5, 2023

pontus commented Apr 5, 2023

erikogabrielsson commented Apr 5, 2023

blankdots commented Apr 5, 2023

pontus commented Apr 5, 2023

pontus commented Apr 5, 2023

blankdots commented Apr 5, 2023

pontus commented Apr 5, 2023 •

edited

Loading