Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Support partial fetches for s3 backend #252

Open
pontus opened this issue Mar 29, 2023 · 11 comments
Open

Support partial fetches for s3 backend #252

pontus opened this issue Mar 29, 2023 · 11 comments

Comments

@pontus
Copy link
Contributor

pontus commented Mar 29, 2023

As a service consumer/end user
I want to be able to fetch parts of an object (file) using a range
To be able to do "random I/O" in an object
To support fetching just the needed intermediate/baseline images for what I want to see in a DICOM WSI

A/C:
[ ] Server reports bytes in Accept-Ranges header
[ ] Server (as binary) only retrieves the data necessary to satisfy the request
[ ] Server correctly retrieves the correct data for a given Range header (this may be one or several ranges requested).

This likely has bearing on at least Crypt4GH.

The current main branch seems to have support for coordinates that I guess have a similar purpose, but I'm a bit unsure about that functionality, but I don't see any indication it allows for better handling on the backend. It also has the byte range in the URL rather than through standard HTTP range requests.

@blankdots
Copy link
Contributor

small comment to this crypt4gh ranges were designed to work with file formats like bam, cram as the header of those files can contain informed that can be used to identify e.g. Chr 2.
Does DICOM has this kind of header information?

@pontus
Copy link
Contributor Author

pontus commented Apr 5, 2023

It's complicated :(

It's a format that is based on bag/tag, and there is no (mandatory) simple lookup table. I don't know all the details, but have noted that one of the listed rationales for WSI DICOMs tiled pyramid approach is to support only loading the needed tile(s) for what's should be displayed.

The DICOM format is also designed to support doubling as TIFFs ("dual-personality"), I haven't checked how those will work either but they might use have more explicit references (although I doubt it).

In practice my impression is this need no be very bad although one may need to do a few touchdowns at the start.

@erikogabrielsson
Copy link

For all bigpicture wsi dicom files there should be a (e.g. we require) an offset table. As Pontus say, the format is based on tag--value attributes, and the offset table is either in the Extended Offset Table attribute or the first item in the Pixel Data Attribute.

In a file the attributes are ordered by tag number, and the Extended offset table and Pixel data is last (more or less), in the 74E0-group. For fast access one would want to be able to read the file up to the 7FE0-group to get all the metadata, and then use the offset tables to read the frames one needs.

@erikogabrielsson
Copy link

And for dual-Tiff, that is something that needs to be prepared when creating the file. This would help parsing out the positions of the tiles for applications that supports TIFF but not DICOM, but the tiff header would not contain all the metadata (unless one for example translates dicom tags to tiff tags or dumps the DICOM tags as json into a tiff tag). We have chosen not to include this in the bigpicture dicom wsi standard.

@erikogabrielsson
Copy link

But it would be possible to create a simple script that parses out the needed positions to a separate index file, if that would help?

@pontus
Copy link
Contributor Author

pontus commented Apr 5, 2023

Without speaking for Stefan, I believe this question was to assess how useful this support would be in practice (a backend that supports random access does not help much if the file format requires reading the entire file up to the desired data).

I don't see any need for additional tooling right now. It would be nice seeing if the planned services (e.g. Cytomine) reads file in a good way (in contrast to reading the entire file), but as that is still under wraps as far as I understand, I believe there's not much to do for now.

@erikogabrielsson
Copy link

Cytomine uses WsiDicom, which wants to read the file up to the 7FE0-group first, then load the offset table on first tile access, and then tiles. WsiDicom is designed to allow parsing other sources than DICOM-on-file, so it could be possible to make WsiDicom read the encrypted files directly.

@blankdots
Copy link
Contributor

just to mention it here: http://samtools.github.io/hts-specs/crypt4gh.pdf we need to do what is mentioned in section 4.2.

meaning when we encrypt the file these packages need to be added

@pontus
Copy link
Contributor Author

pontus commented Apr 5, 2023

Cytomine uses WsiDicom

I didn't know that, and then I have questions :(

Integrating python code in Java and bridging between those is certainly possibly, but not exactly a great time, so my guess would be that such use is to convert/import into something else (which would make the utility of this feature much less valuable).

@pontus
Copy link
Contributor Author

pontus commented Apr 5, 2023

just to mention it here: http://samtools.github.io/hts-specs/crypt4gh.pdf we need to do what is mentioned in section 4.2.

meaning when we encrypt the file these packages need to be added

I don't believe I understand this - actual encryption would be done client side before uploading, and server side reencryption would only be header replacement which needs not care about edit lists.

Server-side decryption (to serve unencrypted) would need to support parsing edit lists (this is already the case I believe for the golang crypt4gh code). What I think the golang code is currently missing is supporting special handling of a ReadSeeker and not reading unnecessary contents if that's what's passed.

@blankdots
Copy link
Contributor

What I think the golang code is currently missing is supporting special handling of a ReadSeeker and not reading unnecessary contents if that's what's passed.

yes

@teemukataja teemukataja changed the title Support partial fetches Support partial fetches for s3 backend Jul 27, 2023
@blankdots blankdots linked a pull request Dec 14, 2023 that will close this issue
@blankdots blankdots removed a link to a pull request Dec 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants