-
-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow custom pixel data parsing and streaming pixel data directly from the reader. #223
base: main
Are you sure you want to change the base?
Conversation
Add getter for limited reader for the pixel data Add getter for the data set
Restore original pixel data parsing and remove unnecessary changes
Is it possible to upstream this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for the PR. I think the intent here is to do something like what was requested in #224?
I have some concerns with this PR, summarized below:
- Introducing the GetPixelDataReader() API to Parser is problematic (in its current form), because it's too easy to use it incorrectly. For example, if a user creates a new Parser, and then immediately calls GetPixelDataReader() it will actually return bytes that are not the PixelData. Maybe we could try to fix this by reading the dataset first, adding some guards, or just reading the elements we need to read the PixelData (as I described a little in Add option to return Reader for PixelData element #224). Either way, if we were to go with this approach, I think we would need to think about the right APIs and how to make it clear and intuitive for the user what to do.
- I think introducing GetDataset complicates the Parser API (which was designed to act as an element-by-element API via the use of Next()). This actually would prevent us from doing the kind of optimization I mentioned in Add option to return Reader for PixelData element #224, where the Parser itself only holds onto elements that are needed for processing "later" elements (this is one of the approaches I was thinking of starting with to make things a little more lightweight).
- I need to look a little more closely to verify and double check this, but I don't think your stopAtPixelDataValue implementation is correct, because I think readElement may encounter PixelData tags before the actual main PixelData of the DICOM in some cases. For example, consider PixelData that's part of an IconImageSequence which I think in your implementation would be skipped when parsing.
I think these points along with the ones in #224 may require some more careful discussion, and I'd also like to hear more about your motivating use cases so that we can arrive at the right solution. Some quick thoughts from me on possible solutions:
- Create a utility function (e.g. not part of the Parser API) that uses the element-by-element Parser API to return (Dataset, pixelData io.Reader)
- Here's an optimization I wanted to do anyway as mentioned in Add option to return Reader for PixelData element #224--namely, in the element-by-element Parser, don't hold onto the full Dataset of all parsed elements--only hold the elements we need to successfully Parse later elements, leave it up to the caller what they wish to do with Elements returned from Next(). I can send a PR for this soon.
- Include an option and new Element type to hold unparsed pixeldata (e.g. we just read it into a buffer without any parsing--this really only matters for NativePixelData dicoms. saves some CPU but not that much memory). One option here is to reuse the existing frame streaming channel to stream unprocessed frames to the caller (but don't save them into an element, if a certain option is set)--this could help save on memory if done correctly, though this would apply to all PixelData by deafult (incl Icons. Maybe we could exclude those if it made sense).
Hi, thank you for the comments on the PR and your thoughts on it. The use-case which we are trying to handle is that for some DICOM tags (could be PixelData or RawData held in a private tag), we do not wish the library to parse the contents of the value but rather stream it directly from the reader into some writer of our choice. This would save us in terms of processing of the value for the tag and also memory. As you suggested in point 3 earlier for possible solutions, maybe some option could be set in the parser for certain tags which the user does not want to be parsed by the library, but rather stream the bytes somewhere else (this would also mean that these elements are not held in the dataset). It does makes sense to ensure that the tags used in this option are not something whose values are needed to successfully parse later elements. This would also mean that the solution is not restricted to PixelData but any other tag which can potentially hold data. I do agree that the existing channel (which is only used in the case of PixelData currently) could be used for this purpose. Looking forward to hear your thoughts on this. |
Add optional handling of number of frames and total bytes for the pix…
Hi there! I have a draft PR I'm playing around with at #256 that addresses this to some degree--it skips the processing of NativePixelData in favor of just blindly loading the data into memory (so that it can be roundtripped out later). It's not quite an arbitrary writer solution though (the API and usage for which may not be ideal) and still loads the data into memory (but may save you some CPU for NativePixelData) Ultimately I think the right long term solution might be to instead have an option that indicates only certain elements should be processed, or a lazy load type of situation. However, for now, it seems like this is most important for PixelData so I figure it is good to start there. |
See also #257 |
Some elements, such as pixel data, might have a nil value. Furthermore the FindElementByTagNested does not adhere to the internal documentation to fully exhaust the iterators goroutine.
correct previous changes related to FindElementByTagNested
#267 has a working draft of a solution that should allow you not to hold the whole dataset in memory at once--if you are interested, please give it a try and see if it helps at all! You would have to use the Parser.Next() API something like
|
Hi @suyashkumar thank you for the exposed options which helps us inch a bit closer to our original use case. Currently, there's a way to We want to essentially skip the double writing of pixeldata, once to memory and once again to file. This option could be a general option applied for any tag (other than Looking forward to hearing your thoughts on this. Thanks. |
@dvaruas in principle it should not be difficult to have a As mentioned above there can be multiple PixelData tags (some inside sequences) in a single DICOM. If this option works by blindly writing out data to the provided writer, multiple PixelData elements could be sent at different times and it'd be up to the writer to decide what to do. If you're trying to interweave this into a Read - Modify some elements -Write pipeline, and you're trying to pass in the same underlying writer that the If it is the Read-Modify-Write pipeline, I understand your use case. It may make more sense for us to provide a separate Pipeline or Modification API where you can pass in some list of tags you wish to mutate and a mutation function (or a struct that meets some Mutator interface that could also include a list of tags to call the function for), and then we can have the Modification pipeline oversee and safely manage processing the file and calling your mutation function when appropriate for the right tags only. This seems a little safer to me than the arbitrary writer option where the writer is being written to from two separate owners without any real coordination. WDYT? |
Instead of a passlist of tags to mutate, we could also consider and option for tags not to mutate, if that is more commonly definable (or maybe an option to do both). Either way, I'd like to understand the use case better and try to understand how common of a use case this is. I opened #272 to discuss this further and think about what it could look like. |
Hi, sorry for the late response. Or in more detail: We receive DICOM's from our "own" service, after some upload processor and coversion took place, so we know that the file follows certain rules.
This was done mostly as an optimization, to speed up the loading process (by a factor of about 6):
I hope this clarifies the use case a bit. Let me know, if you still think we should move to the #272 |
Thanks for the context @DmitrijsBarbarinsFreiheit! That is definitely helpful. Some questions:
Some off-the-cuff thoughts:
|
@DmitrijsBarbarinsFreiheit it's been a while but wanted to follow up on this--do you think the approach outlined above might still make sense for y'all? |
No description provided.