Subtile decoding: memory use reduction and perf improvements #1010
…id potential int overflow
…dimension Instead of being the full tile size.
* Use a sparse array mechanism to store code-blocks and intermediate stages of IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to the reduced size, instead of the full-resolution size.
…peration by properly initializing working buffer
…uced in previous commit)
…lion pixels. However, the intermediate buffer for decoding must still be smaller than 4 billion pixels, so this is useful for decoding at a lower resolution level, or for subtile decoding.
…mber of pixels must remain under 4 billion)
Untested though, since that means a tile buffer of at least 16 GB. So there might be places where uint32 overflow on multiplication still occurs.
…zation to reading at reduced resolution as well
…) for single-tiled images
* Only works for single-tiled images --> will error out cleanly, as currently in other cases
* Save re-reading the codestream for the tile, and re-use code-blocks of the previous decoding pass.
* Future improvements might involve improving opj_decompress, and the image writing logic, to use this strategy.
…rays in vertical pass
…retical) better multi-threading in subtile decoding
…ile data exceeds system limits' (refs uclouvain#730 (comment))
…h, for irreversible
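Several of the commit messages above concern integer overflow when pixel counts approach 4 billion. As a minimal sketch of the kind of guard involved (the function names `mul_u32_checked` and `tile_buffer_bytes` are hypothetical and are not taken from this PR), a multiplication can be checked before it is performed, and buffer sizes can be computed in `size_t` rather than in 32 bits. The sketch assumes 4-byte samples, under which a 65536x65536 buffer comes to 16 GiB, in line with the figure quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply two uint32 values.
 * Returns 1 and stores the product in *out if it fits in 32 bits,
 * returns 0 (leaving *out untouched) if it would overflow. */
int mul_u32_checked(uint32_t a, uint32_t b, uint32_t *out)
{
    assert(out != NULL);
    if (a != 0 && b > UINT32_MAX / a) {
        return 0;
    }
    *out = a * b;
    return 1;
}

/* Hypothetical helper: byte size of a width x height buffer of int32
 * samples, computed in size_t. Returns 0 if the size cannot be
 * represented on this platform. */
int tile_buffer_bytes(uint32_t width, uint32_t height, size_t *out_bytes)
{
    size_t pixels;
    assert(out_bytes != NULL);
    if (width != 0 && (size_t)height > SIZE_MAX / width) {
        return 0;
    }
    pixels = (size_t)width * height;
    if (pixels > SIZE_MAX / sizeof(int32_t)) {
        return 0;
    }
    *out_bytes = pixels * sizeof(int32_t);
    return 1;
}
```

Checking before multiplying avoids relying on wrapped results, which is where a silent under-allocation would otherwise come from.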
rouault force-pushed the subtile_decoding_stage3 branch from 5b2fb10 to c1e0fba on September 1, 2017 20:23
The gist of this PR is commit f9e9942
The effect is a reduction of the decoding time of "opj_decompress -i MAPA.jp2 -o out.tif -d 0,0,256,256" from 900ms to 190ms, and a reduction of RAM allocation from 2.27 GB to 265 MB (220 MB of which is the ingestion of the codestream).
master:
With PR:
Similarly, with a 9x7 compressed image, the decoding time of "opj_decompress -i MAPA_97.jp2 -o out.tif -d 0,0,256,256" drops from 1500ms to 180ms.
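The commit list mentions a sparse array mechanism for storing code-blocks and intermediate IDWT stages. The following is only a rough illustration of the general idea, not the code from this PR: the type `sparse_i32` and its functions are hypothetical names. The area is divided into fixed-size blocks, a block is allocated on first write, and reads from untouched blocks return zero, so memory use follows the window actually decoded rather than the full tile size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sparse 2D array of int32 samples (illustrative names). */
typedef struct {
    uint32_t width, height;      /* logical size in samples */
    uint32_t block_w, block_h;   /* block size in samples */
    uint32_t blocks_x, blocks_y; /* number of blocks per row / column */
    int32_t **blocks;            /* NULL entry means "all zeros" */
} sparse_i32;

sparse_i32 *sparse_i32_create(uint32_t width, uint32_t height,
                              uint32_t block_w, uint32_t block_h)
{
    sparse_i32 *sa;
    if (width == 0 || height == 0 || block_w == 0 || block_h == 0) {
        return NULL;
    }
    /* Reject block sizes whose sample count would not fit in size_t. */
    if ((size_t)block_w > SIZE_MAX / block_h) {
        return NULL;
    }
    sa = calloc(1, sizeof(*sa));
    if (sa == NULL) {
        return NULL;
    }
    sa->width = width;
    sa->height = height;
    sa->block_w = block_w;
    sa->block_h = block_h;
    sa->blocks_x = (width - 1) / block_w + 1;  /* ceiling division */
    sa->blocks_y = (height - 1) / block_h + 1;
    if ((size_t)sa->blocks_x > SIZE_MAX / sa->blocks_y) {
        free(sa);
        return NULL;
    }
    /* Only the table of block pointers is allocated up front. */
    sa->blocks = calloc((size_t)sa->blocks_x * sa->blocks_y,
                        sizeof(*sa->blocks));
    if (sa->blocks == NULL) {
        free(sa);
        return NULL;
    }
    return sa;
}

static size_t block_index(const sparse_i32 *sa, uint32_t x, uint32_t y)
{
    return (size_t)(y / sa->block_h) * sa->blocks_x + x / sa->block_w;
}

static size_t sample_offset(const sparse_i32 *sa, uint32_t x, uint32_t y)
{
    return (size_t)(y % sa->block_h) * sa->block_w + x % sa->block_w;
}

/* Writes one sample, allocating its block on first use.
 * Returns 1 on success, 0 if out of bounds or out of memory. */
int sparse_i32_set(sparse_i32 *sa, uint32_t x, uint32_t y, int32_t value)
{
    size_t idx;
    if (x >= sa->width || y >= sa->height) {
        return 0;
    }
    idx = block_index(sa, x, y);
    if (sa->blocks[idx] == NULL) {
        sa->blocks[idx] = calloc((size_t)sa->block_w * sa->block_h,
                                 sizeof(int32_t));
        if (sa->blocks[idx] == NULL) {
            return 0;
        }
    }
    sa->blocks[idx][sample_offset(sa, x, y)] = value;
    return 1;
}

/* Reads one sample; samples in blocks never written read as zero. */
int32_t sparse_i32_get(const sparse_i32 *sa, uint32_t x, uint32_t y)
{
    const int32_t *blk;
    assert(x < sa->width && y < sa->height);
    blk = sa->blocks[block_index(sa, x, y)];
    return blk ? blk[sample_offset(sa, x, y)] : 0;
}

/* Counts the blocks that have actually been allocated. */
size_t sparse_i32_allocated_blocks(const sparse_i32 *sa)
{
    size_t i, n = 0, total = (size_t)sa->blocks_x * sa->blocks_y;
    for (i = 0; i < total; i++) {
        n += (sa->blocks[i] != NULL);
    }
    return n;
}

void sparse_i32_destroy(sparse_i32 *sa)
{
    size_t i, total;
    if (sa == NULL) {
        return;
    }
    total = (size_t)sa->blocks_x * sa->blocks_y;
    for (i = 0; i < total; i++) {
        free(sa->blocks[i]);
    }
    free(sa->blocks);
    free(sa);
}
```

A real implementation would more likely read and write rectangular regions at a time; per-sample access is used here only to keep the sketch short.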
Another significant commit of this PR is 0ae3cba
The test_decode_area utility can now decode images of more than 4 gigapixels by proceeding in strips. For example, it can decode the first 3072 lines of a 66000x66000 image in chunks of (at most) 1200 lines at a time.
The memory consumption reported by valgrind --tool=massif is 2.2 GB (1 GB when using strips of 256 lines, 770 MB for strips of 64 lines), which still seems a bit high, so there is probably still room for improvement in that area. By comparison, opj_decompress -d 0,0,66000,3072 requires 3.5 GB.
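The strip-by-strip approach described above can be sketched as a loop over row ranges. This is a minimal illustration, not the code of test_decode_area: `for_each_strip`, `strip_fn`, `strip_stats` and `record_strip` are hypothetical names, and the callback here only records strip sizes. In an actual decoder the callback would presumably request each window from the library (for instance through `opj_set_decode_area` followed by a decode call), which this sketch does not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type: process rows [y0, y1) of columns [x0, x1).
 * Must return non-zero on success. */
typedef int (*strip_fn)(uint32_t x0, uint32_t y0,
                        uint32_t x1, uint32_t y1, void *user);

/* Splits rows [y0, y1) into consecutive strips of at most strip_height
 * rows and calls fn on each one. Returns the number of strips
 * processed, or -1 on invalid arguments or callback failure. */
long for_each_strip(uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1,
                    uint32_t strip_height, strip_fn fn, void *user)
{
    long count = 0;
    uint32_t y = y0;
    if (strip_height == 0 || fn == 0 || x1 <= x0 || y1 < y0) {
        return -1;
    }
    while (y < y1) {
        uint32_t remaining = y1 - y;
        uint32_t h = remaining < strip_height ? remaining : strip_height;
        if (!fn(x0, y, x1, y + h, user)) {
            return -1;
        }
        y += h; /* cannot overflow: y + h <= y1 */
        count++;
    }
    return count;
}

/* Example callback state: tracks how many rows were covered. */
typedef struct {
    uint64_t total_lines;
    uint32_t max_height;
    uint32_t last_height;
} strip_stats;

/* Example callback: records strip heights instead of decoding. */
int record_strip(uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1,
                 void *user)
{
    strip_stats *st = (strip_stats *)user;
    uint32_t h = y1 - y0;
    (void)x0;
    (void)x1;
    st->total_lines += h;
    st->last_height = h;
    if (h > st->max_height) {
        st->max_height = h;
    }
    return 1;
}
```

With the numbers from the description (3072 lines, strips of at most 1200 lines), the loop yields strips of 1200, 1200 and 672 lines, so only one strip's worth of output has to be held at a time.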