Don't read the mdat box from ISOBMFF files #1961
Comments
Very well explained, @dhoulder. We currently control/direct the boxHandler() with a couple of predicates.
@dhoulder Thanks for the good hint for code improvement.
While it's probably not a big problem, I should also mention that the current handling of recursive boxes involves some unnecessary reads too: the outer box gets read in its entirety, then each box inside it gets re-read, and so on all the way down. I suspect this might not be worth changing, but I haven't actually done any benchmarking. All the data will be in page cache anyway after the first read, and unlike the mdat case, we're typically not talking about large amounts in the first place.
Working on this now
I suspect (without reading the code) that we want something like:
address = 0;
while ( something ) {
    seek(address);
    readBoxHeader;
    if ( superBox(....) ) {
        boxHandler();
    } else if ( knownBox(...) ) {
        allocate; read;
        perform magic;
    }
    address = address + box.length + 8;
}
I'm unclear about how to interact with @paolobenve and the darktable guys. Do you know Matt McGuire in Sydney? He's one of the darktable folks. I'd like to get 0.27.50 ready (0.27.5 Preview) with your code changes, ask darktable to build with that code, and send it to Paolo for testing. It's all kind of vague. Something will work out.
@clanmills Yes, I think something like that would be the ideal approach: read the header and then only read the box body if required, and in the superBox() case, just the stuff before the inner boxes. I think that might involve a substantial rewrite, so I was thinking of something less intrusive that would achieve most of the gains of that approach. See dhoulder@4e587fa
Quick benchmark against that R5 file I used in #1961 (comment): 0.534s vs 2.937s. Much better! All tests pass too. Comments welcome. If you know of other box types that are likely to be big (trak maybe?) I can skip those too, but I think mdat is the low-hanging fruit.
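For readers following along, the header-first loop discussed in the last two comments can be sketched roughly as below. This is only an illustration over a plain std::istream, not the exiv2 code: BoxInfo, readBE and listTopLevelBoxes are hypothetical names. It relies on the ISOBMFF box header layout, where the 32-bit size field counts the header itself, a size of 1 means a 64-bit size follows the type, and a size of 0 means the box runs to the end of the file.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one box header (not an exiv2 type).
struct BoxInfo {
    std::string type;      // four-character code, e.g. "ftyp"
    std::uint64_t offset;  // absolute offset of the box header
    std::uint64_t size;    // total box size, header included
};

// Read an n-byte big-endian unsigned integer; returns false at end of stream.
inline bool readBE(std::istream& in, int n, std::uint64_t& out) {
    assert(n >= 0 && n <= 8);
    out = 0;
    for (int i = 0; i < n; ++i) {
        const int c = in.get();
        if (c == std::char_traits<char>::eof()) return false;
        out = (out << 8) | static_cast<unsigned char>(c);
    }
    return true;
}

// Walk the top-level boxes of a file of fileSize bytes, reading headers only.
inline std::vector<BoxInfo> listTopLevelBoxes(std::istream& in, std::uint64_t fileSize) {
    std::vector<BoxInfo> boxes;
    std::uint64_t address = 0;
    while (address + 8 <= fileSize) {
        in.clear();
        in.seekg(static_cast<std::streamoff>(address));
        std::uint64_t size = 0;
        char fourcc[4];
        if (!readBE(in, 4, size) || !in.read(fourcc, 4)) break;
        std::uint64_t headerSize = 8;
        if (size == 1) {         // 64-bit size follows the type
            if (!readBE(in, 8, size)) break;
            headerSize = 16;
        } else if (size == 0) {  // box extends to the end of the file
            size = fileSize - address;
        }
        if (size < headerSize || size > fileSize - address) break;  // malformed box
        boxes.push_back({std::string(fourcc, 4), address, size});
        address += size;  // jump over the payload without reading it
    }
    return boxes;
}

// Usage: a 12-byte "ftyp" box followed by a 16-byte "mdat" box.
inline void usageExample() {
    std::string file = std::string("\x00\x00\x00\x0c", 4) + "ftyp" + "abcd";
    file += std::string("\x00\x00\x00\x10", 4) + "mdat" + std::string(8, 'z');
    std::istringstream in(file);
    const std::vector<BoxInfo> boxes = listTopLevelBoxes(in, file.size());
    assert(boxes.size() == 2);
}
```

A caller would then seek back to the offset of any box it cares about and read only that payload, so a large mdat costs one seek instead of a full read.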
I think the "big win" is to not read mdat. Can we take a two stage approach:
@clanmills: Yep, that two stage approach sounds good to me. I will prepare a pull request and use that mergifyio trick to create a backport to 0.27-maintenance
Looks like the GitHub CI stuff has got into another tangle - see #1974
Looks like the Windows-10/CI has gone bonkers. Try your git "empty" trick to trigger another build.
We've had a genuine issue on macOS/CI. That was fixed yesterday with a change in cmake/compilerFlags.cmake.
BmffImage::boxHandler() reads every box it encounters, whether it's interested in it or not. This includes mdat, which can be huge. Here's a quick benchmark comparing a CR3 file from a Canon R5 that has a 40 MB mdat box with another file that has the mdat box chopped off:
This could be a significant issue for darktable and rawtherapee which will want to scan a bunch of files when importing. The memory consumption also spikes, but that's transient and probably less of a problem.
The fix should be pretty easy:
before
io_->read(data.data(), data.size());
There may be other boxes that can be skipped, but mdat is typically the big one where the main image data lives.
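As a rough illustration of the idea (not the actual exiv2 patch), a check placed ahead of the payload read could look like the sketch below. It uses a plain std::istream in place of io_, and skipPayload and readOrSkipPayload are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical predicate (not exiv2 API): box types whose payload is never parsed.
inline bool skipPayload(const std::string& type) {
    return type == "mdat";
}

// Hypothetical stand-in for the payload read in boxHandler().
// Precondition: `in` is positioned just past the box header.
// Returns the number of payload bytes actually read into `data`.
inline std::size_t readOrSkipPayload(std::istream& in, const std::string& type,
                                     std::size_t payloadSize, std::vector<char>& data) {
    if (skipPayload(type)) {
        data.clear();
        // Seek past the payload instead of allocating and reading it.
        in.seekg(static_cast<std::streamoff>(payloadSize), std::ios::cur);
        return 0;
    }
    data.resize(payloadSize);
    in.read(data.data(), static_cast<std::streamsize>(payloadSize));
    return static_cast<std::size_t>(in.gcount());
}

// Usage: a 10-byte "mdat" payload is skipped, then the next 6 bytes are read.
inline void usageExample() {
    std::istringstream in(std::string(16, 'x'));
    std::vector<char> data;
    assert(readOrSkipPayload(in, "mdat", 10, data) == 0);
    assert(readOrSkipPayload(in, "ftyp", 6, data) == 6);
}
```

Keeping the decision in a single predicate means other large box types could be added later without touching the read path.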