bufRead needs to be adjusted after seek() #1790

kevinbackhouse · 2021-07-18T09:45:12Z

It looks to me like the intention of this code is that bufRead tracks the current position in the file, so it needs to be updated after every call to read() or seek(). There was a missing update which caused an infinite loop.

kevinbackhouse · 2021-07-18T09:49:12Z

I converted this PR to draft because I'd like to do some further clean-up in JpegBase::printStructure before this is merged.

kevinbackhouse · 2021-07-21T21:01:48Z

I spent some more time looking at this and ended up doing a major overhaul of the code in this file, which you can see in the third commit. I believe the new code behaves exactly as before, except for one place where I made a minor change to the output by fixing a (benign) out-of-bounds read and another place where I fixed a bug in some code that unfortunately seems to be untested.

In the original code, a variable called bufRead was used to keep track of the current position in the file. There were also a lot of calls to seek(). Often the code would use seek(..., -bufRead, ...) to rewind and re-read the data. So any mistakes in the accounting of bufRead could cause the code to get confused, which is exactly what caused this bug.
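Below is a hypothetical, simplified C++ model of the bookkeeping pattern described above. It is not the actual `JpegBase::printStructure` code; `SegmentCursor`, `read`, `skip`, and `rewind` are illustrative names. The invariant is that every operation that moves the stream must also update `bufRead`. If `skip` forgot to do so, `rewind` would seek back too few bytes, and a loop that re-parses the rewound data could keep revisiting the same bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical model of the bufRead bookkeeping (not the exiv2 code).
// bufRead counts the bytes consumed since the start of the current segment,
// so seeking back by -bufRead returns to that start.
class SegmentCursor {
public:
    explicit SegmentCursor(std::istream& in) : in_(in) {}

    // Read up to n bytes and add the number actually read to bufRead.
    std::string read(std::size_t n) {
        std::string buf(n, '\0');
        in_.read(&buf[0], static_cast<std::streamsize>(n));
        buf.resize(static_cast<std::size_t>(in_.gcount()));
        bufRead_ += in_.gcount();
        return buf;
    }

    // Skip n bytes forward. Omitting the bufRead update here is the kind of
    // accounting slip that makes a later rewind land in the wrong place.
    void skip(std::streamoff n) {
        in_.seekg(n, std::ios_base::cur);
        bufRead_ += n;
    }

    // Return to the start of the current segment and reset the counter.
    void rewind() {
        in_.seekg(-bufRead_, std::ios_base::cur);
        bufRead_ = 0;
    }

    std::streamoff bufRead() const { return bufRead_; }

private:
    std::istream& in_;
    std::streamoff bufRead_ = 0;
};
```

With the update in `skip`, reading 3 bytes and skipping 2 leaves `bufRead() == 5`, and `rewind()` returns to the first byte.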

Looking at the code, I was puzzled by the hard-coded number 36. JPG files are divided into segments, each of which starts with a "marker". The code seems to assume that every segment is at least 36 bytes long, which isn't actually true. In other words, it is possible for the read() on line 369 to be incomplete. For example, that happens with our test file test/data/exiv2-bug1231b.jpg.
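To make the problem concrete, here is a small hypothetical C++ illustration (not the exiv2 code; `fixedRead` and `kFixedRead` are illustrative names). It assumes the stream is positioned at the length field of a short comment segment (declared length 5: the 2 length bytes plus 3 data bytes), which is followed only by an EOI marker at the end of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// The hard-coded read size discussed above.
constexpr std::size_t kFixedRead = 36;

// Performs a fixed-size read and returns how many bytes were actually
// delivered; buf is shrunk to that count.
std::size_t fixedRead(std::istream& in, std::string& buf) {
    buf.assign(kFixedRead, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(kFixedRead));
    const auto got = static_cast<std::size_t>(in.gcount());
    buf.resize(got);
    return got;
}
```

In that scenario the 36-byte read delivers only 7 bytes, and the last 2 of them (`0xFF 0xD9`) belong to the next marker rather than to the segment: the read both overshoots the segment and falls short of 36.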

So I have concluded that the number 36 is a hack which ought to be fixed. I have replaced it with logic which (hopefully) correctly calculates the size of the payload, allocates a buffer of the corresponding size, and then reads the whole payload. As a side-effect, this has also enabled me to remove most of the calls to seek(). The purpose of most of those seeks was to rewind and reread parts of the payload when the length of the payload was greater than 36 bytes. That's not necessary if the whole payload has already been loaded into a buffer.

It turns out that this file already contained all the necessary logic to calculate the size of the payload, but it wasn't used systematically throughout the code. There are two steps:

1. The "marker" determines whether the segment has a non-zero payload. Markers are 1-byte numbers, so this can be encoded as a 256-element Boolean array, named mHasLength.
2. If the segment has a payload, then the size of the payload is encoded as a (big-endian) uint16_t in the two bytes immediately following the marker. The 2 bytes that encode the size are themselves included in the size, so the size is always at least 2.
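The two steps above, combined with the allocate-and-read logic, can be sketched as a single C++ helper. This is a hypothetical illustration, not the code from the PR (which duplicates the logic at each call site rather than adding a shared function). `makeHasLength` and `readSegmentPayload` are illustrative names. The table is a simplified stand-in for `mHasLength` that marks only the JPEG standalone markers (TEM, RST0 to RST7, SOI, EOI) as having no payload, and the stream is assumed to be positioned just after the marker byte.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Step 1 (simplified): which marker bytes are followed by a length field.
// Standalone markers carry no payload; every other value is treated as
// having one in this sketch.
std::array<bool, 256> makeHasLength() {
    std::array<bool, 256> t{};
    t.fill(true);
    t[0x01] = false;                                   // TEM
    for (int m = 0xD0; m <= 0xD9; ++m) t[m] = false;   // RST0-RST7, SOI, EOI
    return t;
}

// Reads one segment payload, including its 2-byte length field.
// Returns an empty vector for markers without a payload.
std::vector<std::uint8_t> readSegmentPayload(std::istream& in, std::uint8_t marker) {
    static const std::array<bool, 256> hasLength = makeHasLength();
    if (!hasLength[marker]) return {};

    // Step 2: a big-endian uint16 that counts its own 2 bytes.
    std::uint8_t lenBytes[2];
    if (!in.read(reinterpret_cast<char*>(lenBytes), 2))
        throw std::runtime_error("truncated length field");
    const std::size_t size = (std::size_t(lenBytes[0]) << 8) | lenBytes[1];
    if (size < 2) throw std::runtime_error("invalid segment length");

    // Allocate exactly `size` bytes and read the rest of the payload at once,
    // failing loudly instead of silently accepting a short read.
    std::vector<std::uint8_t> payload(size);
    payload[0] = lenBytes[0];
    payload[1] = lenBytes[1];
    if (size > 2 &&
        !in.read(reinterpret_cast<char*>(payload.data() + 2),
                 static_cast<std::streamsize>(size - 2)))
        throw std::runtime_error("truncated payload");
    return payload;
}
```

Keeping the length bytes at the front of the buffer matches the convention that the size includes them; code that only needs the data can start at offset 2, and later fields can be revisited by indexing instead of seeking.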

I have duplicated the code which calculates the size of the payload and loads it into a buffer in the 4 source locations where segments are parsed. I duplicated the code because I can't create a shared utility function without modifying the header files on the 0.27-maintenance branch. When this PR is forwarded to the main branch, I will refactor it to reduce the code duplication.

The change in the expected output in test/data/icc-test.out is because I fixed an out-of-bounds read caused by line 721. The size of the payload is already included in the value of size, so adding start to it causes the subsequent call to binaryToString to read 2 bytes beyond the end of the payload.
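The off-by-`start` error can be illustrated with a hypothetical helper. `hexRange` stands in for the `binaryToString` call (whose real signature is not reproduced here), and the sketch assumes that `size` is already the end offset of the data inside a buffer of `size` bytes, with the printable data beginning at offset `start`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a binary-to-text dump of buf[start, end).
// The data occupies [start, size), so passing size + start as the end
// would overrun the buffer by `start` bytes; this version refuses instead.
std::string hexRange(const std::vector<unsigned char>& buf,
                     std::size_t start, std::size_t end) {
    if (start > end || end > buf.size()) return "<out of range>";
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = start; i < end; ++i) {
        out += digits[buf[i] >> 4];
        out += digits[buf[i] & 0x0F];
    }
    return out;
}
```

With a 4-byte buffer and `start == 2`, the range `[start, size)` prints the 2 data bytes, while `[start, size + start)` would reach 2 bytes past the end.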

The second bug that I fixed is on line 860. I believe that calculation is wrong. It is causing a negative number to be calculated on line 879, which causes a crash in the subsequent memory allocation. Unfortunately, that code is not covered by any of our tests, so I can't be completely sure what the expected behavior is. (I found the crash by fuzzing.)
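The exact calculation on lines 860 and 879 is not reproduced here, but the failure mode (a negative length reaching a memory allocation) can be guarded against with a generic pattern. The following C++17 sketch is hypothetical; `allocateRemaining`, `total`, and `consumed` are illustrative names, not exiv2 identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Compute a remaining length in a signed type and reject it before it is
// used as an allocation size. Converting a negative value to std::size_t
// would instead wrap around to an enormous size.
std::optional<std::vector<std::uint8_t>> allocateRemaining(std::int64_t total,
                                                           std::int64_t consumed) {
    const std::int64_t remaining = total - consumed;
    if (remaining < 0) return std::nullopt;  // inconsistent header: bail out
    return std::vector<std::uint8_t>(static_cast<std::size_t>(remaining));
}
```

Returning an empty `std::optional` lets the caller report a corrupt file instead of crashing; throwing an exception would be an equally reasonable choice.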

clanmills · 2021-07-22T09:08:05Z

@kevinbackhouse This is really good stuff, Kev. Thank you for digging into this. I will be very happy if somebody else reviews this. If nobody volunteers by 2021-08-01, please email me and I will get involved. [email protected]


kevinbackhouse marked this pull request as draft July 18, 2021 09:45

kevinbackhouse added this to the v0.27.5 milestone Jul 18, 2021

kevinbackhouse added the bug and forward-to-main labels Jul 18, 2021

kevinbackhouse force-pushed the Fix-GHSA-mvc4-g5pv-4qqq branch from 1c67748 to 566b9fb July 21, 2021 20:40

kevinbackhouse marked this pull request as ready for review July 21, 2021 21:17

hassec requested changes Jul 25, 2021

src/jpgimage.cpp (outdated, resolved)

src/jpgimage.cpp (outdated, resolved)

kevinbackhouse and others added 7 commits July 25, 2021 21:45

Regression test for GHSA-mvc4-g5pv-4qqq

ed90f64

bufRead needs to be adjusted after seek()

6eec038

Improved handling of jpg segments to avoid out-of-bound reads.

b3c63da

Fix compiler warning.

52ecb13

Update src/jpgimage.cpp

147e6d3

Co-authored-by: Christoph Hasse <[email protected]>

poc from GHSA-9jh3-fcc3-g6hv can now be parsed without error.

2637af0

Add comment to explain bounds-check.

b5cbb47

kevinbackhouse force-pushed the Fix-GHSA-mvc4-g5pv-4qqq branch from 93d96ad to b5cbb47 July 25, 2021 21:05

hassec approved these changes Jul 26, 2021


kevinbackhouse merged commit de6b706 into Exiv2:0.27-maintenance Jul 26, 2021

mergify bot mentioned this pull request Jul 26, 2021

bufRead needs to be adjusted after seek() (backport #1790) #1799

Merged

kevinbackhouse mentioned this pull request Jul 26, 2021

Fix build error when EXIV2_DEBUG_MESSAGES is enabled #1801

Merged

kevinbackhouse added a commit that referenced this pull request Jul 26, 2021

Merge pull request #1799 from Exiv2/mergify/bp/main/pr-1790

c486999

bufRead needs to be adjusted after seek() (backport #1790)

kevinbackhouse mentioned this pull request Jul 30, 2021

out-of-bounds read in jpgimage.cpp #1815

Closed

clanmills mentioned this pull request Aug 9, 2021

Exiv2 RoadMap #1018

Open
