Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7-bit index (instead of 6) #20

Open
nigeltao opened this issue Dec 1, 2021 · 13 comments
Open

7-bit index (instead of 6) #20

nigeltao opened this issue Dec 1, 2021 · 13 comments

Comments

@nigeltao
Copy link
Owner

nigeltao commented Dec 1, 2021

Here's a possible re-packing of the bits, increasing the size of the color cache from 64 to 128 entries.

#define QOI_INDEX   0x00 // 0xxxxxxx  +0 bytes  i7
#define QOI_DIFF_8  0x80 // 10xxxxxx  +0 bytes  r2g2b2
#define QOI_DIFF_16 0xc0 // 1100xxxx  +1 bytes  r4g4b4
#define QOI_DIFF_24 0xd0 // 1101xxxx  +2 bytes  r5g5b5a5
#define QOI_DIFF_32 0xe0 // 11100000  +3 bytes  r8g8b8
#define QOI_DIFF_40 0xe1 // 11100001  +4 bytes  r8g8b8a8
#define QOI_RUN_8   0xe0 // 111xxxxx  +0 bytes  from
                //  0xe2    11100010            up to
                //  0xfd    11111101            run length   2 ..=    29
#define QOI_RUN_16  0xfe // 11111110  +1 bytes  run length  30 ..=   285
#define QOI_RUN_24  0xff // 11111111  +2 bytes  run length 286 ..= 65821

On my laptop, a win for images/misc, a clear win for images/screenshots and same-ish for the photos.

        decode ms   encode ms   decode mpps   encode mpps   size kb
images/kodak
qoi-before:   6.4         8.3         61.02         47.11       771
qoi-after:    6.0         9.3         65.64         42.21       772
images/misc
qoi-before:   5.8         6.1        154.48        146.05       400
qoi-after:    5.5         5.8        162.43        154.29       400
images/screenshots
qoi-before:  53.9        48.6        152.63        169.25      2582
qoi-after:   49.2        40.9        167.28        201.09      2481
images/textures
qoi-before:   1.6         2.0         80.17         65.13       184
qoi-after:    1.4         2.0         90.81         65.21       180
images/wallpaper
qoi-before: 137.6       153.6         68.10         61.02     10640
qoi-after:  126.9       155.8         73.83         60.16     10669
@nigeltao
Copy link
Owner Author

nigeltao commented Dec 1, 2021

Patch:

diff --git a/qoi.h b/qoi.h
index 0aec728..bc75d7d 100644
--- a/qoi.h
+++ b/qoi.h
@@ -309,19 +309,24 @@ void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels);
 	#define QOI_FREE(p)    free(p)
 #endif
 
-#define QOI_INDEX   0x00 // 00xxxxxx
-#define QOI_RUN_8   0x40 // 010xxxxx
-#define QOI_RUN_16  0x60 // 011xxxxx
-#define QOI_DIFF_8  0x80 // 10xxxxxx
-#define QOI_DIFF_16 0xc0 // 110xxxxx
-#define QOI_DIFF_24 0xe0 // 1110xxxx
-#define QOI_COLOR   0xf0 // 1111xxxx
-
+#define QOI_INDEX   0x00 // 0xxxxxxx  +0 bytes  i7
+#define QOI_DIFF_8  0x80 // 10xxxxxx  +0 bytes  r2g2b2
+#define QOI_DIFF_16 0xc0 // 1100xxxx  +1 bytes  r4g4b4
+#define QOI_DIFF_24 0xd0 // 1101xxxx  +2 bytes  r5g5b5a5
+#define QOI_DIFF_32 0xe0 // 11100000  +3 bytes  r8g8b8
+#define QOI_DIFF_40 0xe1 // 11100001  +4 bytes  r8g8b8a8
+#define QOI_RUN_8   0xe0 // 111xxxxx  +0 bytes  from
+                //  0xe2    11100010            up to
+                //  0xfd    11111101            run length   2 ..=    29
+#define QOI_RUN_16  0xfe // 11111110  +1 bytes  run length  30 ..=   285
+#define QOI_RUN_24  0xff // 11111111  +2 bytes  run length 286 ..= 65821
+
+#define QOI_MASK_1  0x80 // 10000000
 #define QOI_MASK_2  0xc0 // 11000000
 #define QOI_MASK_3  0xe0 // 11100000
 #define QOI_MASK_4  0xf0 // 11110000
 
-#define QOI_COLOR_HASH(C) (C.rgba.r ^ C.rgba.g ^ C.rgba.b ^ C.rgba.a)
+#define QOI_COLOR_HASH(C) ((C.rgba.r ^ C.rgba.g ^ C.rgba.b ^ C.rgba.a) & 127)
 #define QOI_MAGIC \
 	(((unsigned int)'q') << 24 | ((unsigned int)'o') << 16 | \
 	 ((unsigned int)'i') <<  8 | ((unsigned int)'f'))
@@ -377,7 +382,7 @@ void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {
 
 	const unsigned char *pixels = (const unsigned char *)data;
 
-	qoi_rgba_t index[64] = {0};
+	qoi_rgba_t index[128] = {0};
 
 	int run = 0;
 	qoi_rgba_t px_prev = {.rgba = {.r = 0, .g = 0, .b = 0, .a = 255}};
@@ -403,22 +408,28 @@ void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {
 
 		if ( 
 			run > 0 && 
-			(run == 0x2020 || px.v != px_prev.v || px_pos == px_end)
+			(run == 65821 || px.v != px_prev.v || px_pos == px_end)
 		) {
-			if (run < 33) {
-				run -= 1;
+			if (run == 1) {
+				bytes[p++] = QOI_DIFF_8 | 0x2A;
+			}
+			else if (run < 30) {
 				bytes[p++] = QOI_RUN_8 | run;
 			}
+			else if (run < 286) {
+				bytes[p++] = QOI_RUN_16;
+				bytes[p++] = run - 30;
+			}
 			else {
-				run -= 33;
-				bytes[p++] = QOI_RUN_16 | run >> 8;
-				bytes[p++] = run;
+				bytes[p++] = QOI_RUN_24;
+				bytes[p++] = (run - 286) >> 8;
+				bytes[p++] = run - 286;
 			}
 			run = 0;
 		}
 
 		if (px.v != px_prev.v) {
-			int index_pos = QOI_COLOR_HASH(px) % 64;
+			int index_pos = QOI_COLOR_HASH(px);
 
 			if (index[index_pos].v == px.v) {
 				bytes[p++] = QOI_INDEX | index_pos;
@@ -447,11 +458,11 @@ void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {
 					}
 					else if (
 						va == 0 && 
-						vr > -17 && vr < 16 && 
+						vr >  -9 && vr <  8 &&
 						vg >  -9 && vg <  8 && 
 						vb >  -9 && vb <  8
 					) {
-						bytes[p++] = QOI_DIFF_16   | (vr + 16);
+						bytes[p++] = QOI_DIFF_16   | (vr +  8);
 						bytes[p++] = (vg + 8) << 4 | (vb +  8);
 					}
 					else {
@@ -460,12 +471,18 @@ void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {
 						bytes[p++] = (vb + 16) << 5 | (va + 16);
 					}
 				}
+				else if (!va) {
+					bytes[p++] = QOI_DIFF_32;
+					bytes[p++] = vr;
+					bytes[p++] = vg;
+					bytes[p++] = vb;
+				}
 				else {
-					bytes[p++] = QOI_COLOR | (vr ? 8 : 0) | (vg ? 4 : 0) | (vb ? 2 : 0) | (va ? 1 : 0);
-					if (vr) { bytes[p++] = px.rgba.r; }
-					if (vg) { bytes[p++] = px.rgba.g; }
-					if (vb) { bytes[p++] = px.rgba.b; }
-					if (va) { bytes[p++] = px.rgba.a; }
+					bytes[p++] = QOI_DIFF_40;
+					bytes[p++] = vr;
+					bytes[p++] = vg;
+					bytes[p++] = vb;
+					bytes[p++] = va;
 				}
 			}
 		}
@@ -517,7 +534,7 @@ void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels) {
 	}
 
 	qoi_rgba_t px = {.rgba = {.r = 0, .g = 0, .b = 0, .a = 255}};
-	qoi_rgba_t index[64] = {0};
+	qoi_rgba_t index[128] = {0};
 
 	int run = 0;
 	int chunks_len = size - QOI_PADDING;
@@ -528,24 +545,17 @@ void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels) {
 		else if (p < chunks_len) {
 			int b1 = bytes[p++];
 
-			if ((b1 & QOI_MASK_2) == QOI_INDEX) {
+			if ((b1 & QOI_MASK_1) == QOI_INDEX) {
 				px = index[b1 ^ QOI_INDEX];
 			}
-			else if ((b1 & QOI_MASK_3) == QOI_RUN_8) {
-				run = (b1 & 0x1f);
-			}
-			else if ((b1 & QOI_MASK_3) == QOI_RUN_16) {
-				int b2 = bytes[p++];
-				run = (((b1 & 0x1f) << 8) | (b2)) + 32;
-			}
 			else if ((b1 & QOI_MASK_2) == QOI_DIFF_8) {
 				px.rgba.r += ((b1 >> 4) & 0x03) - 2;
 				px.rgba.g += ((b1 >> 2) & 0x03) - 2;
 				px.rgba.b += ( b1       & 0x03) - 2;
 			}
-			else if ((b1 & QOI_MASK_3) == QOI_DIFF_16) {
+			else if ((b1 & QOI_MASK_4) == QOI_DIFF_16) {
 				int b2 = bytes[p++];
-				px.rgba.r += (b1 & 0x1f) - 16;
+				px.rgba.r += (b1 & 0x0f) -  8;
 				px.rgba.g += (b2 >> 4)   -  8;
 				px.rgba.b += (b2 & 0x0f) -  8;
 			}
@@ -557,14 +567,31 @@ void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels) {
 				px.rgba.b += (((b2 & 0x03) << 3) | ((b3 & 0xe0) >> 5)) - 16;
 				px.rgba.a +=   (b3 & 0x1f) - 16;
 			}
-			else if ((b1 & QOI_MASK_4) == QOI_COLOR) {
-				if (b1 & 8) { px.rgba.r = bytes[p++]; }
-				if (b1 & 4) { px.rgba.g = bytes[p++]; }
-				if (b1 & 2) { px.rgba.b = bytes[p++]; }
-				if (b1 & 1) { px.rgba.a = bytes[p++]; }
+			else if (b1 == QOI_DIFF_32) {
+				px.rgba.r += bytes[p++];
+				px.rgba.g += bytes[p++];
+				px.rgba.b += bytes[p++];
+			}
+			else if (b1 == QOI_DIFF_40) {
+				px.rgba.r += bytes[p++];
+				px.rgba.g += bytes[p++];
+				px.rgba.b += bytes[p++];
+				px.rgba.a += bytes[p++];
+			}
+			else if (b1 < QOI_RUN_16) {
+				run = (b1 & 0x1f) - 1;
+			}
+			else if (b1 == QOI_RUN_16) {
+				int b2 = bytes[p++];
+				run = b2 + 30 - 1;
+			}
+			else {
+				int b2 = bytes[p++];
+				int b3 = bytes[p++];
+				run = (b2<<8) + b3 + 286 - 1;
 			}
 
-			index[QOI_COLOR_HASH(px) % 64] = px;
+			index[QOI_COLOR_HASH(px)] = px;
 		}
 
 		if (channels == 4) { 

@oscardssmith
Copy link

If I were you, I'd remove QOI_RUN_16 and 24. phoboslab/qoi#53 will make them useless since it can encode the same information as well using multiple QOI_RUN_8 commands.

@nigeltao
Copy link
Owner Author

nigeltao commented Dec 1, 2021

We could do that, I suppose. There is a subtlety, though: what happens if you have many consecutive QOI_RUN_8 commands, with respect to run <<= 5 overflow? In the specification, what's the type of the run variable? If it's signed, what happens if becomes negative? (Bonus worry: signed integer overflow is undefined behavior in C.) If it's unsigned, is it still fast on the JVM (Java doesn't have unsigned integer types)? If it's 64 bit, does that make it harder to implement (or significantly slower) on 32-bit or 16-bit processors?

@oscardssmith
Copy link

TLDR is that's not a problem because there aren't really any images with more than 2^31 pixels of the same color. If we want to be robust to this, the two easy answers are to make the run variable 64 bit, or encode 2^31 repeats and then emit a QOI_COLOR to reset the run count.

@nigeltao
Copy link
Owner Author

nigeltao commented Dec 1, 2021

that's not a problem

I'm talking about malicious encoders and about having a spec that describes exactly one correct decoding for any valid input.

For example, ZLIB (well, DEFLATE) has some unused bit patterns that no well-behaved encoder should ever use. RFC 1951 section 3.2.6. "Compression with fixed Huffman codes (BTYPE=01)" says "Note that distance codes 30-31 will never actually occur in the compressed data."

In practice, security vulnerabilities can lurk in rarely-tested code paths. Even if two particular ZLIB decoders don't crash on "never actually occur" input, there can be problems if they produce different decodings. One software design might have a system decode the ZLIB and do a security audit (on the uncompressed data) before passing on the compressed data to a second system. If the second system produces a different decoding, but assumes that they don't need any security checks because it's already been audited, then we can have problems.

I have seen a real-world security problem that came down to the ZIP archive format being ambiguous when there's duplicate filenames. One system (the 'doorkeeper') used "first one wins". Another system (the 'gullible') used "last one wins", and relied on the 'doorkeeper' rejecting bad input.

@oscardssmith
Copy link

Do you have a preference for which way we deal with this? I'm fine with either.

@nigeltao
Copy link
Owner Author

nigeltao commented Dec 1, 2021

For this particular GitHub issue, fixed length QOI_RUN_16 and QOI_RUN_24 avoids the problem. :-)

@oscardssmith
Copy link

yeah, but it adds 2 useless opcodes and increases expense. Opcodes are very not free.

@nigeltao
Copy link
Owner Author

nigeltao commented Dec 1, 2021

They're 8-bit opcodes, not 4-bit opcodes. Specifically, what we're basically giving up here is the ability to represent a 30- and 31-length run in 1 byte instead of 2. It's not free, but it's still relatively cheap.

Note that the basic RUN opcode here uses 3 bits (leaving 5), not 4. Chop out 2 of the 32 for DIFF_32 and DIFF_40.

@oscardssmith
Copy link

OK, that makes sense. Can you make a PR version of this against the experimental branch to make this easier to test? Also, are your qoi-before numbers using master or experimental?

@nigeltao
Copy link
Owner Author

nigeltao commented Dec 1, 2021

The qoi-before numbers are master, as is the patch in the OP.

Might be easiest to just download and drop in this single file (it replaces qoi.h).

qoi-index7.h.txt

@mgmalheiros
Copy link

mgmalheiros commented Dec 1, 2021

Cool suggestion! Just replacing 127 by 63 in the QOI_COLOR_HASH(C) macro makes very clear the positive effect of 7-bit XOR hash codes alone, with everything else kept the same.

For the kodak images, the compressed size reduction was 0.91% less in average, from 6-bit to 7-bit (both using the qoi-index7.h code).

For the textures folder, a whooping 6.67% decrease in average.

And for screenshots, a similar 6.77% decrease in average.

@chocolate42
Copy link

chocolate42 commented Dec 4, 2021

They're 8-bit opcodes, not 4-bit opcodes. Specifically, what we're basically giving up here is the ability to represent a 30- and 31-length run in 1 byte instead of 2. It's not free, but it's still relatively cheap.

Note that the basic RUN opcode here uses 3 bits (leaving 5), not 4. Chop out 2 of the 32 for DIFF_32 and DIFF_40.

If the spec is nailed down such that there's only one correct representation of rle length 1 (and possibly other cases with overlapping encodings), that frees a bunch of otherwise wasted 8 bit tags. Special cases like QOI_RUN_16 and QOI_RUN_24 can use those tags instead of cannibalising the RUN opcode, particularly good if RUN switches to a 4 bit tag where every value is precious. If we cannot find something more useful to do with any remaining unused 8 bit tags they could even be used to extend the range of an rle encoded in 8 bits by plugging all the gaps (a bit messy but it should be possible without being too detrimental to performance as we're dealing with rle).

edit: Ignore that I'm not talking sense because most overlap is with multi-byte encodings, the only one that isn't is already handled by incrementing the run range. I'm thinking about 8 bit codes because I'm testing a replacement for QOI_DIFF_8 called QOI_DELTA which encodes rgb in the range -1..1. It has the benefit of only needing 5 bit data so it can have a 3 bit tag, but there's also 5 unused data elements aka 5 potential 8 bit tags. If QOI_RUN_16 and QOI_RUN_24 take up two slots there's 3 slots remaining which will take some thinking to utilise (maybe static replacements for QOI_COLOR to free up opcode space would be nice, QOI_COLOR_RGB QOI_COLOR_RGBA QOI_COLOR_A might work well).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants