zipfile regression: When writing a zip64 entry to an unseekable file, the local file header compressed/uncompressed fields are not set to 0 #106218

Closed
chenxiaolong opened this issue Jun 28, 2023 · 1 comment
Labels
type-bug An unexpected behavior, bug, or error

Comments

@chenxiaolong

Bug report

When creating a zip file with a zip64 entry in a streaming way (unseekable file), my understanding from the spec is that the headers should be set up so that:

  1. The local file header general flags field has bit 3 set (§4.4.4)
  2. The local file header crc32, compressed size, and uncompressed size fields are set to 0 (§4.4.4)
  3. The local file header has a zip64 (0x0001) extra record so that data descriptor sizes are interpreted as 64-bit integers (§4.3.9.2)
  4. A data descriptor header exists (§4.3.9.1)
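The four points above can be checked directly against the bytes of an archive. The following is a minimal sketch, not part of zipfile: `inspect_local_header` and its returned keys are hypothetical names, and it assumes `data` begins with a local file header.

```python
import struct

# Fixed 30-byte part of a local file header: signature, version needed,
# flags, method, mod time, mod date, crc-32, compressed size,
# uncompressed size, filename length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
FLAG_DATA_DESCRIPTOR = 0x0008  # general purpose bit 3
ZIP64_EXTRA_ID = 0x0001


def inspect_local_header(data):
    """Report the local file header fields that the list above refers to."""
    (sig, _version, flags, _method, _mtime, _mdate,
     crc, comp_size, uncomp_size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != b"PK\x03\x04":
        raise ValueError("not a local file header")

    extra_start = LOCAL_HEADER.size + name_len
    extra = data[extra_start:extra_start + extra_len]

    # The extra field is a sequence of (header ID, size, payload) records.
    extra_ids = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        extra_ids.append(header_id)
        pos += 4 + size

    return {
        "streamed": bool(flags & FLAG_DATA_DESCRIPTOR),
        "crc": crc,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        "has_zip64_extra": ZIP64_EXTRA_ID in extra_ids,
    }
```

Calling it on the first bytes of an archive (for example `inspect_local_header(open('test.zip', 'rb').read())`) shows which of the conditions hold for the first entry.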

zipfile normally does all this, but (2) seems to have broken after #103861/#103863. With that change, the local file header's compressed and uncompressed sizes are set to 0xffffffff instead of 0x00000000. (0xffffffff is correct for the usual case where files are seekable and data descriptors are not used.)

From initial testing, it seems like this is all that's needed to fix the issue:

--- zipfile.py	2023-06-08 22:29:05.000000000 -0400
+++ zipfile2.py	2023-06-28 17:50:45.841709476 -0400
@@ -463,8 +463,9 @@
             fmt = '<HHQQ'
             extra = extra + struct.pack(fmt,
                                         1, struct.calcsize(fmt)-4, file_size, compress_size)
-            file_size = 0xffffffff
-            compress_size = 0xffffffff
+            if not (self.flag_bits & _MASK_USE_DATA_DESCRIPTOR):
+                file_size = 0xffffffff
+                compress_size = 0xffffffff
             min_version = ZIP64_VERSION
 
         if self.compress_type == ZIP_BZIP2:

To reproduce

import zipfile


class UnseekableFile:
    def __init__(self, fp):
        self.fp = fp

    def write(self, data):
        return self.fp.write(data)

    def flush(self):
        self.fp.flush()


with open('test.zip', 'wb') as f_raw:
    with zipfile.ZipFile(UnseekableFile(f_raw), 'w') as z:
        with z.open('foobar', 'w', force_zip64=True) as f:
            f.write(b'Hello, world!')

The resulting file looks like this, as reported by zipdetails (lines prefixed with + are as expected; lines prefixed with - are problematic).

 0000 LOCAL HEADER #1       04034B50
 0004 Extract Zip Spec      2D '4.5'
 0005 Extract OS            00 'MS-DOS'
+0006 General Purpose Flag  0008
+     [Bit  3]              1 'Streamed'
 0008 Compression Method    0000 'Stored'
 000A Last Mod Time         00210000 'Mon Dec 31 19:00:00 1979'
+000E CRC                   00000000
-0012 Compressed Length     FFFFFFFF
-0016 Uncompressed Length   FFFFFFFF
 001A Filename Length       0006
 001C Extra Length          0014
 001E Filename              'foobar'
+0024 Extra ID #0001        0001 'ZIP64'
+0026   Length              0010
+0028   Uncompressed Size   0000000000000000
+0030   Compressed Size     0000000000000000
 0038 PAYLOAD               Hello, world!
 
+0045 STREAMING DATA HEADER 08074B50
+0049 CRC                   EBE6C6E6
+004D Compressed Length     000000000000000D
+0055 Uncompressed Length   000000000000000D
 
 005D CENTRAL HEADER #1     02014B50
 0061 Created Zip Spec      2D '4.5'
 0062 Created OS            03 'Unix'
 0063 Extract Zip Spec      2D '4.5'
 0064 Extract OS            00 'MS-DOS'
 0065 General Purpose Flag  0008
      [Bit  3]              1 'Streamed'
 0067 Compression Method    0000 'Stored'
 0069 Last Mod Time         00210000 'Mon Dec 31 19:00:00 1979'
 006D CRC                   EBE6C6E6
 0071 Compressed Length     0000000D
 0075 Uncompressed Length   0000000D
 0079 Filename Length       0006
 007B Extra Length          0000
 007D Comment Length        0000
 007F Disk Start            0000
 0081 Int File Attributes   0000
      [Bit 0]               0 'Binary Data'
 0083 Ext File Attributes   01800000
 0087 Local Header Offset   00000000
 008B Filename              'foobar'
 
 0091 END CENTRAL HEADER    06054B50
 0095 Number of this disk   0000
 0097 Central Dir Disk no   0000
 0099 Entries in this disk  0001
 009B Total Entries         0001
 009D Size of Central Dir   00000034
 00A1 Offset to Central Dir 0000005D
 00A5 Comment Length        0000
 Done

Your environment

  • CPython versions tested on: 3.11.4
  • Operating system and architecture: Alpine Linux x86_64
@chenxiaolong
Author

I think my understanding of the spec was incorrect. When I read this in §4.4.4:

        Bit 3: If this bit is set, the fields crc-32, compressed 
               size and uncompressed size are set to zero in the 
               local header.  The correct values are put in the 
               data descriptor immediately following the compressed
               data. 

I interpreted "set to zero in the local header" as referring to the two 32-bit size fields in the local header. I now believe the spec considers the zip64 extra record to be part of the local header, so the correct interpretation is that the sizes in the zip64 extra record are set to 0, while the 32-bit fields keep the 0xffffffff marker. This is what Python is doing now.
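Under this reading, a zip64-aware reader treats 0xffffffff in a 32-bit size field as "look in the zip64 extra record", so for a streamed entry the sizes it resolves from the local header come out as 0. A minimal sketch of that lookup, assuming the record stores the uncompressed size before the compressed size (`effective_sizes` is a hypothetical helper, not a zipfile function):

```python
import struct

SENTINEL = 0xFFFFFFFF      # marker meaning "real value is in the zip64 extra record"
ZIP64_EXTRA_ID = 0x0001


def effective_sizes(compressed32, uncompressed32, extra):
    """Return (compressed, uncompressed) sizes after consulting the zip64 extra record."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        payload = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id != ZIP64_EXTRA_ID:
            continue
        offset = 0
        # A 64-bit value is only read for a field whose 32-bit slot holds the marker.
        if uncompressed32 == SENTINEL:
            (uncompressed32,) = struct.unpack_from("<Q", payload, offset)
            offset += 8
        if compressed32 == SENTINEL:
            (compressed32,) = struct.unpack_from("<Q", payload, offset)
        break
    return compressed32, uncompressed32
```

For the local header in the dumps here, `effective_sizes(0xFFFFFFFF, 0xFFFFFFFF, extra)` resolves to `(0, 0)`, which is the "set to zero" state that bit 3 calls for.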

It also matches what the zip command does:

dd if=/dev/zero bs=1M count=5000 | zip | cat > test.zip

The resulting zip looks like this:

000000 LOCAL HEADER #1       04034B50
000004 Extract Zip Spec      2D '4.5'
000005 Extract OS            00 'MS-DOS'
000006 General Purpose Flag  0008
       [Bits 1-2]            0 'Normal Compression'
       [Bit  3]              1 'Streamed'
000008 Compression Method    0008 'Deflated'
00000A Last Mod Time         56E37E59 'Mon Jul  3 11:50:50 2023'
00000E CRC                   00000000
000012 Compressed Length     FFFFFFFF
000016 Uncompressed Length   FFFFFFFF
00001A Filename Length       0001
00001C Extra Length          0014
00001E Filename              '-'
00001F Extra ID #0001        0001 'ZIP64'
000021   Length              0010
000023   Uncompressed Size   0000000000000000
00002B   Compressed Size     0000000000000000
000033 PAYLOAD

4DA380 STREAMING DATA HEADER 08074B50
4DA384 CRC                   6AA043F2
4DA388 Compressed Length     00000000004DA34D
4DA390 Uncompressed Length   0000000138800000
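
The STREAMING DATA HEADER records in these dumps carry the real values. A rough sketch of decoding one, assuming the optional 08074B50 signature is present (as it is in both dumps) and that the caller already knows from the zip64 extra record whether the sizes are 64-bit; `parse_data_descriptor` is a hypothetical name:

```python
import struct


def parse_data_descriptor(data, zip64):
    """Decode a data descriptor that starts with the PK\\x07\\x08 signature.

    With zip64 the two sizes are 8 bytes each (24 bytes in total),
    otherwise 4 bytes each (16 bytes in total).
    """
    fmt = "<4sLQQ" if zip64 else "<4sLLL"
    sig, crc, compressed, uncompressed = struct.unpack_from(fmt, data, 0)
    if sig != b"PK\x07\x08":
        raise ValueError("data descriptor signature not found")
    return crc, compressed, uncompressed
```

The 24-byte length matches the offsets in the first dump, where the descriptor starts at 0045 and the central header follows at 005D.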

Given that, I'm closing this issue since it's a misunderstanding on my part.
