Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-103861: Fix Zip64 extensions not being properly applied in some cases #103863

Merged
merged 6 commits into from
May 16, 2023

Conversation

pR0Ps
Copy link
Contributor

@pR0Ps pR0Ps commented Apr 26, 2023

This commit fixes an issue where adding a small file to a ZipFile object while forcing zip64 extensions causes an extra Zip64 record to be added to the zip, but doesn't update the min_version or file sizes.

To create a file that reproduces the issue (copied from #103861):

import zipfile

with zipfile.ZipFile("out.zip", mode="w", allowZip64=True) as zf:
    with zf.open("text.txt", mode="w", force_zip64=True) as zi:
        zi.write(b"some data")

Diff of information extracted by zipdetails from running the above script before and after this commit.

 0000 LOCAL HEADER #1       04034B50
-0004 Extract Zip Spec      14 '2.0'
+0004 Extract Zip Spec      2D '4.5'
 0005 Extract OS            00 'MS-DOS'
 0006 General Purpose Flag  0000
 0008 Compression Method    0000 'Stored'
 000A Last Mod Time         00210000 'Mon Dec 31 19:00:00 1979'
 000E CRC                   D9C2E91E
-0012 Compressed Length     00000009
-0016 Uncompressed Length   00000009
+0012 Compressed Length     FFFFFFFF
+0016 Uncompressed Length   FFFFFFFF
 001A Filename Length       0008
 001C Extra Length          0014
 001E Filename              'text.txt'
 0026 Extra ID #0001        0001 'ZIP64'
 0028   Length              0010
 002A   Uncompressed Size   0000000000000009
 0032   Compressed Size     0000000000000009
 003A PAYLOAD               some data

 0043 CENTRAL HEADER #1     02014B50
-0047 Created Zip Spec      14 '2.0'
+0047 Created Zip Spec      2D '4.5'
 0048 Created OS            03 'Unix'
-0049 Extract Zip Spec      14 '2.0'
+0049 Extract Zip Spec      2D '4.5'
 004A Extract OS            00 'MS-DOS'
 004B General Purpose Flag  0000
 004D Compression Method    0000 'Stored'
 004F Last Mod Time         00210000 'Mon Dec 31 19:00:00 1979'
 0053 CRC                   D9C2E91E
 0057 Compressed Length     00000009
 005B Uncompressed Length   00000009
 005F Filename Length       0008
 0061 Extra Length          0000
 0063 Comment Length        0000
 0065 Disk Start            0000
 0067 Int File Attributes   0000
      [Bit 0]               0 'Binary Data'
 0069 Ext File Attributes   01800000
 006D Local Header Offset   00000000
 0071 Filename              'text.txt'

 0079 END CENTRAL HEADER    06054B50
 007D Number of this disk   0000
 007F Central Dir Disk no   0000
 0081 Entries in this disk  0001
 0083 Total Entries         0001
 0085 Size of Central Dir   00000036
 0089 Offset to Central Dir 00000043
 008D Comment Length        0000

A test has also been added that checks that these properties are correctly set.

Potential reviewer based on the git blame of the changed lines: @serhiy-storchaka (182d7cd)
Potential reviewers based on the experts index: @Yhg1s, @gpshead

@gpshead
Copy link
Member

gpshead commented Apr 26, 2023

I think this is the issue you were talking about at lunch today? (yay pycon sprints!)

@gpshead gpshead added the sprint label Apr 26, 2023
@pR0Ps
Copy link
Contributor Author

pR0Ps commented Apr 26, 2023

Yep, that's this one!

@pR0Ps pR0Ps force-pushed the bugfix/force-zip64 branch 2 times, most recently from bd02290 to 03a58a6 Compare April 27, 2023 23:26
This commit fixes an issue where adding a small file to a `ZipFile`
object while forcing zip64 extensions causes an extra Zip64 record to be
added to the zip, but doesn't update the `min_version` or file sizes.

Fixes python#103861
This fixes an issue where if data requiring zip64 extensions was added
to an unseekable stream without specifying `force_zip64=True`, zip64
extensions would not be used and a RuntimeError would not be raised when
closing the file (even though the size would be known at that point).
This would result in successfully writing corrupt zip files.

Deciding if zip64 extensions are required outside of the `FileHeader`
function means that both `FileHeader` and `_ZipWriteFile` will always be
in sync. Previously, the `FileHeader` function could enable zip64
extensions without propagating that decision to the `_ZipWriteFile`
class, which would then not correctly write the data descriptor record
or check for errors on close.
Lib/test/test_zipfile/test_core.py Show resolved Hide resolved
Lib/test/test_zipfile/test_core.py Show resolved Hide resolved
Lib/zipfile/__init__.py Outdated Show resolved Hide resolved
Lib/test/test_zipfile/test_core.py Show resolved Hide resolved
@pR0Ps
Copy link
Contributor Author

pR0Ps commented May 11, 2023

@gpshead Addressed review comments

…ompatibility in the API.

Code within this module always passes an explicit zip64, so the overall bug fix remains valid. We just don't want to break existing user code constructing their own ZipInfo objects and calling zi.FileHeader themselves for whatever reasons.  _(hopefully rare, but it isn't a protected or private API so we can't make assumptions)_
@gpshead
Copy link
Member

gpshead commented May 16, 2023

github actions CI infrastructure seems broken at the moment. But I think this the PR is ready.

I undid one part of your change so it'd remain a pure bugfix: ZipInfo.FileHeader() is technically a public API - even if it seems like nothing should use it we must assume someone does. So getting rid of the old zip64=None default behavior would be an API change, so I restored that. But it should be a no-op as FileHeader is always called with an explicit zip64 bool supplied within the zipfile module itself.

I'll take another look later and rerun CI after it's healthy again.

@gpshead gpshead merged commit 798bcaa into python:main May 16, 2023
@gpshead gpshead added type-bug An unexpected behavior, bug, or error needs backport to 3.11 only security fixes labels May 16, 2023
@miss-islington
Copy link
Contributor

Thanks @pR0Ps for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @pR0Ps and @gpshead, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 798bcaa1eb01de7db9ff1881a3088603ad09b096 3.11

gpshead pushed a commit to gpshead/cpython that referenced this pull request May 16, 2023
…ed in some cases (pythonGH-103863)

Fix Zip64 extensions not being properly applied in some cases:

Fixes an issue where adding a small file to a `ZipFile`
object while forcing zip64 extensions causes an extra Zip64 record to be
added to the zip, but doesn't update the `min_version` or file sizes in
the primary central directory header.

Also fixed an edge case in checking if zip64 extensions are required:

This fixes an issue where if data requiring zip64 extensions was added
to an unseekable stream without specifying `force_zip64=True`, zip64
extensions would not be used and a RuntimeError would not be raised when
closing the file (even though the size would be known at that point).
This would result in successfully writing corrupt zip files.

Deciding if zip64 extensions are required outside of the `FileHeader`
function means that both `FileHeader` and `_ZipWriteFile` will always be
in sync. Previously, the `FileHeader` function could enable zip64
extensions without propagating that decision to the `_ZipWriteFile`
class, which would then not correctly write the data descriptor record
or check for errors on close.

If anyone is actually using `ZipInfo.FileHeader` as a public API without
explicitly passing True or False in for zip64, their own code may still be
susceptible to that kind of bug unless they make a similar change to
where the zip64 decision happens.

Fixes pythonGH-103861

---------

Co-authored-by: Gregory P. Smith <[email protected]>.
(cherry picked from commit 798bcaa)

Co-authored-by: Carey Metcalfe <[email protected]>
@bedevere-bot
Copy link

GH-104534 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label May 16, 2023
carljm added a commit to carljm/cpython that referenced this pull request May 16, 2023
* main:
  pythonGH-104510: Fix refleaks in `_io` base types (python#104516)
  pythongh-104539: Fix indentation error in logging.config.rst (python#104545)
  pythongh-104050: Don't star-import 'types' in Argument Clinic (python#104543)
  pythongh-104050: Add basic typing to CConverter in clinic.py (python#104538)
  pythongh-64595: Fix write file logic in Argument Clinic (python#104507)
  pythongh-104523: Inline minimal PGO rules (python#104524)
  pythongh-103861: Fix Zip64 extensions not being properly applied in some cases (python#103863)
  pythongh-69152: add method get_proxy_response_headers to HTTPConnection class (python#104248)
  pythongh-103763: Implement PEP 695 (python#103764)
  pythongh-104461: Run tkinter test_configure_screen on X11 only (pythonGH-104462)
  pythongh-104469: Convert _testcapi/watchers.c to use Argument Clinic (python#104503)
  pythongh-104482: Fix error handling bugs in ast.c (python#104483)
  pythongh-104341: Adjust tstate_must_exit() to Respect Interpreter Finalization (pythongh-104437)
  pythonGH-102613: Fix recursion error from `pathlib.Path.glob()` (pythonGH-104373)
gpshead added a commit that referenced this pull request May 17, 2023
…some cases (GH-103863) (#104534)

Fix Zip64 extensions not being properly applied in some cases:

Fixes an issue where adding a small file to a `ZipFile`
object while forcing zip64 extensions causes an extra Zip64 record to be
added to the zip, but doesn't update the `min_version` or file sizes in
the primary central directory header.

Also fixed an edge case in checking if zip64 extensions are required:

This fixes an issue where if data requiring zip64 extensions was added
to an unseekable stream without specifying `force_zip64=True`, zip64
extensions would not be used and a RuntimeError would not be raised when
closing the file (even though the size would be known at that point).
This would result in successfully writing corrupt zip files.

Deciding if zip64 extensions are required outside of the `FileHeader`
function means that both `FileHeader` and `_ZipWriteFile` will always be
in sync. Previously, the `FileHeader` function could enable zip64
extensions without propagating that decision to the `_ZipWriteFile`
class, which would then not correctly write the data descriptor record
or check for errors on close.

If anyone is actually using `ZipInfo.FileHeader` as a public API without
explicitly passing True or False in for zip64, their own code may still be
susceptible to that kind of bug unless they make a similar change to
where the zip64 decision happens.

Fixes GH-103861

---------

.
(cherry picked from commit 798bcaa)

Co-authored-by: Carey Metcalfe <[email protected]>
@pR0Ps pR0Ps deleted the bugfix/force-zip64 branch May 23, 2023 01:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sprint type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants