Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zipfile lib overwrites the extra field during closing when the archive size is more then ZIP64_LIMIT #88233

Closed
shaanbhaya mannequin opened this issue May 7, 2021 · 6 comments · Fixed by #96161
Closed
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@shaanbhaya
Copy link
Mannequin

shaanbhaya mannequin commented May 7, 2021

BPO 44067
Nosy @Yhg1s, @serhiy-storchaka
Files
  • zip_bug.zip
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-05-07.08:18:55.383>
    labels = ['type-bug', '3.8', '3.9', '3.10', '3.11', '3.7', 'library']
    title = 'Zipfile lib overwrites the extra field during closing when the archive size is more then ZIP64_LIMIT'
    updated_at = <Date 2021-11-07.00:34:59.007>
    user = 'https://bugs.python.org/shaanbhaya'

    bugs.python.org fields:

    activity = <Date 2021-11-07.00:34:59.007>
    actor = 'anadius'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2021-05-07.08:18:55.383>
    creator = 'shaanbhaya'
    dependencies = []
    files = ['50025']
    hgrepos = []
    issue_num = 44067
    keywords = []
    message_count = 3.0
    messages = ['393178', '393179', '405760']
    nosy_count = 5.0
    nosy_names = ['twouters', 'alanmcintyre', 'serhiy.storchaka', 'shaanbhaya', 'anadius']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue44067'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10', 'Python 3.11']

    Linked PRs

    @shaanbhaya
    Copy link
    Mannequin Author

    shaanbhaya mannequin commented May 7, 2021

    The current zipFile implementation supports the allowZip64,which can make large zip files.
    There is a bug in the current implementation of close method, where the extra field is overwritten.

    To reproduce it:

    1. Create a directory with more then 4 GB of data(spread over many files).
    2. Make the zip of those files using the any rar achiever which adds NTFS metadata (mtime, atime, and ctime) of files in the zip.
    3. Append a new file to the zip using the zip library .

    When I open the zip again, the files processed after the ZIP64_LIMIT is reached will have their extra fields overwritten.

    I have attached the zip that contained the python used to add the new file and the images of zip archive before adding new files and after.

    @shaanbhaya shaanbhaya mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels May 7, 2021
    @shaanbhaya
    Copy link
    Mannequin Author

    shaanbhaya mannequin commented May 7, 2021

    The issue stems from the following code inside the
    def _write_end_record(self): method ,where the extra fields are trimmed .

    if zinfo.header_offset > ZIP64_LIMIT:
        extra.append(zinfo.header_offset)
        header_offset = 0xffffffff
    else:
        header_offset = zinfo.header_offset
    
    extra_data = zinfo.extra
    min_version = 0
    if extra:
        # Append a ZIP64 field to the extra's
        extra_data = _strip_extra(extra_data, (1,))
        extra_data = struct.pack(
            '<HH' + 'Q'*len(extra),
            1, 8*len(extra), *extra) + extra_data
    
    min_version = ZIP64_VERSION

    @anadius
    Copy link
    Mannequin

    anadius mannequin commented Nov 5, 2021

    I was looking at zipfile._strip_extra trying to figure out how it works. It doesn't. It skips extra headers after the last one that matches. That's what causes this issue.

    Here's a fixed version:

    def _strip_extra(extra, xids):
        # Remove Extra Fields with specified IDs.
        unpack = _EXTRA_FIELD_STRUCT.unpack
        modified = False
        buffer = []
        start = i = 0
        while i + 4 <= len(extra):
            xid, xlen = unpack(extra[i : i + 4])
            j = i + 4 + xlen
            if xid in xids:
                if i != start:
                    buffer.append(extra[start : i])
                start = j
                modified = True
            i = j
        if i != start:
            buffer.append(extra[start : i])
        if not modified:
            return extra
        return b''.join(buffer)

    Or this one, easier to understand:

    def _strip_extra(extra, xids):
        # Remove Extra Fields with specified IDs.
        unpack = _EXTRA_FIELD_STRUCT.unpack
        modified = False
        buffer = []
        i = 0
        while i + 4 <= len(extra):
            xid, xlen = unpack(extra[i : i + 4])
            j = i + 4 + xlen
            if xid in xids:
                modified = True
            else:
                buffer.append(extra[i : j])
            i = j
        if not modified:
            return extra
        return b''.join(buffer)

    Not sure which one is better.

    @thatch
    Copy link
    Contributor

    thatch commented Jun 22, 2022

    I found the same bug while working on another issue. I prefer the easier-to-understand one.

    thatch added a commit to thatch/cpython that referenced this issue Aug 21, 2022
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes python#88233
    @thatch
    Copy link
    Contributor

    thatch commented Aug 21, 2022

    I ended up not going with the easier-to-understand version, because it failed to pass through 1-3 bytes of garbage after the last extra, which we pass through in the "not modified" case already.

    thatch added a commit to thatch/cpython that referenced this issue Sep 5, 2022
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes python#88233
    jaraco pushed a commit to jaraco/cpython that referenced this issue Feb 20, 2023
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes python#88233
    @jaraco
    Copy link
    Member

    jaraco commented Feb 20, 2023

    I ended up not going with the easier-to-understand version, because it failed to pass through 1-3 bytes of garbage after the last extra, which we pass through in the "not modified" case already.

    In an attempt to make the code easier to comprehend and less prone to error, I drafted #102084.

    miss-islington pushed a commit that referenced this issue Feb 20, 2023
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes #88233
    
    Automerge-Triggered-By: GH:jaraco
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 20, 2023
    …96161)
    
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes pythonGH-88233
    (cherry picked from commit 59e86ca)
    
    Co-authored-by: Tim Hatch <[email protected]>
    Automerge-Triggered-By: GH:jaraco
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 20, 2023
    …96161)
    
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes pythonGH-88233
    (cherry picked from commit 59e86ca)
    
    Co-authored-by: Tim Hatch <[email protected]>
    Automerge-Triggered-By: GH:jaraco
    miss-islington added a commit that referenced this issue Feb 20, 2023
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes GH-88233
    (cherry picked from commit 59e86ca)
    
    Co-authored-by: Tim Hatch <[email protected]>
    Automerge-Triggered-By: GH:jaraco
    carljm added a commit to carljm/cpython that referenced this issue Feb 20, 2023
    * main: (60 commits)
      pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078)
      pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012)
      pythongh-101566: Sync with zipp 3.14. (pythonGH-102018)
      pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589)
      pythongh-88233: zipfile: handle extras after a zip64 extra (pythonGH-96161)
      pythongh-101981: Apply HOMEBREW related environment variables (pythongh-102074)
      pythongh-101907: Stop using `_Py_OPCODE` and `_Py_OPARG` macros (pythonGH-101912)
      pythongh-101819: Adapt _io types to heap types, batch 1 (pythonGH-101949)
      pythongh-101981: Build macOS as recommended by the devguide (pythonGH-102070)
      pythongh-97786: Fix compiler warnings in pytime.c (python#101826)
      pythongh-101578: Amend PyErr_{Set,Get}RaisedException docs (python#101962)
      Misc improvements to the float tutorial (pythonGH-102052)
      pythongh-85417: Clarify behaviour on branch cuts in cmath module (python#102046)
      pythongh-100425: Update tutorial docs related to sum() accuracy (FH-101854)
      Add missing 'is' to `cmath.log()` docstring (python#102049)
      pythongh-100210: Correct the comment link for unescaping HTML (python#100212)
      pythongh-97930: Also include subdirectory in makefile. (python#102030)
      pythongh-99735: Use required=True in argparse subparsers example (python#100927)
      Fix incorrectly documented attribute in csv docs (python#101250)
      pythonGH-84783: Make the slice object hashable (pythonGH-101264)
      ...
    carljm added a commit to carljm/cpython that referenced this issue Feb 22, 2023
    * main: (225 commits)
      pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078)
      pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012)
      pythongh-101566: Sync with zipp 3.14. (pythonGH-102018)
      pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589)
      pythongh-88233: zipfile: handle extras after a zip64 extra (pythonGH-96161)
      pythongh-101981: Apply HOMEBREW related environment variables (pythongh-102074)
      pythongh-101907: Stop using `_Py_OPCODE` and `_Py_OPARG` macros (pythonGH-101912)
      pythongh-101819: Adapt _io types to heap types, batch 1 (pythonGH-101949)
      pythongh-101981: Build macOS as recommended by the devguide (pythonGH-102070)
      pythongh-97786: Fix compiler warnings in pytime.c (python#101826)
      pythongh-101578: Amend PyErr_{Set,Get}RaisedException docs (python#101962)
      Misc improvements to the float tutorial (pythonGH-102052)
      pythongh-85417: Clarify behaviour on branch cuts in cmath module (python#102046)
      pythongh-100425: Update tutorial docs related to sum() accuracy (FH-101854)
      Add missing 'is' to `cmath.log()` docstring (python#102049)
      pythongh-100210: Correct the comment link for unescaping HTML (python#100212)
      pythongh-97930: Also include subdirectory in makefile. (python#102030)
      pythongh-99735: Use required=True in argparse subparsers example (python#100927)
      Fix incorrectly documented attribute in csv docs (python#101250)
      pythonGH-84783: Make the slice object hashable (pythonGH-101264)
      ...
    ambv pushed a commit that referenced this issue Mar 28, 2023
    #102087)
    
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes GH-88233
    (cherry picked from commit 59e86ca)
    
    Co-authored-by: Tim Hatch <[email protected]>
    jaraco added a commit that referenced this issue Sep 25, 2023
    * Refactor zipfile._strip_extra to use higher level abstractions for extras instead of a heavy-state loop.
    
    * Add blurb
    
    * Remove _strip_extra and use _Extra.strip directly.
    
    * Use memoryview to avoid unnecessary copies while splitting Extras.
    csm10495 pushed a commit to csm10495/cpython that referenced this issue Sep 28, 2023
    * Refactor zipfile._strip_extra to use higher level abstractions for extras instead of a heavy-state loop.
    
    * Add blurb
    
    * Remove _strip_extra and use _Extra.strip directly.
    
    * Use memoryview to avoid unnecessary copies while splitting Extras.
    python-sidebar pushed a commit to python-sidebar/Python-Documentation-Fork-With-TOC that referenced this issue Sep 1, 2024
    …96161)
    
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes python#88233
    
    Automerge-Triggered-By: GH:jaraco
    python-sidebar pushed a commit to python-sidebar/Python-Documentation-Fork-With-TOC that referenced this issue Sep 1, 2024
    …96161)
    
    Previously, any data _after_ the zip64 extra would be removed.
    
    With many new tests.
    
    Fixes python#88233
    
    Automerge-Triggered-By: GH:jaraco
    Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
    * Refactor zipfile._strip_extra to use higher level abstractions for extras instead of a heavy-state loop.
    
    * Add blurb
    
    * Remove _strip_extra and use _Extra.strip directly.
    
    * Use memoryview to avoid unnecessary copies while splitting Extras.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Status: Done
    Development

    Successfully merging a pull request may close this issue.

    2 participants