inconsistent handling of duplicate ZipFile entries #117779
Comments
obfusk: thank you for also filing a bug here!
This is still evident, and rather annoying!

[ 18/291] Writing tensor blk.1.attn_norm.weight | size 4096 | type F32 | T+ 3
[ 19/291] Writing tensor blk.1.ffn_norm.weight | size 4096 | type F32 | T+ 3
[ 20/291] Writing tensor blk.2.attn_q.weight | size 4096 x 4096 | type F32 | T+ 3
[ 21/291] Writing tensor blk.2.attn_k.weight | size 4096 x 4096 | type F32 | T+ 3
[ 22/291] Writing tensor blk.2.attn_v.weight | size 4096 x 4096 | type F32 | T+ 3
[ 23/291] Writing tensor blk.2.attn_output.weight | size 4096 x 4096 | type F32 | T+ 3
Traceback (most recent call last):
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 1219, in <module>
main()
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 1214, in main
OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab, concurrency = args.concurrency, endianess=endianess)
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 941, in write_all
for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 781, in bounded_parallel_map
result = futures.pop(0).result()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 905, in do_item
tensor = lazy_tensor.load().to_ggml()
^^^^^^^^^^^^^^^^^^
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 510, in load
ret = self._load()
^^^^^^^^^^^^
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 520, in load
return self.load().astype(data_type)
^^^^^^^^^^^
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 510, in load
ret = self._load()
^^^^^^^^^^^^
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 668, in load
return UnquantizedTensor(storage.load(storage_offset, elm_count).reshape(size))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 652, in load
fp = self.zip_file.open(info)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/zipfile/__init__.py", line 1652, in open
raise BadZipFile(f"Overlapped entries: {zinfo.orig_filename!r} (possible zip bomb)")
zipfile.BadZipFile: Overlapped entries: 'pytorch_model-00001-of-00003/data/23' (possible zip bomb)

This conversion works with Python 3.10.4:

[290/291] Writing tensor output_norm.weight | size 4096 | type F32 | T+ 110
[291/291] Writing tensor output.weight | size 32000 x 4096 | type F32 | T+ 110
Wrote /home/raijin/workspace/model_repo/ReluLLaMA-7B-PowerInfer-GGUF/ReluLLaMA-7B.powerinfer.gguf
Bug report
Bug description:
Create a ZIP file with duplicate central directory entries pointing to the same local file header (these can be found in the wild, see e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1068705; the sketch below is just an easy way to create one for testing).
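A minimal sketch of one way to build such a file (the archive name dupe.zip and member name foo are placeholders, and appending to ZipFile.filelist relies on zipfile internals):

```python
import zipfile

# Sketch: ZipFile writes one central directory record per entry in .filelist
# when the archive is closed, so appending the existing ZipInfo a second time
# produces two central directory entries pointing at the same local file header.
with zipfile.ZipFile("dupe.zip", "w") as zf:
    zf.writestr("foo", "bar")
    zf.filelist.append(zf.filelist[0])
```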
Opening the duplicate entry fails when using the name or the later entry from `infolist()`, but works when using the earlier entry (since the later one is considered to overlap with the earlier one, while the earlier one isn't considered to overlap with another entry or the central directory).
If I modify `NameToInfo` to contain the earlier entry instead, `f.open("foo")` works fine (sketched below). On the one hand, these ZIP files are broken. On the other hand, it would be easy to simply not overwrite existing entries in `NameToInfo`, allowing these files to be opened. And this affects real-world programs trying to open real-world files (so it could be considered a regression caused by #110016). Perhaps a warning would be in order when duplicates are detected; e.g. `unzip` shows an error but does extract the files.
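A rough reader-side sketch of that "keep the earlier entry" behaviour (NameToInfo is an internal attribute, so this only illustrates the idea and is not a supported API):

```python
import zipfile

def zipfile_keeping_first(path):
    """Open a ZIP archive and make name lookups resolve to the first
    central directory entry seen for each filename, i.e. what not
    overwriting existing entries in NameToInfo would do."""
    zf = zipfile.ZipFile(path)
    first_seen = {}
    for info in zf.infolist():
        first_seen.setdefault(info.filename, info)
    zf.NameToInfo = first_seen  # internal attribute; sketch only
    return zf

with zipfile_keeping_first("dupe.zip") as zf:
    print(zf.read("foo"))       # resolves to the earlier, openable entry
```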
CPython versions tested on:
3.11, 3.12
Operating systems tested on:
Linux