-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tarfile.TarPath #89812
Comments
It would be helpful to have a pathlib-compatible object in tarfile, similarly to zipfile.Path. |
I vaguely recall exploring this concept and finding that tarfiles don’t supply the requisite interface because they’re not random access. I’m only 10% confident in that recollection, so worth exploring. |
That is good to know. This isn't very high on my priority list, but I will try to explore when I have some time. |
It's possible to do, but will be a little slow due to the nature of tar files. They're a big linked list of files, so you need to do a bunch of reads/seeks from the start to the end to enumerate all files. I'd ask that we try to get bpo-24132 solved first. That would let us write: # tarfile.py
class Path(pathlib.AbstractPath):
def iterdir(self):
...
def stat(self):
... We'd fill in a smallish number of abstract methods to get a full |
I'd recommend not to block on bpo-24132. It's not obvious to me that subclassing would be valuable. It depends on how it's implemented, but in my experience, zipfile.Path doesn't and cannot implement the full interface of pathlib.Path. Instead zipfile.Path attempts to implement a protocol. At the time, the protocol was undefined, but now there exists importlib.resources.abc.Traversable (https://docs.python.org/3/library/importlib.html#importlib.abc.Traversable), the interface needed by importlib.resources. I'd honestly just create a standalone class, see if it can implement Traversable, and only then consider if it should implement a more complicated interface (such as something with symlink support or perhaps even later subclassing from pathlib.Path). |
If you're only aiming for Traversable compatibility, sure. The original bug description asks for something that's pathlib-compatible and similar to zipfile.Path, which goes beyond the Traversable interface in attempting to emulate pathlib.Path. The pathlib.Path interface is a good one - I see no reason it can't apply to zip and tar archives in full. Methods of Path objects already raise NotImplementedError if operations aren't supported (e.g. creating symlinks) Some prototyping from a couple years back, including a tar path implementation: https://github.com/barneygale/pathlab/tree/master/pathlab |
Anyone implementing this should be aware that tar members don't have unique names. |
Re-arrange `pathlib.Path` methods in source code. No other changes. The methods are arranged as follows: 1. `stat()` and dependants (`exists()`, `is_dir()`, etc) 2. `open()` and dependants (`read_text()`, `write_bytes()`, etc) 3. `iterdir()` and dependants (`glob()`, `walk()`, etc) 4. All other `Path` methods This patch prepares the ground for a new `_AbstractPath` class, which will support the methods in groups 1, 2 and 3 above. By churning the methods here, subsequent patches will be easier to review and less likely to break things.
Re-arrange `pathlib.Path` methods in source code. No other changes. The methods are arranged as follows: 1. `stat()` and dependants (`exists()`, `is_dir()`, etc) 2. `open()` and dependants (`read_text()`, `write_bytes()`, etc) 3. `iterdir()` and dependants (`glob()`, `walk()`, etc) 4. All other `Path` methods This patch prepares the ground for a new `_AbstractPath` class, which will support the methods in groups 1, 2 and 3 above. By churning the methods here, subsequent patches will be easier to review and less likely to break things.
Re-arrange `pathlib.Path` methods in source code. No other changes. The methods are arranged as follows: 1. `stat()` and dependants (`exists()`, `is_dir()`, etc) 2. `open()` and dependants (`read_text()`, `write_bytes()`, etc) 3. `iterdir()` and dependants (`glob()`, `walk()`, etc) 4. All other `Path` methods This patch prepares the ground for a new `_AbstractPath` class, which will support the methods in groups 1, 2 and 3 above. By churning the methods here, subsequent patches will be easier to review and less likely to break things.
* main: (47 commits) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172) pythongh-103193: Improve `getattr_static` test coverage (python#104286) Trim trailing whitespace and test on CI (python#104275) pythongh-102500: Remove mention of bytes shorthand (python#104281) pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256) pythongh-99108: Replace SHA3 implementation HACL* version (python#103597) pythongh-104273: Remove redundant len() calls in argparse function (python#104274) pythongh-64660: Don't hardcode Argument Clinic return converter result variable name (python#104200) pythongh-104265 Disallow instantiation of `_csv.Reader` and `_csv.Writer` (python#104266) pythonGH-102613: Improve performance of `pathlib.Path.rglob()` (pythonGH-104244) pythongh-103650: Fix perf maps address format (python#103651) pythonGH-89812: Churn `pathlib.Path` methods (pythonGH-104243) ...
This internal class excludes the `__fspath__()`, `__bytes__()` and `as_uri()` methods, which must not be inherited by a future `tarfile.TarPath` class.
This internal class excludes the `__fspath__()`, `__bytes__()` and `as_uri()` methods, which must not be inherited by a future `tarfile.TarPath` class.
Slightly expand the test coverage of `is_junction()`. The existing test only checks that `os.path.isjunction()` is called under-the-hood.
- Split out dedicated test for unbuffered `open()` - Split out dedicated test for `is_mount()` at the filesystem root - Avoid `os.stat()` when checking that empty paths point to '.' - Remove unnecessary `rmtree()` call
Make assertions about the `st_mode`, `st_ino` and `st_dev` attributes of the stat results from two files and a directory, rather than checking if `chmod()` affects `st_mode` (which is already tested elsewhere).
Remove `PathTest.dirlink()` function. Symlinks in `PathTest.setUp()` are created using `os.symlink()` directly; symlinks in test functions use `Path.symlink_to()` in order to make the tests applicable to a (near-)future `AbstractPath` class.
) Adjust the pathlib tests to add a new `PathTest.can_symlink` class attribute, which allows us to enable or disable symlink support in tests. A (near-)future commit will add an `AbstractPath` class; its tests will hard-code the value to `True` or `False` depending on a stub subclass's capabilities.
…6062) Slightly expand the test coverage of `is_junction()`. The existing test only checks that `os.path.isjunction()` is called under-the-hood.
- Split out dedicated test for unbuffered `open()` - Split out dedicated test for `is_mount()` at the filesystem root - Avoid `os.stat()` when checking that empty paths point to '.' - Remove unnecessary `rmtree()` call - Remove unused `assertSame()` method
Make assertions about the `st_mode`, `st_ino` and `st_dev` attributes of the stat results from two files and a directory, rather than checking if `chmod()` affects `st_mode` (which is already tested elsewhere).
…onGH-106061) Remove `PathTest.dirlink()` function. Symlinks in `PathTest.setUp()` are created using `os.symlink()` directly; symlinks in test functions use `Path.symlink_to()` in order to make the tests applicable to a (near-)future `AbstractPath` class.
PR available that adds |
What should If the answer isn't clear then I'd like to incubate this in a PyPI project for a while first @encukou I saw you were working on tarfile recently. Pinging in case this interests you! |
IMO, symlinks to files that are in the archive should continue to work. It's very much not clear-cut. A tarball is, essentially, a collection of procedural instructions that allow merging contents with an existing filesystem in ”interesting” ways, while A PyPI project that handles the simple cases sounds like a good start. |
FWIW, I won't make one any time soon -- I'm taking a break from |
Add private `pathlib._PathBase` class. This will be used by an experimental PyPI package to incubate a `tarfile.TarPath` class. Co-authored-by: Adam Turner <[email protected]>
Add private `pathlib._PathBase` class. This will be used by an experimental PyPI package to incubate a `tarfile.TarPath` class. Co-authored-by: Adam Turner <[email protected]>
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
pathlib.Path
methods #104243tarfile.TarPath
#104272pathlib._PurePathExt
#104810pathlib.Path
test methods #105807pathlib.UnsupportedOperation
#105926pathlib.Path.is_junction()
returns false #106062pathlib.Path.stat()
#106064pathlib._PathBase
#106337The text was updated successfully, but these errors were encountered: