-
-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tarfile: Traversal attack vulnerability #65308
Comments
The application does not validate the filenames inside the tar archive, allowing to extract files in arbitrary path. An attacker can craft a tar file to override files. I've view this vulnerability in libtar: |
The solution in the patch is based on the gnutar solution to this, removing the prefix when extracting and adding. |
Setting as release blocker pending evaluation. |
It's a known and well-documented behavior of the tar module: https://docs.python.org/2.7/library/tarfile.html#tarfile.TarFile.extractall |
Would it possible to disable this behaviour by default, and only enable ti explicitly? The tar command line program has for example the -P / --absolute-paths option. |
Yes, this behavior is documented, but still it is desirable to fix it. The tar utility has a lot of switches which controls extracting and by default it prevents three ways of attack (absolute names, '..' and symlinks), but there are other possible ways of attack. This is complex issue and I'm working on it. See also bpo-19974. In any case we should be very careful because every protection against attack changes a behavior (which can be safe if you know what you do), so perhaps we should add parameters which controls behavior. This is possible only in new Python version. |
Note that any issues here should also be considered for zipfile and shutil. (Well, shutil can just use the other two once the security is available.) See bpo-20907. |
Don't forget about SUID and SGID, too. |
In the past, our answer to these kinds of bug reports has always been that you must not extract an archive from an untrusted source without making sure that it has no malicious contents. And that tarfile conforms to the posix specifications with respect to extraction of files and pathname resolution. That's why we put this prominent warning in the documentation, and I think its advice still holds. I don't think that this issue should be marked as a release blocker, because the way tarfile currently works was a conscious decision, not an accident. tarfile does what it is designed to do: it processes a sequence of instructions to store a number of files in the filesystem. So the attack that is described by Daniel Garcia exploits neither a bug in tarfile nor a loophole in the tar archive format. A necessary condition for this attack to work is that the attacker has to trick the user into extracting the malicious archive first. After that, tarfile interprets the contained instructions word-for-word but still only within the boundaries defined by the user's privileges. I think it is obvious that it is potentially dangerous to extract tar archives we didn't create ourselves, because we actually give another person direct access to our filesystem. tarfile could mitigate some of the adverse effects, but this will not change the fact that it remains unsafe to use tarfile to a certain degree unless you use it with your own data or take reasonable precautions. Anyway, if we come to the conclusion that we want to eliminate this kind of attack, we must be aware that there is a lot more to do than that. tarfile as it is today is vulnerable to all known attacks against tar programs, and maybe even a few more that rely on its specific implementation.
Additionally, there are more issues mentioned in the GNU tar manual: https://www.gnu.org/software/tar/manual/html_node/Security.html In conclusion, I like to emphasize that tarfile is a library, it is no replacement for GNU tar. And as a library it has a different focus, it is merely a building block for an application, and has to be used with a little bit of responsibility. And even if we start to implement all possible checks, I'm afraid we never can do without a warning in the documentation that reminds everyone to keep an eye on what they're doing. |
Thank you Lars for your thorough reply. While I agree that this isn't a release blocker, as it was clearly designed to behave this way... it seems to me that it wouldn't take much to make the tarfile module a lot safer. Specifically:
This would fix your listed attacks 1-6. The remaining attacks you cite are denial-of-service attacks; while they're undesirable, they shouldn't compromise the security of the machine. (I suppose we could even address those, adding "reasonable" quotas for disk space and number of files.) I doubt that would make tarfile secure. But maybe "practicality beats purity"? |
Seems like shutil._unpack_tarfile() is affected. I guess it could at least do with one of those warnings in the documentation for make_archive(). The patch for this bug looks a bit over enthusiastic, for example skip_prefixes("blaua../stuff") would incorrectly strip the first bit and just return "stuff". It seems there might already be plenty of existing code to check for bad paths. Examples that come to mind:
This code either ignores the bad path elements, or ignores the whole path. Perhaps some of it could be recycled into a common function somewhere, rather than implementing it all over again for tar files. I have written my own function joinpath() to do this sort of checking, which you are welcome to use: https://bitbucket.org/vadmium/pyrescene/src/34264f6/rescene/utility.py#cl-217 You would call it with something like joinpath(tarpath.split("/"), osdir). |
Do we have any final decision on what's the best approach to solve this? I see some possibilities: a) leave the issue to the library user. I think that's a not good solution security-wise as many will be unaware of the problem and this promotes code duplication for the fix. On the other hand, this does not change default behavior. b) fix the problem as proposed in the patch sent by Daniel. This makes the tarfile secure against this kind of attacks. It does change the behavior and doesn't allow to extract in arbitrary paths, though. c) fix the problem so that by default extracting in arbitrary paths is not allowed, but allow somehow to do that optionally. This way we change the default behavior but provide an easy fix for those that depend on that functionality. d) do not change the default, but provide a well documented and easy way to activate the safety checks that fix this kind of attacks. The advantage is that it doesn't change the default behavior, the disadvantage is that many people will have to modify their code to be secure, and that the default is not very secure. For what is worth, I believe either b or c should be chosen to fix this issue. |
Also, I guess this patch solves and is closely related to bpo-1044 which was, at the time (2007), considered "not a bug". |
Let me present for discussion a proposal (and a patch with documentation) with an approach that is a little different, but in my opinion the most effective. I hope that it will appeal to all involved. My proposal consists of a new class SafeTarFile, that is a subclass and drop-in replacement for the TarFile class and can be employed whenever the user feels the necessity. It can be used the same way as TarFile, with the difference that SafeTarFile is equipped with a wide range of tests and as soon as it detects anything bad it interrupts the current operation with a SecurityError exception. This way damage is effectively averted, and it is up to the developer to decide whether he rejects the archive altogether (which is the obvious and recommended measure) or he wants to continue to process it in a subsequent step (on his own responsibility). To simplify a few common operations, SafeTarFile has three more methods: analyze(), filter() and is_safe(). These methods will allow access to the archive without SecurityError exceptions being raised. The analyze() method is a kind of low-level iterator that produces each TarInfo object together with a list of warnings (if the member is bad) as a tuple. This gives a developer access to all the information he needs to implement his own more differentiated way of handling bad archives. The filter() method is a convenience method that provides an iterator over all the "good" members of an archive leaving out all the "bad" ones. It can be used as an argument to SafeTarFile.extractall() for example. is_safe() is a high-level shortcut method that reduces the result of the analysis to a simple True or False. SafeTarFile has a variety of checks that test e.g. for bad pathnames, bad permissions and duplicate files. Also, to prevent denial-of-service scenarios, it enforces user-defined limits upon the archive, such as a maximum number of files or a maxmimum size of unpacked data. The main advantage of this approach is the higher degree of security. The practice of rewriting paths (e.g. like in Daniel.Garcia's patch) is error-prone, has side-effects and is hard to maintain because of its tendency towards regression. It just adds another layer of complexity to an already complex and delicate problem. SafeTarFile (or whatever it will be called) is backward compatible and easy to maintain, because it is an isolated addition to the tarfile module. It is easily subclassable to add more tests. It can be used as a standalone tool to check for bad archives and possible denial-of-service scenarios. Its analyze() method tells the user exactly what's wrong with the archive instead of keeping it away from him. Instead of silently extracting files to locations they weren't expected to be stored (i.e. after "fixing" their pathnames), SafeTarFile simply refuses to extract them at all. This way it is far more transparent and understandable to the user what happens. Feedback is welcome. |
bpo-17102 is open about the specific problem of escaping the destination directory. Maybe it is a duplicate, but this bug also discusses other problems. |
bpo-29788 proposes an option to disable the vulnerability in the CLI |
Lars, a huge +1 from me for your suggested approach and patch. I'd like to work this into a review-ready PR. The patch is a great step forward but still a ways from being ready for a PR: It is missing tests entirely and there are still several important methods of SafeTarFile which don't use the safety logic, e.g. next() and getmemebers(). Lars, would you be interested in continuing to work on this? |
Lars, I think the suggested approach is great. Documentation for the tarfile class should be changed in order to direct user to the "safe" version with an relevant warning. A bit like what is done for PRNG safety. |
I've tested Lars's patch against my collection of sly tarballs: SafeTarFile defeated most, but not all attacks. lrwxrwxrwx cur -> . lrwxrwxrwx cur -> . |
A. Regrading Jakub's tests, I suppose the changes needed are to B. Jakub, are your tests exhaustive or there maybe some corner cases missing? I think we don't have tests for hardlinks, yet. Also I think, we will need tests for CHRTYPE, FIFOTYPE and BLKTYPE of tar's typeflag[0] [0] http://www.gnu.org/software/tar/manual/html_node/Standard.html |
I suspect there may be more so which haven't been mentioned yet, one of which I have listed below.
For example:
But for:
|
Figured a fix for the bug I found, trick was to keep track of current working dir of symlink it was trying to evaluate. Attached patch: safetarfile-3.diff I'd like to see this go thorough, and would appreciate feedback. |
For one thing, the new diffs are still missing tests. Tests should include, at the least:
|
I can't use Jakub's repo (or Makefile from that repo) directly because it relies on tar, which doesn't look like dependency for building Python. I can make similar tarballs but I am not sure how licensing will work. I can add tarballs for the cases I discovered. As of now, I'll move ahead with my version of tarballs and will change them for licensing requirement as needed before merging it to source code. Would you prefer a PR over a patch? |
I am not a lawyer, but to the best of my understanding, using such tarballs would be fine. Since Jakub's repo only provides code to generate archive files but doesn't include actual archive files, and the generation code is licensed via the MIT license, we are free to use the code to generate archive files, which would then not fall under the copyright of the author of the generation code (Jakub). |
Added tests. Following works with 456 tests passed, after doing ./python -m unittest -v test.test_tarfileAttached patch is on top of master's commit: |
shashank, you're making good progress on this! The tests should also put SafeTarFile through all of the tests for TarFile, considering that it is being described as a drop-in replacement. You should look through the existing tests for other modules which do similar things for methods to implement this cleanly, e.g. Lib/tests/test_binascii.py and Lib/tests/datetimetester.py. |
It won't exactly be drop-in replacement. I mean if users decide to replace Tarfile with SafeTarFile, existing code may break since there might be cases where dodgy tarballs are acceptable and/or used then SafeTarFile.open will throw an exception. Having said that, I am refactoring the tests right now since the test file is ~3000 lines and adding SafeTarFile tests for every TarFile test is cluttering it. |
This must be done without adding much test code and with minimal changes to the existing tests. That's why I referenced some existing places in our code where similar things are done in tests, please take a look at those. |
There is some new research on the topic, see https://snyk.io/research/zip-slip-vulnerability, snyk/zip-slip-vulnerability#4 (comment) and BPO bpo-35909 |
Is there any update on this? Will this be fixed in the next release? Having a code execution vulnerability (yes it is!) in python for 5 years does not really spark confidence... |
There was progress made as described on this issue, but there is yet work to be done, and no-one seems to be taking this upon themselves at the moment. I agree that it would be great to have this in 3.8. |
I've added a PR in which I'm working on adding in the tests. Wanted to make sure this is the approach you had in mind? It wasn't as simple as how tests are handled in, say, test_binascii.py, because over there there was only one class that handled the main test suite, while in this file, there are multiple. |
SafeTarFile does not pass the existing tests, mainly because the existing file Lib/test/tarfiletestdata/testtar.tar seems to be "unsafe", producing errors like these: tarfile.SecurityError: <TarInfo 'ustar/blktype' at 0x7fb9119b3bb0>: block device tarfile.SecurityError: <TarInfo 'ustar/regtype' at 0x7fb9119b3910>: duplicate name It seems like the solution here is to remove block devices and duplicate names from testtar.tar. However, is this desirable -- do we need to keep these in for the tests for TarFile? |
What is the status of this issue? |
fwiw this is likely https://nvd.nist.gov/vuln/detail/CVE-2007-4559. |
There is now a bot (TrellixVulnTeam) which automatically opens pull requests [0,1] to fix this problem. Is anyone working on this right now? [0] https://github.com/brandsimon/verify-squash-root/pull/10 |
PEP-706 (Filter for tarfile.extractall) has been implemented in #102950. See the added docs. Python 3.12, and security updates to some earlier releases, will allow users to avoid CVE-2007-4559 by changing their code/settings. |
Yahoo! Thanks for your tenacity to go to the root of these issues, design an API for it and a smooth transition, and implementing it! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: