Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usage of tarfile.extractall() function without validating tarfile members #226

Closed
yilmi-vmw opened this issue Apr 12, 2019 · 21 comments · Fixed by #442
Closed

Usage of tarfile.extractall() function without validating tarfile members #226

yilmi-vmw opened this issue Apr 12, 2019 · 21 comments · Fixed by #442
Assignees
Labels
bug Something went wrong technical-debt Technical Debt - we should have addressed this right away but for "reasons" we deferred
Milestone

Comments

@yilmi-vmw
Copy link

Describe the bug

The usage of the tarfile.extractall() function without validating members could be used to perform archive attacks.

This happens in two places in tern:

  • tern/utils/container.py under the extract_image_metadata() function
  • tern/utils/rootfs.py under the extract_layer_tar() function

Expected behavior

Two things can be done to protect against this attack:

  1. Check the tarfile entries for directory traversal
    (see warning under https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractall)
  2. Disable symlink creation

We should validate that the tarfile members names listed in its header don't contain invalid characters (../) and are not symlinks.

You can find a good description of how to protect against this here https://stackoverflow.com/a/10077309 and also an example of how to protect against these type of attacks here vmware-archive/pyvcloud#268

@nishakm nishakm added bug Something went wrong help wanted technical-debt Technical Debt - we should have addressed this right away but for "reasons" we deferred labels Apr 12, 2019
@nishakm nishakm modified the milestones: Near Future, Release 0.4.0 Apr 12, 2019
@nishakm
Copy link
Contributor

nishakm commented Apr 12, 2019

So this is interesting as the tarballs that Tern works on come from container images built by tools like Docker. It would be really cool to detect such issues with container images. The flip side to this is if most container images folks work with are subject to this attack, Tern can only sternly warn and continue until the problem is fixed in the container images.

A note: It seems that this issue might be fixed in Python 3.8 with the addition of a class SafeTarFile that satisfies most, if not all, attacks from a directory walk: https://bugs.python.org/issue21109. I doubt it though, as there is only 1 month left till feature freeze.

@nishakm
Copy link
Contributor

nishakm commented Apr 12, 2019

On further review of the conversations in the Python issue, I think they're not going to merge their fix any time soon, mostly because the security issues with the Python library inherit from the underlying security issues with tar. The suggested method of checking for absolute path, relative path and symlinks are not enough. Furthermore, I think there is a good subset of container images for which these checks will break untarring. The general suggestion is to be careful of what you untar.

However, tricking someone into downloading a naughty image from Dockerhub is definitely a thing, so I don't think we should ignore it completely.

There's a patch which takes care of most of the issues: https://bugs.python.org/file47826/safetarfile-4.diff

@nishakm nishakm modified the milestones: Release 0.4.0, Near Future Apr 13, 2019
@rparikh
Copy link
Contributor

rparikh commented Apr 13, 2019

@nishakm I would like to work on this if it is not assigned yet. Please let me know.

Thank you, Ravi.

@yilmi-vmw
Copy link
Author

yilmi-vmw commented Apr 15, 2019

@nishakm, thanks for looking at the best approach to solve it.

I don't know all the current usages of tern, only that if there's a way of reaching one of the highlighted functions, an attack is possible. As it is difficult to define all use cases of any project, I would suggest that we consider the worst case scenario where tern receives untrusted tars on a protected system.

Also, the fact that tern runs in a container shouldn't be considered sufficient protection for a system, there are plenty of things that can go wrong such as insecure container runtime configuration.

The fix from the pyvcloud team was just taken as an example, as they are dealing with OVA's they can just discard symlinks. In tern's case this might indeed require more thought.

However, there is one case for which we can already protect from which is the arbitrary file overwrite using directory traversal sequences in the member filename, I don't see any valid use case for it.

Thanks
Yass

@nishakm
Copy link
Contributor

nishakm commented Apr 15, 2019

@nishakm, thanks for looking at the best approach to solve it.

I don't know all the current usages of tern, only that if there's a way of reaching one of the highlighted functions, an attack is possible. As it is difficult to define all use cases of any project, I would suggest that we consider the worst case scenario where tern receives untrusted tars on a protected system.

Agree 100%. There are three ways I see we can resolve this:

  1. Wait for the patch to the underlying Python stdlib package to be merged
  2. Keep the patch along with the rest of the project and apply it when setting up a virtual environment <- This is very cumbersome to do
  3. Create a patch that is basically a copy of the above patch and revert it when the above patch gets merged. This one has some legal implications that I will need to follow up on before proceeding (i.e. How do I give credit)

Also, the fact that tern runs in a container shouldn't be considered sufficient protection for a system, there are plenty of things that can go wrong such as insecure container runtime configuration.

I would think that the warning on the README is sufficient indication that running Tern in a container is not secure :)

The fix from the pyvcloud team was just taken as an example, as they are dealing with OVA's they can just discard symlinks. In tern's case this might indeed require more thought.
Fair and yes.

However, there is one case for which we can already protect from which is the arbitrary file overwrite using directory traversal sequences in the member filename, I don't see any valid use case for it.

I would rather all the protections we know about at this point (stdlib patch above) arrive in one REVERTME patch, hopefully with something in place to track REVERTMEs.

Thanks
Yass

@nishakm
Copy link
Contributor

nishakm commented Apr 15, 2019

@nishakm I would like to work on this if it is not assigned yet. Please let me know.

Thank you, Ravi.

@rparikh Please hold off until I figure out the best approach for the fix. Thanks for volunteering and stay tuned.

@nishakm
Copy link
Contributor

nishakm commented Apr 15, 2019

So here's how we can proceed:

  1. Create a new file in tern/utils called tarfile. Use this file for modules that check a tarball for possible patterns for attack. Use https://bugs.python.org/file47826/safetarfile-4.diff as a reference, and make sure to provide adequate credit (something to the effect of: # from <link> by <author>). I don't think we need to subclass Tarfile but we can create exceptions in this file (like the SecurityException) and raise them when an error occurs.
  2. Test it against the test tarballs in the link: https://github.com/jwilk/traversal-archives
  3. Update tern/utils/container.py and tern/utils/rootfs.py to use the new extract function from this tarfile.
  4. Test against container images from Dockerhub. I would pick ones that are large in size and have many layers. Eg: postgres, nginx, node, golang
  5. Submit PR

@nishakm
Copy link
Contributor

nishakm commented Apr 15, 2019

@rparikh Do you want to take this on?

@rparikh
Copy link
Contributor

rparikh commented Apr 16, 2019

@nishakm yes. I would love to work on this. Thank you, Ravi.

@nishakm
Copy link
Contributor

nishakm commented Apr 16, 2019

@nishakm yes. I would love to work on this. Thank you, Ravi.

Awesome!! Note that this is quite a big task, so a series of small commits in the PR is fine. It would be great if you could also write a test for all the modules in this file in tests called test_utils_tarfile.py. Really appreciate your help!

@rparikh
Copy link
Contributor

rparikh commented Apr 17, 2019

@nishakm Just wanted to update that I have started working on this. I will be submitting a review soon. As you mentioned, full things will be committed in small batches. Thank you so much, Ravi.

@rparikh
Copy link
Contributor

rparikh commented Apr 17, 2019

So here's how we can proceed:

I don't think we need to subclass Tarfile but we can create exceptions in this file (like the SecurityException) and raise them when an error occurs.

@nishakm So we will create a class and have methods to examine the content? And we can use tarfile object as an argument and just iterate over each member and decide whether its a risk or not? I am just thinking if we need to have a class if we do not subclass it to Tarfile. We can have an independent method to examine the content.

Also are we just looking for relative name/symlink issue? or also looking for max number of files and size point of view as well?

Thank you, Ravi.

@nishakm
Copy link
Contributor

nishakm commented Apr 17, 2019

So here's how we can proceed:
I don't think we need to subclass Tarfile but we can create exceptions in this file (like the SecurityException) and raise them when an error occurs.

@nishakm So we will create a class and have methods to examine the content? And we can use tarfile object as an argument and just iterate over each member and decide whether its a risk or not? I am just thinking if we need to have a class if we do not subclass it to Tarfile. We can have an independent method to examine the content.

As mentioned, we don't need to create a class that is a subclass of Tarfile. Individual defs for each of the checks is fine. The exceptions help as you can raise them when you encounter an issue. Yes, you can have one def for iterating over the members of the tarfile and then do a check for each one like the function _check_member in the patch I linked to. You can use all of these in one module called check_tarfile or something that raises the appropriate exception.

Also are we just looking for relative name/symlink issue? or also looking for max number of files and size point of view as well?

We can set different limits than what is given. 100000 files and 1GiB size limits seem reasonable to me. However, I have seen container images that go over both. Here we can probably fall back on a stern log warning and continue on. For the paths, we can exit gracefully.

Thank you, Ravi.

@nishakm nishakm added the assigned This issue has been assigned to someone who is actively working on it label Apr 18, 2019
@nishakm nishakm modified the milestones: Near Future, Release 0.6.0 Aug 12, 2019
@nishakm
Copy link
Contributor

nishakm commented Aug 12, 2019

@rparikh Are you still working on this? We need this solved urgently as it does pose a security issue. You can submit a PR with "WIP" in the title and I can take over from where you left off if you are not finding the time to finish. Would this be OK with you?

@nishakm nishakm removed the assigned This issue has been assigned to someone who is actively working on it label Aug 22, 2019
@nishakm nishakm self-assigned this Aug 22, 2019
@rparikh
Copy link
Contributor

rparikh commented Aug 23, 2019

@nishakm I am really sorry for the late reply on this thread. I have worked on this but could not complete it. Will it be okay if I start the review over this weekend? Though I could not complete it, I would still like to contribute. Please let me know.

@nishakm
Copy link
Contributor

nishakm commented Aug 23, 2019

@rparikh sure! Like I said before, if you could submit a PR with WIP: in the title I can take over and complete it for you. Your DCO will be included in the commit.

@nishakm
Copy link
Contributor

nishakm commented Aug 30, 2019

@rparikh I still haven't seen a PR from you. I'm going to continue to work on this. You can work on something else when you get the time. Thanks for trying!

@nishakm
Copy link
Contributor

nishakm commented Aug 30, 2019

#435 will also resolve this

  1. We use the system tar utility as CVE-2013-4420 has been fixed in libtar since 2014 and tern requires a compatible distro dated circa 2015 anyway.
  2. Sometimes the container image filesystems cannot be untarred as a regular user as the files may belong to root. So we need to invoke tar as sudo.

@nishakm
Copy link
Contributor

nishakm commented Aug 30, 2019

  1. We use the system tar utility as CVE-2013-4420 has been fixed in libtar since 2014 and tern requires a compatible distro dated circa 2016 anyway.
    @yilmi-vmw do you think this is reasonable?

@yilmi-vmw
Copy link
Author

yilmi-vmw commented Sep 2, 2019

  1. We use the system tar utility as CVE-2013-4420 has been fixed in libtar since 2014 and tern requires a compatible distro dated circa 2016 anyway.
    @yilmi-vmw do you think this is reasonable?

@nishakm I have absolutely no problem with this solution as it will indeed mitigate the issue by processing safely the archive.

On your end you will lose the stacktrace info after the tar archive call and will probably have to handle the executable return error codes on a per case basis.

@nishakm
Copy link
Contributor

nishakm commented Sep 3, 2019

Depends on #435

nishakm pushed a commit to nishakm/tern that referenced this issue Sep 5, 2019
This resolves tern-tools#226

This change does the following:
1. Replace python's tarfile with a call to the system's tar
   utility. We do this to take advantage of the CVE-2013-4420 fix
   to libtar. Python's tarfile module has a fix in the workings
   but is yet to be merged as of this change.
2. We check for EOF errors and empty tarballs. This is mostly to
   address a few instances where we have seen Docker images that
   were malformed.
3. We modify some functions around loading and analyzing Docker
   images and layers including catching the extra errors that we
   raise for No. 2.

- rootfs: Moved some functionality from check_tar_permissions into
  a new function called shell_command. This function simply runs
  shell commands as the current user and returns the result and
  error to be dealt by the calling function.
- rootfs: check_tar_permissions will now use shell_command.
- rootfs: Created a new function called check_tar_members which
  will list the elements in the tarball to see if there are any
  EOF or empty tarballs.
- rootfs: Repurposed extract_layer_tar to be a general purpose
  extract_tarfile function which can be used throughout the code.
- container: Use extract_tarfile to extract image metadata.
- image_layer: Use extract_tarfile to extract image layer tarballs.
- analyze: Set up mount points after the image is loaded.
- report: In general setup, don't create directories. extract_tarfile
  will now do it.
- report: Catch all the appropriate errors that might get thrown when
  trying to load an image.

Signed-off-by: Nisha K <[email protected]>
rnjudge pushed a commit that referenced this issue Sep 6, 2019
This resolves #226

This change does the following:
1. Replace python's tarfile with a call to the system's tar
   utility. We do this to take advantage of the CVE-2013-4420 fix
   to libtar. Python's tarfile module has a fix in the workings
   but is yet to be merged as of this change.
2. We check for EOF errors and empty tarballs. This is mostly to
   address a few instances where we have seen Docker images that
   were malformed.
3. We modify some functions around loading and analyzing Docker
   images and layers including catching the extra errors that we
   raise for No. 2.

- rootfs: Moved some functionality from check_tar_permissions into
  a new function called shell_command. This function simply runs
  shell commands as the current user and returns the result and
  error to be dealt by the calling function.
- rootfs: check_tar_permissions will now use shell_command.
- rootfs: Created a new function called check_tar_members which
  will list the elements in the tarball to see if there are any
  EOF or empty tarballs.
- rootfs: Repurposed extract_layer_tar to be a general purpose
  extract_tarfile function which can be used throughout the code.
- container: Use extract_tarfile to extract image metadata.
- image_layer: Use extract_tarfile to extract image layer tarballs.
- analyze: Set up mount points after the image is loaded.
- report: In general setup, don't create directories. extract_tarfile
  will now do it.
- report: Catch all the appropriate errors that might get thrown when
  trying to load an image.

Signed-off-by: Nisha K <[email protected]>
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020
This resolves tern-tools#226

This change does the following:
1. Replace python's tarfile with a call to the system's tar
   utility. We do this to take advantage of the CVE-2013-4420 fix
   to libtar. Python's tarfile module has a fix in the workings
   but is yet to be merged as of this change.
2. We check for EOF errors and empty tarballs. This is mostly to
   address a few instances where we have seen Docker images that
   were malformed.
3. We modify some functions around loading and analyzing Docker
   images and layers including catching the extra errors that we
   raise for No. 2.

- rootfs: Moved some functionality from check_tar_permissions into
  a new function called shell_command. This function simply runs
  shell commands as the current user and returns the result and
  error to be dealt by the calling function.
- rootfs: check_tar_permissions will now use shell_command.
- rootfs: Created a new function called check_tar_members which
  will list the elements in the tarball to see if there are any
  EOF or empty tarballs.
- rootfs: Repurposed extract_layer_tar to be a general purpose
  extract_tarfile function which can be used throughout the code.
- container: Use extract_tarfile to extract image metadata.
- image_layer: Use extract_tarfile to extract image layer tarballs.
- analyze: Set up mount points after the image is loaded.
- report: In general setup, don't create directories. extract_tarfile
  will now do it.
- report: Catch all the appropriate errors that might get thrown when
  trying to load an image.

Signed-off-by: Nisha K <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something went wrong technical-debt Technical Debt - we should have addressed this right away but for "reasons" we deferred
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants