Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make pkg_tar's "./" mandatory prefix optional #50

Closed
ghost opened this issue Jan 29, 2019 · 19 comments · Fixed by #570
Closed

make pkg_tar's "./" mandatory prefix optional #50

ghost opened this issue Jan 29, 2019 · 19 comments · Fixed by #570
Assignees
Labels
breaking change bug P1 An issue that must be resolved. Must have an assignee
Milestone

Comments

@ghost
Copy link

ghost commented Jan 29, 2019

Description of the problem / feature request:

Files embedded in a tar created by pkg_tar include a mandatory "./" prefix. I would like to make this prefix optional.

Feature requests: what underlying problem are you trying to solve with this feature?

This is an interoperability/migration problem. My request incurs tech debt to finance Bazel adoption.

I'm trying to migrate a constellation of builds from Maven to Bazel. When I port project X to Bazel, I still need to support some number of preexisting downstream builds. These builds extract subsets of files from tars using whitelists which are not anchored with a leading "./". When fed a tar generated by Bazel's pkg_tar, a whitelist of "foo" doesn't match "./foo", and so the untar action is a no-op.

Rather than have to discover/fix every trivial difference downstream, I would rather start by producing Bazel bits which are drop-in replacements for Maven bits. Our internal teams can then coordinate among themselves to pay down the debt and remove use of any interop flags.

For the record, a change to remove the "./" prefix was committed back in 2017 but soon rolled back. I think an option to remove the prefix would help folks like myself w/o risk of breaking existing Bazel users.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

n/a

What operating system are you running Bazel on?

Linux - CentOS 6.6 and 7.2
macOS - High Sierra (I think)

What's the output of bazel info release?

"development version"

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

We track Bazel's Git repo and apply a small number of patches atop official releases. We're currently building against 0.21.0.

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

n/a

Have you found anything relevant by searching the web?

Yes -- https://github.com/bazelbuild/bazel/commits/master/tools/build_defs/pkg/archive.py:

I chatted w/ @ixdy briefly in Slack about this on Fri Jan 25. (Thanks!)

Any other information, logs, or outputs that you want to share?

@ghost ghost changed the title pkg_tar's "./" prefix hinders Bazel migration make pkg_tar's "./" mandatory prefix optional Jan 29, 2019
@ghost
Copy link
Author

ghost commented Jan 29, 2019

Currently wondering if it'd be enough to expose build_tar's --root_directory parameter to the pkg_tar rule, and then users can pass root_directory = "" to pkg_tar.

@ixdy
Copy link

ixdy commented Jan 29, 2019

cc @damienmg @mattmoor who helped when I attempted to submit this the first time around

I never entirely understood what broke, but I think it was something with the docker rules (but wasn't picked up in any unit tests, only breakages of downstream users). There's better test coverage on the docker rules now, so it might also be interesting to see if this still causes breakages.

@ghost
Copy link
Author

ghost commented Jan 29, 2019

I think switching the default will end up breaking someone downstream, so if maintainers agree that this is worth tackling, I imagine it can only be done by adding an option to pkg_tar and leaving it up to users to decide on a per-tar basis whether to use the prefix or not.

I ran a few experiments and added test cases to build_test.sh, and so far I'm convinced that it'd be enough to let pkg_tar make use of build_tar's --root_directory parameter. If folks agree, then perhaps it comes down to how best to adjust the pkg_tar API:

# Easiest sol'n exposes build_tar --root_directory to pkg_tar users.
# Documentation will be kind of confusing with regard to contrast to package_dir.
pkg_tar(
    name = "my_tar",
    root_directory = "",
)

# Targeted sol'n better communicates intent (esp. w/r/t package_dir) but naming is
# perhaps clumsier.
pkg_tar(
    name = "my_tar",
    prepend_dotslash = False,  # or maybe interop_prepend_dotslash?
)

@dslomov
Copy link
Contributor

dslomov commented Jan 31, 2019

also cc @aiuto

@ixdy
Copy link

ixdy commented Jan 31, 2019

I just looked into what --root_directory does. I think you are right @beasleyr-vmw - we should just expose that to the pkg_tar rule, though I agree that also package_dir makes that clumsy.

@ghost
Copy link
Author

ghost commented Jan 31, 2019

I just looked into what --root_directory does. I think you are right @beasleyr-vmw - we should just expose that to the pkg_tar rule, though I agree that also package_dir makes that clumsy.

Cool. I settled on prepend_dotslash to make progress internally (bypassing any need to reconcile with package_dir), but I'm happy to go in either (or even another) direction and to produce a pull request (inc. docs and tests) to solve this upstream. Will wait for guidance from maintainers for now.

@ghost
Copy link
Author

ghost commented Jun 27, 2019

Should we move this to the bazelbuild/rules_pkg repository?

https://help.github.com/en/articles/transferring-an-issue-to-another-repository

@aiuto aiuto transferred this issue from bazelbuild/bazel Jul 10, 2019
@kylecordes
Copy link

Now that 1.0 is here certainly anyone can agree that this should be implemented in a nonbreaking way.

But it might also be a candidate to change the default for a future major version. Thinking of tar files across numerous contexts for many years, my impression is that mandatory prefix with “.” is relatively unusual. Not prefixing (by default) would match the expectations of more adopters.

@ixdy
Copy link

ixdy commented Oct 23, 2019

@kylecordes rules_pkg is decoupled from bazel, so it probably isn't strictly necessary to follow the same deprecation strategy.

@aiuto aiuto added the P2 An issue that should be worked on when time is available label May 19, 2020
@gebn
Copy link

gebn commented Jun 24, 2020

I'm not sure if it's a related issue, but forcing ./ seems to result in trying to modify attributes of the present working directory, which isn't always possible. For example, if I try to extract to /tmp, I get:

$ tar tvf archive.tar.gz 
drwxr-xr-x 65534/65534       0 2000-01-01 00:00 ./
drwxr-xr-x 65534/65534       0 2000-01-01 00:00 ./bmc_exporter/
-r-xr-xr-x 65534/65534 18340890 2000-01-01 00:00 ./bmc_exporter/bmc_exporter
$ tar xf archive.tar.gz                                                                                                                                     
tar: .: Cannot utime: Operation not permitted                                                                                                                            
tar: .: Cannot change mode to rwxr-xr-t: Operation not permitted                                                                                                         
tar: Exiting with failure status due to previous errors

If archives did not have to contain ./, presumably this would be avoided.

@lalten
Copy link

lalten commented Sep 30, 2020

Any progress on this? would be great to have a prepend_dotslash attr.

@aiuto
Copy link
Collaborator

aiuto commented Sep 30, 2020

I have not had a chance to work on it.

I agree with @kylecordes that the initial "./" is strange, and also with @beasleyr-vmw that if we change the default that will break some people, and his analysis about using the package_root.

How does everyone feel about this new behavior.

  • "./" will not be in the path by default.
  • pkg_tar and pkg_zip will both add a package_base option which
    • prepends package_base to all paths in the archive (adding / if needed)
    • adds package_base to the archive
      • this parallels behavior of command line tools when you say 'zip -r out.zip foo'
      • some day we might follow that with the ability to specify owner and perms
    • the name package_base is subject to debate because is should be the same word as used in (the upcoming) pkg_filegroup to rebase a set of files.

If you are broken by the change in default, you can fix that by adding a package_base. Yes, this is a breaking change but we are still at 0.x, and we know the next is 0.3.0 rather than 0.2.6.

Would this be acceptable?
Does someone want to prepare a PR?

@lalten
Copy link

lalten commented Sep 30, 2020

Sounds good!

For the record, here's my workaround using a genrule. Using pkg_tar would definitely be more elegant!

genrule(
    name = "packaged_target",
    srcs = [":mytarget"],
    outs = ["archive.tar.gz"],
    cmd = " && ".join([
        "pushd $(location :mytarget)",
        "export GZIP=--no-name",  # do not save time stamp in order to stabilize archive hashes
        "tar cfah tmp.tar.gz *",
        "popd",
        "mv $(location :mytarget)/tmp.tar.gz $@",
    ]),
)

dmayle pushed a commit to dmayle/rules_pkg that referenced this issue Dec 7, 2020
dmayle pushed a commit to dmayle/rules_pkg that referenced this issue Dec 7, 2020
dmayle pushed a commit to dmayle/rules_pkg that referenced this issue Dec 7, 2020
dmayle pushed a commit to dmayle/rules_pkg that referenced this issue Dec 7, 2020
dmayle pushed a commit to dmayle/rules_pkg that referenced this issue Dec 7, 2020
aiuto pushed a commit that referenced this issue Dec 7, 2020
@aiuto aiuto closed this as completed Dec 7, 2020
@mkaufmann
Copy link

mkaufmann commented Mar 3, 2021

I was happy to see the fix for this released today but it doesn't seem to have the right semantics.

I need to package a file so that is shows the path as

Charts.yaml

otherwise the tar won't worm with the Helm tool.

I thought I would get this behavior when using

package_base = ""

but that results in a tar with a "/" prefix

/Charts.yaml

Given that this is released now I sadly don't know how to fix this in a backwards compatible way

@aiuto
Copy link
Collaborator

aiuto commented Mar 3, 2021

Yup. This is not fixed right. The behavior for this

pkg_tar(
    name = "test_no_package_base",
    srcs = [
        "index.html",
        "images/foo.jpg",
    ],
)

should be

$ tar tvf ../bazel-bin/test_no_package_base.tar
-r-xr-xr-x  0 0      0           7 Dec 31  1999 index.html
-r-xr-xr-x  0 0      0           6 Dec 31  1999 images/foo.jpg

but is

tar tvf ../bazel-bin/test_no_package_base.tar
drwxr-xr-x  0 0      0           0 Dec 31  1999 ./
-r-xr-xr-x  0 0      0           7 Dec 31  1999 ./index.html
-r-xr-xr-x  0 0      0           6 Dec 31  1999 ./foo.jpg

So there are 2 things wrong. The images folder is being flattened away in addition to the wrongly added '/'

@aiuto aiuto reopened this Mar 3, 2021
@aiuto aiuto removed the help wanted label Mar 3, 2021
@aiuto aiuto added P1 An issue that must be resolved. Must have an assignee breaking change bug and removed P2 An issue that should be worked on when time is available labels Mar 3, 2021
@nacl
Copy link
Collaborator

nacl commented Dec 8, 2021

So, how does this fit in with the pkg_filegroup framework? To me, it seems like we should go with the breaking behavior: ./ prefix should need to be explicitly provided to pkg_files and/or pkg_filegroup for it to appear, with the understanding that some packaging rules may ignore it.

Following from this, does this mean we should change the behavior of package_base? Also another breaking change I'd be happy to look at when I get some time at the end of the month.

@nacl
Copy link
Collaborator

nacl commented Dec 17, 2021

Hacked on this a little bit. Some observations:

  • /-anchored tar files and archive.py do not get along well. Fortunately, they don't seem to be particularly well-supported, or needed. I'll see if my use case requires them, but we might just be able to get away with not supporting it at all.
  • In fact, the whole tar archive writer in archive.py probably should be rewritten. A particularly nice solution would be to have something that schedules the creation of all paths prior to actually assembling the tarfile. This would also have the nice property of being extensible to things beyond just tarfiles. It might require, among other things , Package JSON manifest should support generic attributes #385.
  • The package_base problem was relatively easy to fix and test. I can have a PR made for this soonish.

The question still remains: do we do with pkg_filegroups and this? This is a path manipulation, so, strictly speaking, paths should be defined in the incoming pkg_filegroups. This then would require users to do something like:

pkg_filegroup(
    name = "reroot_tar_inputs",
    srcs = [":all_the_srcs"],
    prefix = "./",    
)

Or prefix everything with ./ themselves in pkg_files and friends. I don't see a big problem requiring this -- it keeps the path interface clean.

Another option would be to put this rooting stuff back in pkg_tar even with pkg_filegroups. It would be convenient at the price of cluttering the interface a bit. I still vote for keeping everything in pkg_files and friends due to the minimal boilerplate cost. Users can easily macro their way around it if needed.

Remaining questions:

  • Do we want to support rooted tarfiles, or any rooted package?
  • What do we do about the pkg_filegroup problem, above?

@aiuto aiuto self-assigned this Feb 7, 2022
@aiuto aiuto added this to the 1.0 milestone Feb 12, 2022
@aiuto
Copy link
Collaborator

aiuto commented Feb 13, 2022

I'm working on this now.

The least surprising solution that I am going for is

  • "./" is not prepended to paths.
  • pkg_tar can add a directory prefix, which will get prepended.

This way, you can have pkg_* rules do whatever they want to make a path, or you can use legacy constructions and add a path at file writing time.

There is some layered stupidity in archive.py, but I am note ready to remove it yet. I think the debian packaging needs it. My patfh forward is to remove the mistake of insisting on a leading './' in archive in one PR, then remove the identical mistake at the pkg_tar level.

aiuto added a commit to aiuto/rules_pkg that referenced this issue Feb 13, 2022
- clean up tests appropriately
- remove achive.add_dir because we do not use it

This is a precursor change to removing the unwanted leading ./ from tar writers.
See bazelbuild#50, bazelbuild#531
aiuto added a commit that referenced this issue Feb 14, 2022
* Stop archive writer from always adding './' as a root path.

- clean up tests appropriately
- remove achive.add_dir because we do not use it

This is a precursor change to removing the unwanted leading ./ from tar writers.
See #50, #531
@aiuto
Copy link
Collaborator

aiuto commented Mar 3, 2022

May be fixed by #554, except I'm not putting much work into the ability to set the package prefix to "./".
I have yet to see a good reason why that is ever a desired behavior - other than backwards compatibility to broken behavior.

aiuto added a commit to aiuto/rules_pkg that referenced this issue Apr 5, 2022
We still never create a lone '.' as a directory in the archive. That is
intentional

There are a few parts to this:

- Remove the root_directory logic from tar_writer. It was a needless feature
  inherited from an ancient implementation. It was not used by anything except
  the tests.
- Stop re-normalizing paths in tar_writer. It should do what it is asked.  The
  implication is that all the test which accounted for test data which which
  contained tar files with paths like './a', will now keep the './'.
- Add a test to show prefix_dir == './' does work.

Fixes bazelbuild#50
aiuto added a commit to aiuto/rules_pkg that referenced this issue Apr 5, 2022
We still never create a lone '.' as a directory in the archive. That is
intentional

There are a few parts to this:

- Remove the root_directory logic from tar_writer. It was a needless feature
  inherited from an ancient implementation. It was not used by anything except
  the tests.
- Stop re-normalizing paths in tar_writer. It should do what it is asked.  The
  implication is that all the test which accounted for test data which which
  contained tar files with paths like './a', will now keep the './'.
- Add a test to show prefix_dir == './' does work.

RELNOTES: pkg_tar no longer prefixes paths with './'.

Fixes bazelbuild#50
@aiuto aiuto closed this as completed in #570 Apr 6, 2022
aiuto added a commit that referenced this issue Apr 6, 2022
We still never create a lone '.' as a directory in the archive. That is
intentional

There are a few parts to this:

- Remove the root_directory logic from tar_writer. It was a needless feature
  inherited from an ancient implementation. It was not used by anything except
  the tests.
- Stop re-normalizing paths in tar_writer. It should do what it is asked.  The
  implication is that all the test which accounted for test data which which
  contained tar files with paths like './a', will now keep the './'.
- Add a test to show prefix_dir == './' does work.

RELNOTES: pkg_tar no longer prefixes paths with './'. You can use `package_dir` with an explicit "./" to get the old behavior.

Fixes #50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change bug P1 An issue that must be resolved. Must have an assignee
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants