Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract: Proper modes with multiple copies #73

Closed
wants to merge 5 commits into from

Commits on Oct 12, 2023

  1. slicer/test: Support for custom packages

    Currently we only test with the embedded base-files files package. This
    quite limits what we can test. But since commit a3315e3
    ("testutil/pkgdata: Create deb programatically") we have the ability to
    define our own custom package content.
    
    Add support for testing with custom packages. These can be defined in
    each test case per archive name. The content can be defined directly in
    test case definition with the help of testutil.MustMakeDeb().
    
    The package content is wrapped in the testPackage structure. This
    introduces one level of nesting in each test case package definition. In
    future, we will add package metadata to the package definition, namely
    version. So wrapping the package content in the structure now and later
    just adding a new field to it the structure later will make the future
    diff much smaller.
    woky committed Oct 12, 2023
    Configuration menu
    Copy the full SHA
    0d5d78c View commit details
    Browse the repository at this point in the history
  2. archive: Add Info() method

    At some point we will need to get information about (deb) packages from
    (Go) packages other than archive, namely from the slicer package.
    For instance:
    - Package entries in Chisel DB will include version, digest and
      architecture.
    - Multi-archive package selection will need to know versions of a
      package in all configured archives.
    
    Add Info() method to the Archive interface that returns a new
    PackageInfo interface that contains accessors for the underlying
    control.Section of the package. Currently these accessors are Name(),
    Version(), Arch() and SHA256() methods.
    
    The reason to introduce PackageInfo interface instead of returing
    control.Section directly is keep the archive abstraction and not leak
    implementation details.
    woky committed Oct 12, 2023
    Configuration menu
    Copy the full SHA
    19c9af6 View commit details
    Browse the repository at this point in the history

Commits on Oct 20, 2023

  1. fsutil: Add SlashedPathDir() and SlashedPathClean() functions

    From the code:
    
      These functions exists because we work with slash terminated
      directory paths that come from deb package tarballs but standard
      library path functions treat slash terminated paths as unclean.
    
    We use ad-hoc code to make filepath.Dir() slash terminated in two places
    right now. For example this code
    
      targetDir := filepath.Dir(strings.TrimRight(targetPath, "/")) + "/"
    
    is not strictly correct as "/a/b/.///" will be turned into "/a/b/./"
    which is still "/a/b".
    
    Since we deal with slash terminated directory paths everywhere, create
    and use dedicated helper functions to process them.
    woky committed Oct 20, 2023
    Configuration menu
    Copy the full SHA
    7febfeb View commit details
    Browse the repository at this point in the history
  2. extract: Proper parent directory modes

    The purpose of parent directory handling in deb/extract.go is to create
    parent directories of requested paths with the same attributes (only
    mode as of now) that appear in the package tarball. However, the current
    implementation is not correct when glob paths are requested.
    
    In what follows parent directory refers to a directory path that is not
    explicitly requested for extraction, but that is the parent of other
    paths that are requested for extraction, and so it is assumed to be
    implicitly requested for extraction.
    
    Currently, whether a package path should be extracted is determined by
    the shouldExtract() function that iterates over requested paths and for
    each checks whether it matches the package path if it's glob, or if it's
    non-glob, whether it equals the package path or whether some of its
    target paths have the package path as the parent.
    
    There are two problems with this implementation:
    
      1) It only checks whether a package path is the parent of any target
         path of a requested non-glob path. It does not, and probably even
         cannot, check whether it is the parent of a requested glob path.
    
      2) It iterates over the map of requested paths for every package path,
         even though for requested non-glob paths, it can match by directory
         lookup. And in each iteration, it checks whether a requested path
         is a glob by searching for wildcards in it.
    
    This commit addresses mainly the first problem, but it touches the
    second one as well.
    
    Track modes of directories as we encounter them in the tarball. Then,
    when creating a target path, create its missing parent directories with
    modes with which they were recorded in the tarball, or 0755 if they were
    not recorded yet. In the latter case, the directory mode is also tracked
    so that the directory can be recreated with the proper mode if we find
    it in the tarball later. This algorithm works for both glob and non-glob
    requested paths.
    
    Since the matching that was previously done in the shouldExtract()
    function is not used anymore, the function was removed. As part of this
    change, the requested glob paths are recorded before the extraction
    begins into the dedicated list which is scanned only when requested
    non-glob path lookup fails.
    
    We still match requested non-glob and glob paths separately. Ideally, we
    would use some kind of pattern matching on trees, something like a radix
    tree which also supports wildcards, but this commit is humble.
    
    One consequence of this change is that when an optional path that
    doesn't exist in the tarball is requested, its parent directories are
    not created (if they are not the parents of other requested paths). But
    this new behavior is arguably more natural than the old behavior where
    we created parent directories for non-existent paths, which seems to
    have been just an artifact of the implementation. Therefore, one test
    had to be changed for this behavior change.
    
    Since we do not allow optional paths to be defined in slices, this
    change is visible only to callers of the deb.Extract() function. In
    chisel, these callers are extract test and slicer. The latter depended
    on the old behavior to create parent directories for non-extracted
    content paths by including their direct parent directories in the
    requested paths. The behavior of chisel is preserved by changing slicer
    to include all, and not just direct, parent directories of non-extracted
    content paths in the requested paths.
    woky committed Oct 20, 2023
    Configuration menu
    Copy the full SHA
    7569606 View commit details
    Browse the repository at this point in the history
  3. extract: Proper modes with multiple copies

    We use the source tar entry header to compute fs.FileMode. When the
    target path has a different mode, we modify the source header first.
    Consequently, when a source path is mapped to multiple target paths, and
    when one target path overrides the mode and the following one doesn't,
    the following one will have the first one's mode. Fix it by remembering
    the source header mode.
    woky committed Oct 20, 2023
    Configuration menu
    Copy the full SHA
    1424faa View commit details
    Browse the repository at this point in the history