Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] master from torvalds:master #48

Merged
merged 51 commits into from
Jun 9, 2020
Merged

[pull] master from torvalds:master #48

merged 51 commits into from
Jun 9, 2020

Commits on Apr 20, 2020

  1. virtiofs: schedule blocking async replies in separate worker

    In virtiofs (unlike in regular fuse) processing of async replies is
    serialized.  This can result in a deadlock in rare corner cases when
    there's a circular dependency between the completion of two or more async
    replies.
    
    Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR ==
    SCRATCH_MNT (which is a misconfiguration):
    
     - Process A is waiting for page lock in worker thread context and blocked
       (virtio_fs_requests_done_work()).
     - Process B is holding page lock and waiting for pending writes to
       finish (fuse_wait_on_page_writeback()).
     - Write requests are waiting in virtqueue and can't complete because
       worker thread is blocked on page lock (process A).
    
    Fix this by creating a unique work_struct for each async reply that can
    block (O_DIRECT read).
    
    Fixes: a62a8ef ("virtio-fs: add virtiofs filesystem")
    Signed-off-by: Vivek Goyal <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    rhvgoyal authored and Miklos Szeredi committed Apr 20, 2020
    Configuration menu
    Copy the full SHA
    bb737bb View commit details
    Browse the repository at this point in the history
  2. virtiofs: Add mount option and atime behavior to the doc

    Add a section to show the mount option and a subsection to show
    the atime behavior.
    
    Signed-off-by: Masayoshi Mizuma <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    mmizuma authored and Miklos Szeredi committed Apr 20, 2020
    Configuration menu
    Copy the full SHA
    a5d8422 View commit details
    Browse the repository at this point in the history
  3. fuse: BUG_ON correction in fuse_dev_splice_write()

    commit 9635453 ("fuse: reduce allocation size for splice_write")
    changed size of bufs array, so BUG_ON which checks the index of the array
    shold also be fixed.
    
    [SzM: turn BUG_ON into WARN_ON]
    
    Fixes: 9635453 ("fuse: reduce allocation size for splice_write")
    Signed-off-by: Vasily Averin <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    vaverin authored and Miklos Szeredi committed Apr 20, 2020
    Configuration menu
    Copy the full SHA
    0e9fb6f View commit details
    Browse the repository at this point in the history
  4. fuse: Update stale comment in queue_interrupt()

    Fixes: 04ec5af "fuse: export fuse_end_request()"
    Signed-off-by: Kirill Tkhai <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    Kirill Tkhai authored and Miklos Szeredi committed Apr 20, 2020
    Configuration menu
    Copy the full SHA
    75d8925 View commit details
    Browse the repository at this point in the history

Commits on May 13, 2020

  1. ovl: resolve more conflicting mount options

    Similar to the way that a conflict between metacopy=on,redirect_dir=off is
    resolved, also resolve conflicts between nfs_export=on,index=off and
    nfs_export=on,metacopy=on.
    
    An explicit mount option wins over a default config value.  Both explicit
    mount options result in an error.
    
    Without this change the xfstests group overlay/exportfs are skipped if
    metacopy is enabled by default.
    
    Reported-by: Chengguang Xu <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    amir73il authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    b0def88 View commit details
    Browse the repository at this point in the history
  2. ovl: cleanup non-empty directories in ovl_indexdir_cleanup()

    Teach ovl_indexdir_cleanup() to remove temp directories containing
    whiteouts to prepare for using index dir instead of work dir for removing
    merge directories.
    
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    amir73il authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    3011645 View commit details
    Browse the repository at this point in the history
  3. ovl: prepare to copy up without workdir

    With index=on, we copy up lower hardlinks to work dir and move them into
    index dir. Fix locking to allow work dir and index dir to be the same
    directory.
    
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    amir73il authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    773cb4c View commit details
    Browse the repository at this point in the history
  4. ovl: index dir act as work dir

    With index=on, let index dir act as the work dir for copy up and cleanups.
    This will help implementing whiteout inode sharing.
    
    We still create the "work" dir on mount regardless of index=on and it is
    used to test the features supported by upper fs.  One reason is that before
    the feature tests, we do not know if index could be enabled or not.
    
    The reason we do not use "index" directory also as workdir with index=off
    is because the existence of the "index" directory acts as a simple
    persistent signal that index was enabled on this filesystem and tools may
    want to use that signal.
    
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    amir73il authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    62a8a85 View commit details
    Browse the repository at this point in the history
  5. ovl: skip overlayfs superblocks at global sync

    Stacked filesystems like overlayfs has no own writeback, but they have to
    forward syncfs() requests to backend for keeping data integrity.
    
    During global sync() each overlayfs instance calls method ->sync_fs() for
    backend although it itself is in global list of superblocks too.  As a
    result one syscall sync() could write one superblock several times and send
    multiple disk barriers.
    
    This patch adds flag SB_I_SKIP_SYNC into sb->sb_iflags to avoid that.
    
    Reported-by: Dmitry Monakhov <[email protected]>
    Signed-off-by: Konstantin Khlebnikov <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    koct9i authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    32b1924 View commit details
    Browse the repository at this point in the history
  6. ovl: inherit SB_NOSEC flag from upperdir

    Since the stacking of regular file operations [1], the overlayfs edition of
    write_iter() is called when writing regular files.
    
    Since then, xattr lookup is needed on every write since file_remove_privs()
    is called from ovl_write_iter(), which would become the performance
    bottleneck when writing small chunks of data. In my test case,
    file_remove_privs() would consume ~15% CPU when running fstime of unixbench
    (the workload is repeadly writing 1 KB to the same file) [2].
    
    Inherit the SB_NOSEC flag from upperdir. Since then xattr lookup would be
    done only once on the first write. Unixbench fstime gets a ~20% performance
    gain with this patch.
    
    [1] https://lore.kernel.org/lkml/20180606150905.GC9426@magnolia/T/
    [2] https://www.spinics.net/lists/linux-unionfs/msg07153.html
    
    Signed-off-by: Jeffle Xu <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    lostjeffle authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    654255f View commit details
    Browse the repository at this point in the history
  7. ovl: whiteout inode sharing

    Share inode with different whiteout files for saving inode and speeding up
    delete operation.
    
    If EMLINK is encountered when linking a shared whiteout, create a new one.
    In case of any other error, disable sharing for this super block.
    
    Note: ofs->whiteout is protected by inode lock on workdir.
    
    Signed-off-by: Chengguang Xu <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    cgxu519 authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    c21c839 View commit details
    Browse the repository at this point in the history
  8. ovl: sync dirty data when remounting to ro mode

    sync_filesystem() does not sync dirty data for readonly filesystem during
    umount, so before changing to readonly filesystem we should sync dirty data
    for data integrity.
    
    Signed-off-by: Chengguang Xu <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    cgxu519 authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    399c109 View commit details
    Browse the repository at this point in the history
  9. ovl: return required buffer size for file handles

    Overlayfs doesn't work well with the fanotify mechanism.
    
    Fanotify first probes for the required buffer size for the file handle,
    but overlayfs currently bails out without passing the size back.
    
    That results in errors in the kernel log, such as:
    
    [527944.485384] overlayfs: failed to encode file handle (/, err=-75, buflen=0, len=29, type=1)
    [527944.485386] fanotify: failed to encode fid (fsid=ae521e68.a434d95f, type=255, bytes=0, err=-2)
    
    Signed-off-by: Lubos Dolezel <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    LubosD authored and Miklos Szeredi committed May 13, 2020
    Configuration menu
    Copy the full SHA
    144da23 View commit details
    Browse the repository at this point in the history

Commits on May 19, 2020

  1. fuse: invalidate inode attr in writeback cache mode

    Under writeback mode, inode->i_blocks is not updated, making utils du
    read st.blocks as 0.
    
    For example, when using virtiofs (cache=always & nondax mode) with
    writeback_cache enabled, writing a new file and check its disk usage
    with du, du reports 0 usage.
    
      # uname -r
      5.6.0-rc6+
      # mount -t virtiofs virtiofs /mnt/virtiofs
      # rm -f /mnt/virtiofs/testfile
    
      # create new file and do extend write
      # xfs_io -fc "pwrite 0 4k" /mnt/virtiofs/testfile
      wrote 4096/4096 bytes at offset 0
      4 KiB, 1 ops; 0.0001 sec (28.103 MiB/sec and 7194.2446 ops/sec)
      # du -k /mnt/virtiofs/testfile
      0               <==== disk usage is 0
      # stat -c %s,%b /mnt/virtiofs/testfile
      4096,0          <==== i_size is correct, but st_blocks is 0
    
    Fix it by invalidating attr in fuse_flush(), so we get up-to-date attr
    from server on next getattr.
    
    Signed-off-by: Eryu Guan <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    eryugey authored and Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    cf576c5 View commit details
    Browse the repository at this point in the history
  2. fuse: always flush dirty data on close(2)

    We want cached data to synced with the userspace filesystem on close(), for
    example to allow getting correct st_blocks value.  Do this regardless of
    whether the userspace filesystem implements a FLUSH method or not.
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    614c026 View commit details
    Browse the repository at this point in the history
  3. fuse: always allow query of st_dev

    Fuse mounts without "allow_other" are off-limits to all non-owners.  Yet it
    makes sense to allow querying st_dev on the root, since this value is
    provided by the kernel, not the userspace filesystem.
    
    Allow statx(2) with a zero request mask to succeed on a fuse mounts for all
    users.
    
    Reported-by: Nikolaus Rath <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    5157da2 View commit details
    Browse the repository at this point in the history
  4. virtiofs: do not use fuse_fill_super_common() for device installation

    fuse_fill_super_common() allocates and installs one fuse_device.  Hence
    virtiofs allocates and install all fuse devices by itself except one.
    
    This makes logic little twisted.  There does not seem to be any real need
    that why virtiofs can't allocate and install all fuse devices itself.
    
    So opt out of fuse device allocation and installation while calling
    fuse_fill_super_common().
    
    Regular fuse still wants fuse_fill_super_common() to install fuse_device.
    It needs to prevent against races where two mounters are trying to mount
    fuse using same fd.  In that case one will succeed while other will get
    -EINVAL.
    
    virtiofs does not have this issue because sget_fc() resolves the race
    w.r.t multiple mounters and only one instance of virtio_fs_fill_super()
    should be in progress for same filesystem.
    
    Signed-off-by: Vivek Goyal <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    rhvgoyal authored and Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    7fd3abf View commit details
    Browse the repository at this point in the history
  5. fuse: use dump_page

    Instead of custom page dumping, use the standard helper.
    
    Reported-by: Matthew Wilcox <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    0058938 View commit details
    Browse the repository at this point in the history
  6. fuse: fix weird page warning

    When PageWaiters was added, updating this check was missed.
    
    Reported-by: Nikolaus Rath <[email protected]>
    Reported-by: Hugh Dickins <[email protected]>
    Fixes: 6290602 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    a5005c3 View commit details
    Browse the repository at this point in the history
  7. fuse: don't check refcount after stealing page

    page_count() is unstable.  Unless there has been an RCU grace period
    between when the page was removed from the page cache and now, a
    speculative reference may exist from the page cache.
    
    Reported-by: Matthew Wilcox <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    32f9887 View commit details
    Browse the repository at this point in the history
  8. fuse: update attr_version counter on fuse_notify_inval_inode()

    A GETATTR request can race with FUSE_NOTIFY_INVAL_INODE, resulting in the
    attribute cache being updated with stale information after the
    invalidation.
    
    Fix this by bumping the attribute version in fuse_reverse_inval_inode().
    
    Reported-by: Krzysztof Rusek <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    5ddd9ce View commit details
    Browse the repository at this point in the history
  9. fuse: optimize writepages search

    Re-work fi->writepages, replacing list with rb-tree.  This improves
    performance because kernel fuse iterates through fi->writepages for each
    writeback page and typical number of entries is about 800 (for 100MB of
    fuse writeback).
    
    Before patch:
    
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 41.3473 s, 260 MB/s
    
     2  1      0 57445400  40416 6323676    0    0    33 374743 8633 19210  1  8 88  3  0
    
      29.86%  [kernel]               [k] _raw_spin_lock
      26.62%  [fuse]                 [k] fuse_page_is_writeback
    
    After patch:
    
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB) copied, 21.4954 s, 500 MB/s
    
     2  9      0 53676040  31744 10265984    0    0    64 854790 10956 48387  1  6 88  6  0
    
      23.55%  [kernel]             [k] copy_user_enhanced_fast_string
       9.87%  [kernel]             [k] __memcpy
       3.10%  [kernel]             [k] _raw_spin_lock
    
    Signed-off-by: Maxim Patlasov <[email protected]>
    Signed-off-by: Vasily Averin <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    Maxim Patlasov authored and Miklos Szeredi committed May 19, 2020
    Configuration menu
    Copy the full SHA
    6b2fb79 View commit details
    Browse the repository at this point in the history

Commits on May 20, 2020

  1. fuse: fix copy_file_range cache issues

    a) Dirty cache needs to be written back not just in the writeback_cache
    case, since the dirty pages may come from memory maps.
    
    b) The fuse_writeback_range() helper takes an inclusive interval, so the
    end position needs to be pos+len-1 instead of pos+len.
    
    Fixes: 88bc7d5 ("fuse: add support for copy_file_range()")
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed May 20, 2020
    Configuration menu
    Copy the full SHA
    2c4656d View commit details
    Browse the repository at this point in the history
  2. fuse: copy_file_range should truncate cache

    After the copy operation completes the cache is not up-to-date.  Truncate
    all pages in the interval that has successfully been copied.
    
    Truncating completely copied dirty pages is okay, since the data has been
    overwritten anyway.  Truncating partially copied dirty pages is not okay;
    add a comment for now.
    
    Fixes: 88bc7d5 ("fuse: add support for copy_file_range()")
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed May 20, 2020
    Configuration menu
    Copy the full SHA
    9b46418 View commit details
    Browse the repository at this point in the history

Commits on Jun 2, 2020

  1. ovl: fix out of bounds access warning in ovl_check_fb_len()

    syzbot reported out of bounds memory access from open_by_handle_at()
    with a crafted file handle that looks like this:
    
      { .handle_bytes = 2, .handle_type = OVL_FILEID_V1 }
    
    handle_bytes gets rounded down to 0 and we end up calling:
      ovl_check_fh_len(fh, 0) => ovl_check_fb_len(fh + 3, -3)
    
    But fh buffer is only 2 bytes long, so accessing struct ovl_fb at
    fh + 3 is illegal.
    
    Fixes: cbe7fba ("ovl: make sure that real fid is 32bit aligned in memory")
    Reported-and-tested-by: [email protected]
    Cc: <[email protected]> # v5.5
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    amir73il authored and Miklos Szeredi committed Jun 2, 2020
    Configuration menu
    Copy the full SHA
    522f6e6 View commit details
    Browse the repository at this point in the history
  2. ovl: simplify setting of origin for index lookup

    overlayfs can keep index of copied up files and directories and it seems to
    serve two primary puroposes.  For regular files, it avoids breaking lower
    hardlinks over copy up.  For directories it seems to be used for various
    error checks.
    
    During ovl_lookup(), we lookup for index using lower dentry in many a
    cases.  That lower dentry is called "origin" and following is a summary of
    current logic.
    
    If there is no upperdentry, always lookup for index using lower dentry.
    For regular files it helps avoiding breaking hard links over copyup and for
    directories it seems to be just error checks.
    
    If there is an upperdentry, then there are 3 possible cases.
    
     - For directories, lower dentry is found using two ways.  One is regular
      path based lookup in lower layers and second is using ORIGIN xattr on
      upper dentry.  First verify that path based lookup lower dentry matches
      the one pointed by upper ORIGIN xattr.  If yes, use this verified origin
      for index lookup.
    
     - For regular files (non-metacopy), there is no path based lookup in lower
      layers as lookup stops once we find upper dentry.  So there is no origin
      verification.  If there is ORIGIN xattr present on upper, use that to
      lookup index otherwise don't.
    
     - For regular metacopy files, again lower dentry is found using path based
      lookup as well as ORIGIN xattr on upper.  Path based lookup is continued
      in this case to find lower data dentry for metacopy upper.  So like
      directories we only use verified origin.  If ORIGIN xattr is not present
      (Either because lower did not support file handles or because this is
      hardlink copied up with index=off), then don't use path lookup based
      lower dentry as origin.  This is same as regular non-metacopy file case.
    
    Suggested-by: Amir Goldstein <[email protected]>
    Signed-off-by: Vivek Goyal <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    rhvgoyal authored and Miklos Szeredi committed Jun 2, 2020
    Configuration menu
    Copy the full SHA
    59fb201 View commit details
    Browse the repository at this point in the history
  3. ovl: use only uppermetacopy state in ovl_lookup()

    Currently we use a variable "metacopy" which signifies that dentry could be
    either uppermetacopy or lowermetacopy.  Amir suggested that we can move
    code around and use d.metacopy in such a way that we don't need
    lowermetacopy and just can do away with uppermetacopy.
    
    So this patch replaces "metacopy" with "uppermetacopy".
    
    It also moves some code little higher to keep reading little simpler.
    
    Suggested-by: Amir Goldstein <[email protected]>
    Signed-off-by: Vivek Goyal <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    rhvgoyal authored and Miklos Szeredi committed Jun 2, 2020
    Configuration menu
    Copy the full SHA
    6815f47 View commit details
    Browse the repository at this point in the history
  4. ovl: initialize OVL_UPPERDATA in ovl_lookup()

    Currently ovl_get_inode() initializes OVL_UPPERDATA flag and for that it
    has to call ovl_check_metacopy_xattr() and check if metacopy xattr is
    present or not.
    
    yangerkun reported sometimes underlying filesystem might return -EIO and in
    that case error handling path does not cleanup properly leading to various
    warnings.
    
    Run generic/461 with ext4 upper/lower layer sometimes may trigger the bug
    as below(linux 4.19):
    
    [  551.001349] overlayfs: failed to get metacopy (-5)
    [  551.003464] overlayfs: failed to get inode (-5)
    [  551.004243] overlayfs: cleanup of 'd44/fd51' failed (-5)
    [  551.004941] overlayfs: failed to get origin (-5)
    [  551.005199] ------------[ cut here ]------------
    [  551.006697] WARNING: CPU: 3 PID: 24674 at fs/inode.c:1528 iput+0x33b/0x400
    ...
    [  551.027219] Call Trace:
    [  551.027623]  ovl_create_object+0x13f/0x170
    [  551.028268]  ovl_create+0x27/0x30
    [  551.028799]  path_openat+0x1a35/0x1ea0
    [  551.029377]  do_filp_open+0xad/0x160
    [  551.029944]  ? vfs_writev+0xe9/0x170
    [  551.030499]  ? page_counter_try_charge+0x77/0x120
    [  551.031245]  ? __alloc_fd+0x160/0x2a0
    [  551.031832]  ? do_sys_open+0x189/0x340
    [  551.032417]  ? get_unused_fd_flags+0x34/0x40
    [  551.033081]  do_sys_open+0x189/0x340
    [  551.033632]  __x64_sys_creat+0x24/0x30
    [  551.034219]  do_syscall_64+0xd5/0x430
    [  551.034800]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
    One solution is to improve error handling and call iget_failed() if error
    is encountered.  Amir thinks that this path is little intricate and there
    is not real need to check and initialize OVL_UPPERDATA in ovl_get_inode().
    Instead caller of ovl_get_inode() can initialize this state.  And this will
    avoid double checking of metacopy xattr lookup in ovl_lookup() and
    ovl_get_inode().
    
    OVL_UPPERDATA is inode flag.  So I was little concerned that initializing
    it outside ovl_get_inode() might have some races.  But this is one way
    transition.  That is once a file has been fully copied up, it can't go back
    to metacopy file again.  And that seems to help avoid races.  So as of now
    I can't see any races w.r.t OVL_UPPERDATA being set wrongly.  So move
    settingof OVL_UPPERDATA inside the callers of ovl_get_inode().
    ovl_obtain_alias() already does it.  So only two callers now left are
    ovl_lookup() and ovl_instantiate().
    
    Reported-by: yangerkun <[email protected]>
    Suggested-by: Amir Goldstein <[email protected]>
    Signed-off-by: Vivek Goyal <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    rhvgoyal authored and Miklos Szeredi committed Jun 2, 2020
    Configuration menu
    Copy the full SHA
    28166ab View commit details
    Browse the repository at this point in the history
  5. ovl: fix redirect traversal on metacopy dentries

    Amir pointed me to metacopy test cases in unionmount-testsuite and I
    decided to run "./run --ov=10 --meta" and it failed while running test
    "rename-mass-5.py".
    
    Problem is w.r.t absolute redirect traversal on intermediate metacopy
    dentry.  We do not store intermediate metacopy dentries and also skip
    current loop/layer and move onto lookup in next layer.  But at the end of
    loop, we have logic to reset "poe" and layer index if currnently looked up
    dentry has absolute redirect.  We skip all that and that means lookup in
    next layer will fail.
    
    Following is simple test case to reproduce this.
    
    - mkdir -p lower upper work merged lower/a lower/b
    - touch lower/a/foo.txt
    - mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work,metacopy=on none merged
    
    # Following will create absolute redirect "/a/foo.txt" on upper/b/bar.txt.
    - mv merged/a/foo.txt merged/b/bar.txt
    
    # unmount overlay and use upper as lower layer (lower2) for next mount.
    - umount merged
    - mv upper lower2
    - rm -rf work; mkdir -p upper work
    - mount -t overlay -o lowerdir=lower2:lower,upperdir=upper,workdir=work,metacopy=on none merged
    
    # Force a metacopy copy-up
    - chown bin:bin merged/b/bar.txt
    
    # unmount overlay and use upper as lower layer (lower3) for next mount.
    - umount merged
    - mv upper lower3
    - rm -rf work; mkdir -p upper work
    - mount -t overlay -o lowerdir=lower3:lower2:lower,upperdir=upper,workdir=work,metacopy=on none merged
    
    # ls merged/b/bar.txt
    ls: cannot access 'bar.txt': Input/output error
    
    Intermediate lower layer (lower2) has metacopy dentry b/bar.txt with
    absolute redirect "/a/foo.txt".  We skipped redirect processing at the end
    of loop which sets poe to roe and sets the appropriate next lower layer
    index.  And that means lookup failed in next layer.
    
    Fix this by continuing the loop for any intermediate dentries.  We still do
    not save these at lower stack.  With this fix applied unionmount-testsuite,
    "./run --ov-10 --meta" now passes.
    
    Signed-off-by: Vivek Goyal <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    rhvgoyal authored and Miklos Szeredi committed Jun 2, 2020
    Configuration menu
    Copy the full SHA
    21d8d66 View commit details
    Browse the repository at this point in the history
  6. ovl: pass correct flags for opening real directory

    The three instances of ovl_path_open() in overlayfs/readdir.c do three
    different things:
    
     - pass f_flags from overlay file
     - pass O_RDONLY | O_DIRECTORY
     - pass just O_RDONLY
    
    The value of f_flags can be (other than O_RDONLY):
    
    O_WRONLY	- not possible for a directory
    O_RDWR		- not possible for a directory
    O_CREAT		- masked out by dentry_open()
    O_EXCL		- masked out by dentry_open()
    O_NOCTTY	- masked out by dentry_open()
    O_TRUNC		- masked out by dentry_open()
    O_APPEND	- no effect on directory ops
    O_NDELAY	- no effect on directory ops
    O_NONBLOCK	- no effect on directory ops
    __O_SYNC	- no effect on directory ops
    O_DSYNC		- no effect on directory ops
    FASYNC		- no effect on directory ops
    O_DIRECT	- no effect on directory ops
    O_LARGEFILE	- ?
    O_DIRECTORY	- only affects lookup
    O_NOFOLLOW	- only affects lookup
    O_NOATIME	- overlay sets this unconditionally in ovl_path_open()
    O_CLOEXEC	- only affects fd allocation
    O_PATH		- no effect on directory ops
    __O_TMPFILE	- not possible for a directory
    
    
    Fon non-merge directories we use the underlying filesystem's iterate; in
    this case honor O_LARGEFILE from the original file to make sure that open
    doesn't get rejected.
    
    For merge directories it's safe to pass O_LARGEFILE unconditionally since
    userspace will only see the artificial offsets created by overlayfs.
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 2, 2020
    Configuration menu
    Copy the full SHA
    130fdbc View commit details
    Browse the repository at this point in the history
  7. ovl: switch to mounter creds in readdir

    In preparation for more permission checking, override credentials for
    directory operations on the underlying filesystems.
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 2, 2020
    Configuration menu
    Copy the full SHA
    48bd024 View commit details
    Browse the repository at this point in the history
  8. ovl: verify permissions in ovl_path_open()

    Check permission before opening a real file.
    
    ovl_path_open() is used by readdir and copy-up routines.
    
    ovl_permission() theoretically already checked copy up permissions, but it
    doesn't hurt to re-do these checks during the actual copy-up.
    
    For directory reading ovl_permission() only checks access to topmost
    underlying layer.  Readdir on a merged directory accesses layers below the
    topmost one as well.  Permission wasn't checked for these layers.
    
    Note: modifying ovl_permission() to perform this check would be far more
    complex and hence more bug prone.  The result is less precise permissions
    returned in access(2).  If this turns out to be an issue, we can revisit
    this bug.
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 2, 2020
    Configuration menu
    Copy the full SHA
    56230d9 View commit details
    Browse the repository at this point in the history

Commits on Jun 3, 2020

  1. ovl: call secutiry hook in ovl_real_ioctl()

    Verify LSM permissions for underlying file, since vfs_ioctl() doesn't do
    it.
    
    [Stephen Rothwell] export security_file_ioctl
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 3, 2020
    Configuration menu
    Copy the full SHA
    292f902 View commit details
    Browse the repository at this point in the history
  2. ovl: check permission to open real file

    Call inode_permission() on real inode before opening regular file on one of
    the underlying layers.
    
    In some cases ovl_permission() already checks access to an underlying file,
    but it misses the metacopy case, and possibly other ones as well.
    
    Removing the redundant permission check from ovl_permission() should be
    considered later.
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 3, 2020
    Configuration menu
    Copy the full SHA
    05acefb View commit details
    Browse the repository at this point in the history
  3. ovl: drop negative dentry in upper layer

    Negative dentries of upper layer are useless after construction of
    overlayfs' own dentry and may keep in the memory long time even after
    unmount of overlayfs instance. This patch tries to drop unnecessary
    negative dentry of upper layer to effectively reclaim memory.
    
    Signed-off-by: Chengguang Xu <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    cgxu519 authored and Miklos Szeredi committed Jun 3, 2020
    Configuration menu
    Copy the full SHA
    1434a65 View commit details
    Browse the repository at this point in the history

Commits on Jun 4, 2020

  1. ovl: initialize error in ovl_copy_xattr

    In ovl_copy_xattr, if all the xattrs to be copied are overlayfs private
    xattrs, the copy loop will terminate without assigning anything to the
    error variable, thus returning an uninitialized value.
    
    If ovl_copy_xattr is called from ovl_clear_empty, this uninitialized error
    value is put into a pointer by ERR_PTR(), causing potential invalid memory
    accesses down the line.
    
    This commit initialize error with 0. This is the correct value because when
    there's no xattr to copy, because all xattrs are private, ovl_copy_xattr
    should succeed.
    
    This bug is discovered with the help of INIT_STACK_ALL and clang.
    
    Signed-off-by: Yuxuan Shui <[email protected]>
    Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1050405
    Fixes: 0956254 ("ovl: don't copy up opaqueness")
    Cc: [email protected] # v4.8
    Signed-off-by: Alexander Potapenko <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    yshui authored and Miklos Szeredi committed Jun 4, 2020
    Configuration menu
    Copy the full SHA
    520da69 View commit details
    Browse the repository at this point in the history
  2. ovl: add accessor for ofs->upper_mnt

    Next patch will remove ofs->upper_mnt, so add an accessor function for this
    field.
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 4, 2020
    Configuration menu
    Copy the full SHA
    08f4c7c View commit details
    Browse the repository at this point in the history
  3. ovl: get rid of redundant members in struct ovl_fs

    ofs->upper_mnt is copied to ->layers[0].mnt and ->layers[0].trap could be
    used instead of a separate ->upperdir_trap.
    
    Split the lowerdir option early to get the number of layers, then allocate
    the ->layers array, and finally fill the upper and lower layers, as before.
    
    Get rid of path_put_init() in ovl_lower_dir(), since the only caller will
    take care of that.
    
    [Colin Ian King] Fix null pointer dereference on null stack pointer on
    error return found by Coverity.
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 4, 2020
    Configuration menu
    Copy the full SHA
    b8e42a6 View commit details
    Browse the repository at this point in the history
  4. ovl: make private mounts longterm

    Overlayfs is using clone_private_mount() to create internal mounts for
    underlying layers.  These are used for operations requiring a path, such as
    dentry_open().
    
    Since these private mounts are not in any namespace they are treated as
    short term, "detached" mounts and mntput() involves taking the global
    mount_lock, which can result in serious cacheline pingpong.
    
    Make these private mounts longterm instead, which trade the penalty on
    mntput() for a slightly longer shutdown time due to an added RCU grace
    period when putting these mounts.
    
    Introduce a new helper kern_unmount_many() that can take care of multiple
    longterm mounts with a single RCU grace period.
    
    Cc: Al Viro <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 4, 2020
    Configuration menu
    Copy the full SHA
    df820f8 View commit details
    Browse the repository at this point in the history
  5. ovl: only pass ->ki_flags to ovl_iocb_to_rwf()

    Next patch will want to pass a modified set of flags, so...
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 4, 2020
    Configuration menu
    Copy the full SHA
    b778e1e View commit details
    Browse the repository at this point in the history
  6. ovl: make oip->index bool

    ovl_get_inode() uses oip->index as a bool value, not as a pointer.
    
    Signed-off-by: Miklos Szeredi <[email protected]>
    Miklos Szeredi committed Jun 4, 2020
    Configuration menu
    Copy the full SHA
    74c6e38 View commit details
    Browse the repository at this point in the history

Commits on Jun 8, 2020

  1. ovl: remove unnecessary lock check

    Directory is always locked until "out_unlock" label.  So lock check is not
    needed.
    
    Signed-off-by: youngjun <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>
    youngjun-89 authored and Miklos Szeredi committed Jun 8, 2020
    Configuration menu
    Copy the full SHA
    2068cf7 View commit details
    Browse the repository at this point in the history

Commits on Jun 9, 2020

  1. afs: Fix memory leak in afs_put_sysnames()

    Fix afs_put_sysnames() to actually free the specified afs_sysnames
    object after its reference count has been decreased to zero and
    its contents have been released.
    
    Fixes: 6f8880d ("afs: Implement @sys substitution handling")
    Signed-off-by: Zhihao Cheng <[email protected]>
    Signed-off-by: David Howells <[email protected]>
    Zhihao Cheng authored and dhowells committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    2ca068b View commit details
    Browse the repository at this point in the history
  2. afs: Fix file locking

    Fix AFS file locking to use the correct vnode pointer and remove a member
    of the afs_operation struct that is never set, but it is read and followed,
    causing an oops.
    
    This can be triggered by:
    
    	flock -s /afs/example.com/foo sleep 1
    
    when it calls the kernel to get a file lock.
    
    Fixes: e49c7b2 ("afs: Build an abstraction around an "operation" concept")
    Reported-by: Dave Botsch <[email protected]>
    Signed-off-by: David Howells <[email protected]>
    Tested-by: Dave Botsch <[email protected]>
    dhowells committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    5749ce9 View commit details
    Browse the repository at this point in the history
  3. afs: Fix use of BUG()

    Fix afs_compare_addrs() to use WARN_ON(1) instead of BUG() and return 1
    (ie. srx_a > srx_b).
    
    There's no point trying to put actual error handling in as this should not
    occur unless a new transport address type is allowed by AFS.  And even if
    it does, in this particular case, it'll just never match unknown types of
    addresses.  This BUG() was more of a 'you need to add a case here'
    indicator.
    
    Reported-by: Kees Cook <[email protected]>
    Signed-off-by: David Howells <[email protected]>
    Reviewed-by: Kees Cook <[email protected]>
    dhowells committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    9ca0652 View commit details
    Browse the repository at this point in the history
  4. afs: Fix debugging statements with %px to be %p

    Fix a couple of %px to be %p in debugging statements.
    
    Fixes: e49c7b2 ("afs: Build an abstraction around an "operation" concept")
    Fixes: 8a070a9 ("afs: Detect cell aliases 1 - Cells with root volumes")
    Reported-by: Kees Cook <[email protected]>
    Signed-off-by: David Howells <[email protected]>
    Reviewed-by: Kees Cook <[email protected]>
    dhowells committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    fed79fd View commit details
    Browse the repository at this point in the history
  5. afs: Remove afs_zero_fid as it's not used

    Remove afs_zero_fid as it's not used.
    
    Signed-off-by: David Howells <[email protected]>
    dhowells committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    4a06fa5 View commit details
    Browse the repository at this point in the history
  6. afs: Make afs_zap_data() static

    Make afs_zap_data() static as it's only used in the file in which it is
    defined.
    
    Signed-off-by: David Howells <[email protected]>
    dhowells committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    c68421b View commit details
    Browse the repository at this point in the history
  7. Merge tag 'afs-fixes-20200609' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/dhowells/linux-fs
    
    Pull AFS fixes from David Howells:
     "A set of small patches to fix some things, most of them minor.
    
       - Fix a memory leak in afs_put_sysnames()
    
       - Fix an oops in AFS file locking
    
       - Fix new use of BUG()
    
       - Fix debugging statements containing %px
    
       - Remove afs_zero_fid as it's unused
    
       - Make afs_zap_data() static"
    
    * tag 'afs-fixes-20200609' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
      afs: Make afs_zap_data() static
      afs: Remove afs_zero_fid as it's not used
      afs: Fix debugging statements with %px to be %p
      afs: Fix use of BUG()
      afs: Fix file locking
      afs: Fix memory leak in afs_put_sysnames()
    torvalds committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    4964dd2 View commit details
    Browse the repository at this point in the history
  8. Merge tag 'ovl-update-5.8' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/mszeredi/vfs
    
    Pull overlayfs updates from Miklos Szeredi:
     "Fixes:
    
       - Resolve mount option conflicts consistently
    
       - Sync before remount R/O
    
       - Fix file handle encoding corner cases
    
       - Fix metacopy related issues
    
       - Fix an unintialized return value
    
       - Add missing permission checks for underlying layers
    
      Optimizations:
    
       - Allow multipe whiteouts to share an inode
    
       - Optimize small writes by inheriting SB_NOSEC from upper layer
    
       - Do not call ->syncfs() multiple times for sync(2)
    
       - Do not cache negative lookups on upper layer
    
       - Make private internal mounts longterm"
    
    * tag 'ovl-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (27 commits)
      ovl: remove unnecessary lock check
      ovl: make oip->index bool
      ovl: only pass ->ki_flags to ovl_iocb_to_rwf()
      ovl: make private mounts longterm
      ovl: get rid of redundant members in struct ovl_fs
      ovl: add accessor for ofs->upper_mnt
      ovl: initialize error in ovl_copy_xattr
      ovl: drop negative dentry in upper layer
      ovl: check permission to open real file
      ovl: call secutiry hook in ovl_real_ioctl()
      ovl: verify permissions in ovl_path_open()
      ovl: switch to mounter creds in readdir
      ovl: pass correct flags for opening real directory
      ovl: fix redirect traversal on metacopy dentries
      ovl: initialize OVL_UPPERDATA in ovl_lookup()
      ovl: use only uppermetacopy state in ovl_lookup()
      ovl: simplify setting of origin for index lookup
      ovl: fix out of bounds access warning in ovl_check_fb_len()
      ovl: return required buffer size for file handles
      ovl: sync dirty data when remounting to ro mode
      ...
    torvalds committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    52435c8 View commit details
    Browse the repository at this point in the history
  9. Merge tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/ker…

    …nel/git/mszeredi/fuse
    
    Pull fuse updates from Miklos Szeredi:
    
     - Fix a rare deadlock in virtiofs
    
     - Fix st_blocks in writeback cache mode
    
     - Fix wrong checks in splice move causing spurious warnings
    
     - Fix a race between a GETATTR request and a FUSE_NOTIFY_INVAL_INODE
       notification
    
     - Use rb-tree instead of linear search for pages currently under
       writeout by userspace
    
     - Fix copy_file_range() inconsistencies
    
    * tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
      fuse: copy_file_range should truncate cache
      fuse: fix copy_file_range cache issues
      fuse: optimize writepages search
      fuse: update attr_version counter on fuse_notify_inval_inode()
      fuse: don't check refcount after stealing page
      fuse: fix weird page warning
      fuse: use dump_page
      virtiofs: do not use fuse_fill_super_common() for device installation
      fuse: always allow query of st_dev
      fuse: always flush dirty data on close(2)
      fuse: invalidate inode attr in writeback cache mode
      fuse: Update stale comment in queue_interrupt()
      fuse: BUG_ON correction in fuse_dev_splice_write()
      virtiofs: Add mount option and atime behavior to the doc
      virtiofs: schedule blocking async replies in separate worker
    torvalds committed Jun 9, 2020
    Configuration menu
    Copy the full SHA
    5b14671 View commit details
    Browse the repository at this point in the history