[4.6] libct/cg/sd: reconnect and retry on dbus connection error #7

Closed
wants to merge 9 commits into from

Commits on May 1, 2021

  1. libct/cgroups/systemd: eliminate runc/systemd race

    In case it takes more than 1 second for systemd to create a unit,
    startUnit() times out with a warning and then runc proceeds
    (to create cgroups using fs manager and so on).
    
    Now runc and systemd are racing, and multiple scenarios are possible.
    
    In one such scenario, by the time runc calls systemd manager's Apply()
    the unit is not yet created, the dbusConnection.SetUnitProperties()
    call fails with "unit xxx.scope not found", and the whole container
    start also fails.
    
    To eliminate the race, we need to return an error in case the timeout is
    hit.
    
    To reduce the chance of failure, increase the timeout from 1 to 30
    seconds, so as not to error out too early on a busy or slow system
    (times of 3-5 seconds are not unrealistic).
    
    While at it, as the timeout is quite long now, make sure to not leave
    a stray timer.
    
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit 3844789)
    Signed-off-by: Kir Kolyshkin <[email protected]>
    kolyshkin committed May 1, 2021
    7235c92
  2. libct/cg/sd/systemdVersion: don't return error

    As the caller of this function just logs the error, it does not make
    sense to return it. Instead, log it (once) and return -1.
    
    This is a preparation for the second user.
    
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit eee425f)
    
    [Minor merge conflict due to missing "return" removed by a hunk from
    commit 978fa6e]
    
    Signed-off-by: Kir Kolyshkin <[email protected]>
    kolyshkin committed May 1, 2021
    6951756
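One way to "log it (once) and return -1" is to cache the result behind `sync.Once`. The sketch below assumes that approach; `probeVersion` is a hypothetical stand-in for the real query to systemd and always fails here to show the error path.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
)

var (
	versionOnce sync.Once
	version     int
)

// probeVersion is a hypothetical stand-in for asking systemd for its
// version. It always fails in this sketch.
var probeVersion = func() (int, error) {
	return 0, errors.New("dbus unavailable")
}

// systemdVersion returns the cached version, or -1 if it could not be
// determined. The failure is logged only on the first call.
func systemdVersion() int {
	versionOnce.Do(func() {
		v, err := probeVersion()
		if err != nil {
			log.Printf("unable to get systemd version: %v", err)
			version = -1
			return
		}
		version = v
	})
	return version
}

func main() {
	fmt.Println(systemdVersion()) // logs the failure, prints -1
	fmt.Println(systemdVersion()) // prints -1 again without logging
}
```

With this shape, a second caller can compare the result against a minimum version without handling an error value of its own.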
  3. libct/cg/sd: add dbus manager

    [@kolyshkin: documentation nits]
    
    Signed-off-by: Shiming Zhang <[email protected]>
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit cdbed6f)
    
    [minor merge conflict due to missing upstream commit 73f22e7]
    
    Signed-off-by: Kir Kolyshkin <[email protected]>
    wzshiming authored and kolyshkin committed May 1, 2021
    7c6eb9d
  4. libct/cg/sd: add isDbusError

    Generalize isUnitExists as isDbusError, and use errors.As while at it
    (which can handle wrapped errors as well).
    
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit bacfc2c)
    Signed-off-by: Kir Kolyshkin <[email protected]>
    kolyshkin committed May 1, 2021
    629e2cf
  5. libct/cg/sd: add renew dbus connection

    [@kolyshkin: doc nits, use dbus.ErrClosed and isDbusError]
    
    Signed-off-by: Shiming Zhang <[email protected]>
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit 15fee98)
    Signed-off-by: Kir Kolyshkin <[email protected]>
    wzshiming authored and kolyshkin committed May 1, 2021
    57d2ef8
  6. Privatize NewUserSystemDbus

    Signed-off-by: Shiming Zhang <[email protected]>
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit 6122bc8)
    Signed-off-by: Kir Kolyshkin <[email protected]>
    wzshiming authored and kolyshkin committed May 1, 2021
    809ecfa
  7. libct/cg/sd: retry on dbus disconnect

    Instead of reconnecting to dbus after some failed operations, and
    returning an error (so a caller has to retry), reconnect AND retry
    in place for all such operations.
    
    This should fix issues caused by a stale dbus connection after e.g.
    a dbus daemon restart.
    
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit 47ef9a1)
    
    [Minor merge conflicts due to missing upstream commits
    52390d6 and af521ed.]
    Signed-off-by: Kir Kolyshkin <[email protected]>
    kolyshkin committed May 1, 2021
    bda2c9a

Commits on May 21, 2021

  1. libct/cg/sd: introduce and use getManagerProperty

    Commit 47ef9a1 forgot to wrap GetManagerProperty("ControlGroup")
    in retryOnDisconnect. Since there is one other user of
    GetManagerProperty, add a getManagerProperty wrapper and use it.
    
    Fixes: 47ef9a1
    
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit 99c5c50)
    Signed-off-by: Kir Kolyshkin <[email protected]>
    kolyshkin committed May 21, 2021
    9c9761f
  2. libct/cg/sd: use global dbus connection

    Using a separate dbus connection per cgroup manager means
    that every cgroup manager instance gets a new connection,
    and those connections are never closed, ultimately resulting
    in the file descriptor limit being hit.
    
    Revert to using a single global dbus connection for everything,
    without changing the callers.
    
    NOTE that it is assumed a runtime can't use both root and rootless
    dbus at the same time. If this happens, we panic.
    
    Signed-off-by: Kir Kolyshkin <[email protected]>
    (cherry picked from commit c7f847e)
    Signed-off-by: Kir Kolyshkin <[email protected]>
    kolyshkin committed May 21, 2021
    482b488
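A single shared connection with a guard against mixing root and rootless modes might look roughly like the sketch below. The names (`newDbusConnManager`, `sharedConn`, `dials`) are chosen for illustration and the `conn` type is a placeholder; only the overall shape (global state behind a mutex, panic on a mode mismatch) follows the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a placeholder for a real dbus connection.
type conn struct{ id int }

var (
	dbusMu       sync.Mutex
	dbusInited   bool
	dbusRootless bool
	sharedConn   *conn
	dials        int // number of connections opened, for demonstration
)

// dbusConnManager carries no state of its own; all managers share the
// package-level connection.
type dbusConnManager struct{}

// newDbusConnManager records the mode on first use and panics if a later
// caller asks for the other mode.
func newDbusConnManager(rootless bool) *dbusConnManager {
	dbusMu.Lock()
	defer dbusMu.Unlock()
	if dbusInited && rootless != dbusRootless {
		panic("can't have both root and rootless dbus")
	}
	dbusInited = true
	dbusRootless = rootless
	return &dbusConnManager{}
}

// getConnection lazily opens the one shared connection.
func (m *dbusConnManager) getConnection() *conn {
	dbusMu.Lock()
	defer dbusMu.Unlock()
	if sharedConn == nil {
		dials++
		sharedConn = &conn{id: dials}
	}
	return sharedConn
}

func main() {
	a := newDbusConnManager(false)
	b := newDbusConnManager(false)
	fmt.Println(a.getConnection() == b.getConnection(), dials)

	defer func() { fmt.Println("recovered:", recover()) }()
	newDbusConnManager(true) // mixing modes panics
}
```

Since callers still construct a manager per cgroup, their code does not change; only the connection behind the managers becomes shared, which keeps the number of open file descriptors constant.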