Skip to content

Commit

Permalink
Fix for #1491 (error user callback not called sometimes) and #489 (ki…
Browse files Browse the repository at this point in the history
…lling rsync not handled gracefully / rsync exit code not evaluated) (#1502)

Fix bug: Taking a snapshot reports `rsync` errors now even if no snapshot was taken (#1491)
Fix bug: takeSnapshot() recognizes errors now by also evaluating the rsync exit code (#489)
         Fixes related problem: Killing `rsync` was not handled gracefully (by ignoring the rsync exit code)
Fix bug: The error user-callback is now always called if an error happened during taking a snapshot (#1491)
Feature: Introduce new error codes for the "error" user callback (as part of #1491):
             5: Error while taking a snapshot.
             6: New snapshot taken but with errors.
Improvement: The `rsync` exit code is now contained in the snapshot log. Example:
             [E] Error: 'rsync' ended with exit code -9 (negative values are signal numbers, see 'kill -l')
Fix CHANGES entries (stick to our standards)

---------

Co-authored-by: aryoda <[email protected]>
  • Loading branch information
aryoda and aryoda authored Aug 17, 2023
1 parent f9434b7 commit 976081b
Show file tree
Hide file tree
Showing 4 changed files with 157 additions and 55 deletions.
37 changes: 23 additions & 14 deletions CHANGES
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,15 @@ Version 1.3.4-dev (development of upcoming release)
* Fix bug: Unit test fails on some machines due to warning "Ignoring XDG_SESSION_TYPE=wayland on Gnome..." (#1429)
* Fix bug: Generation of config-manpage caused an error with Debian's Lintian (#1398).
* Fix bug: Return empty list in smartRemove (#1392, Debian Bug Report 973760)
* Fix bug: Taking a snapshot reports `rsync` errors now even if no snapshot was taken (#1491)
* Fix bug: takeSnapshot() recognizes errors now by also evaluating the rsync exit code (#489)
Fixes related problem: Killing `rsync` was not handled gracefully (by ignoring the rsync exit code)
* Fix bug: The error user-callback is now always called if an error happened during taking a snapshot (#1491)
* Feature: Introduce new error codes for the "error" user callback (as part of #1491):
5: Error while taking a snapshot.
6: New snapshot taken but with errors.
* Feature: The `rsync` exit code is now contained in the snapshot log (part of #489). Example:
[E] Error: 'rsync' ended with exit code -9 (negative values are signal numbers, see 'kill -l')
* Breaking change: Minimal Python version 3.8 required (#1358).
* Removed: Handling and checking of user group "fuse" (#1472).
* Feature: Exclude /swapfile by default (#1053)
Expand All @@ -26,23 +35,23 @@ Version 1.3.4-dev (development of upcoming release)
* Added: Translation in Persian and Vietnamese (#1460).

Version 1.3.3 (2023-01-04)
* New feature: Command line argument "--diagnostics" to show helpful info for better issue support (#1100)
* Feature: New command line argument "--diagnostics" to show helpful info for better issue support (#1100)
* GUI change: Remove Exit button from the toolbar (#172)
* GUI change: Define accelerator keys for menu bar and tabs, as well as toolbar shortcuts (#1104)
* Desktop integration: Update .desktop file to mark Back In Time as a single main window program (#1258)
* Improvement: Write all log output to stderr; do not pollute stdout with INFO and WARNING messages anymore (#1337)
* Bugfix: RTE "reentrant call inside io.BufferedWriter" in logFile.flush() during backup (#1003)
* Bugfix: Incompatibility with rsync 3.2.4 or later because of rsync's "new argument protection" (#1247). Deactivate "--old-args" rsync argument earlier recommaned to users as a workaround.
* Bugfix: DeprecationWarnings about invalid escape sequences.
* Bugfix: AttributeError in "Diff Options" dialog (#898)
* Bugfix: Settings GUI: "Save password to Keyring" was disabled due to "no appropriate keyring found" (#1321)
* Bugfix: Back in Time did not start with D-Bus error
"dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NameHasNoOwner:
Could not get owner of name 'net.launchpad.backintime.serviceHelper': no such name"
(fixes client-side part of #921 - system D-Bus part of the Udev serviceHelper is still under investigation).
* Bugfix: Avoid logging errors while waiting for a target drive to be mounted (#1142, #1143, #1328)
* Bugfix: [Arch Linux] AUR pkg "backintime-git": Build tests fails and installation is aborted (#1233, fixed with #921)
* Bugfix: Wrong systray icon showing in Wayland (#1244)
* Feature: Write all log output to stderr; do not pollute stdout with INFO and WARNING messages anymore (#1337)
* Fix bug: RTE "reentrant call inside io.BufferedWriter" in logFile.flush() during backup (#1003)
* Fix bug: Incompatibility with rsync 3.2.4 or later because of rsync's "new argument protection" (#1247). Deactivate "--old-args" rsync argument earlier recommaned to users as a workaround.
* Fix bug: DeprecationWarnings about invalid escape sequences.
* Fix bug: AttributeError in "Diff Options" dialog (#898)
* Fix bug: Settings GUI: "Save password to Keyring" was disabled due to "no appropriate keyring found" (#1321)
* Fix bug: Back in Time did not start with D-Bus error
"dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NameHasNoOwner:
Could not get owner of name 'net.launchpad.backintime.serviceHelper': no such name"
(fixes client-side part of #921 - system D-Bus part of the Udev serviceHelper is still under investigation).
* Fix bug: Avoid logging errors while waiting for a target drive to be mounted (#1142, #1143, #1328)
* Fix bug: [Arch Linux] AUR pkg "backintime-git": Build tests fails and installation is aborted (#1233, fixed with #921)
* Fix bug: Wrong systray icon showing in Wayland (#1244)
* Documentation update: Correct description of profile<N>.schedule.time in backintime-config manpage (#1270)
* Translation update: Brazilian Portuguese (#1267)
* Translation update: Italian (#1110, #1123)
Expand Down
13 changes: 9 additions & 4 deletions common/pluginmanager.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,14 +91,19 @@ def error(self, code, message):
Known error codes:
1. No or no valid configuration
1: No or no valid configuration
(check the configuration file)
2. A backup process is already running.
2: A backup process is already running.
Make sure that automatic and manual backups
do not run at once.
3. Snapshots directory not found
3: Snapshots directory not found
(eg. when a removable drive is not mounted)
4. The requested snapshot for "now" already exists
4: The requested snapshot for "now" already exists.
``message`` contains the SID (snapshot ID) then.
5: Error while taking a snapshot.
``message`` contains more information (as string).
6: New snapshot taken but with errors.
``message`` contains the SID (snapshot ID) then.
message: The error message for the code
(mostly an empty string by default)
Expand Down
103 changes: 79 additions & 24 deletions common/snapshots.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,24 @@ def takeSnapshotMessage(self):

#TODO: make own class for takeSnapshotMessage
def setTakeSnapshotMessage(self, type_id, message, timeout = -1):
"""Update the status message of the active snapshot creation job
Write the status message into a message file to allow async
processing for the GUI, plug-ins (like user-callback)
and desktop notifications.
Args:
type_id: Simplified severity level of the status message:
1: ERROR
other values: INFO
message: status message string
timeout: Requested maximum processing duration in plug-ins.
Default: -1 (no limit)
Called plug-ins must try to keep the timeout itself
(it is not enforced by the ``PluginManager``.
In fact currently all known plug-ins do
ignore the timeout value!
"""
data = str(type_id) + '\n' + message

try:
Expand Down Expand Up @@ -680,7 +698,7 @@ def backup(self, force = False):

if not self.config.isConfigured():
logger.warning('Not configured', self)
self.config.PLUGIN_MANAGER.error(1) #not configured
self.config.PLUGIN_MANAGER.error(1) # not configured
elif not force and self.config.noSnapshotOnBattery() and tools.onBattery():
self.setTakeSnapshotMessage(0, _('Deferring backup while on battery'))
logger.info('Deferring backup while on battery', self)
Expand All @@ -696,7 +714,7 @@ def backup(self, force = False):
if not instance.check():
logger.warning('A backup is already running. The pid of the \
already running backup is in file %s. Maybe delete it' % instance.pidFile , self )
self.config.PLUGIN_MANAGER.error(2) #a backup is already running
self.config.PLUGIN_MANAGER.error(2) # a backup is already running
elif not restore_instance.check():
logger.warning('Restore is still running. Stop backup until \
restore is done. The pid of the already running restore is in %s. Maybe delete it'\
Expand All @@ -711,10 +729,10 @@ def backup(self, force = False):

now = datetime.datetime.today()

#inhibit suspend/hibernate during snapshot is running
# inhibit suspend/hibernate during snapshot is running
self.config.inhibitCookie = tools.inhibitSuspend(toplevel_xid = self.config.xWindowId)

#mount
# mount
try:
hash_id = mount.Mount(cfg = self.config).mount()
except MountException as ex:
Expand All @@ -733,7 +751,7 @@ def backup(self, force = False):
elif not self.config.PLUGIN_MANAGER.processBegin():
logger.info('A plugin prevented the backup', self)
else:
#take snapshot process begin
# take snapshot process begin
self.setTakeSnapshotMessage(0, '...')
self.snapshotLog.new(now)
profile_id = self.config.currentProfile()
Expand All @@ -757,19 +775,19 @@ def backup(self, force = False):
if counter != 0:
logger.info("Waited %d seconds for target directory to be available", counter)


if not self.config.canBackup(profile_id):
logger.warning('Can\'t find snapshots folder!', self)
self.config.PLUGIN_MANAGER.error(3) #Can't find snapshots directory (is it on a removable drive ?)
self.config.PLUGIN_MANAGER.error(3) # Can't find snapshots directory (is it on a removable drive ?)
else:
ret_error = False
sid = SID(now, self.config)

if sid.exists():
logger.warning("Snapshot path \"%s\" already exists" %sid.path(), self)
self.config.PLUGIN_MANAGER.error(4, sid) #This snapshots already exists
self.config.PLUGIN_MANAGER.error(4, sid) # This snapshot already exists
else:
try:
# TODO rename ret_val to new_snapshot_created and ret_error to has_error for clearer code
ret_val, ret_error = self.takeSnapshot(sid, now, include_folders)
except:
new = NewSnapshot(self.config)
Expand All @@ -783,12 +801,20 @@ def backup(self, force = False):

if ret_error:
logger.error('Failed to take snapshot.', self)
self.setTakeSnapshotMessage(1, _('Failed to take snapshot {snapshot_id}.').format(snapshot_id=sid.displayID))
msg = _('Failed to take snapshot {snapshot_id}.').format(snapshot_id=sid.displayID)
self.setTakeSnapshotMessage(1, msg)
self.config.PLUGIN_MANAGER.error(5, msg) # Fixes #1491

time.sleep(2)
else:
logger.warning("No new snapshot", self)
else:
ret_error = False
else: # new snapshot taken...
if ret_error:
logger.error('New snapshot taken but errors detected', self)
self.config.PLUGIN_MANAGER.error(6, sid.displayID) # Fixes #1491
ret_error = False # Why ignore errors now?
# Probably because a new snapshot has been created (= changes transfered)
# and "continue on errors" is enabled

if not ret_error:
self.freeSpace(now)
Expand All @@ -809,7 +835,7 @@ def backup(self, force = False):
if not ret_error:
self.clearTakeSnapshotMessage()

#unmount
# unmount
try:
mount.Mount(cfg = self.config).umount(self.config.current_hash_id)
except MountException as ex:
Expand Down Expand Up @@ -866,9 +892,11 @@ def rsyncCallback(self, line, params):
Args:
line (str): stdout line from rsync
params (list): list of two bool '[error, changes]'. Using siteefect
on changing list items will change original
list, too. If rsync reported an error ``params[0]``
params (list): list of two bool '[error, changes]'.
Uses a side effect by changing list items here
to change the original list of the caller, too
(lists are passed as reference in Python).
If rsync reported an error ``params[0]``
will be set to ``True``. If rsync reported a changed
file ``params[1]`` will be set to ``True``
"""
Expand All @@ -878,13 +906,17 @@ def rsyncCallback(self, line, params):
self.setTakeSnapshotMessage(
0, _('Take snapshot') + " (rsync: %s)" % line)

# Did rsync report an error?
if line.endswith(')'):
if line.startswith('rsync:'):
if not line.startswith('rsync: chgrp ') and not line.startswith('rsync: chown '):
# matches rsync error lines like:
# rsync: [generator] link [...] failed: Invalid cross-device link (18)
params[0] = True
self.setTakeSnapshotMessage(1, 'Error: ' + line)

if len(line) >= 13:
# The prefix is created by rsync via the argument "--out-format=BACKINTIME: %i %n%L"
if line.startswith('BACKINTIME: '):
if line[12] != '.' and line[12:14] != 'cd':
params[1] = True
Expand Down Expand Up @@ -1099,7 +1131,10 @@ def takeSnapshot(self, sid, now, include_folders):

new_snapshot = NewSnapshot(self.config)
encode = self.config.ENCODE
params = [False, False] # [error, changes]
params = [False, False] # [error, changes] # "return" values set during async rsync execution (as user data "by ref")
# TODO docstring of return value for this function swaps the meaning of the elements,
# this is confusing (``ret_val``, ``ret_error``) and error-prone.
# Use a mutable data structure with named elements instead, e.g. a DataClass

if new_snapshot.exists() and new_snapshot.saveToContinue:
logger.info("Found leftover '%s' which can be continued." %new_snapshot.displayID, self)
Expand Down Expand Up @@ -1158,6 +1193,11 @@ def takeSnapshot(self, sid, now, include_folders):
# It should delete the excluded folders then
rsync_prefix.extend(('--delete', '--delete-excluded'))
rsync_prefix.append('-v')
# Use a fixed logging format for the rsync "changed files" list to make it parsable e.g. in rsyncCallback()
# %i = itemized list (11 characters) of what is being updated (see "--itemize-changes" in "man rsync")
# %n = the filename (short form; trailing "/" on dir)
# %L = the string " -> SYMLINK", " => HARDLINK", or "" (where SYMLINK or HARDLINK is a filename)
# (see log format section in "man rsyncd.conf")
rsync_prefix.extend(('-i', '--out-format=BACKINTIME: %i %n%L'))
if prev_sid:
link_dest = encode.path(os.path.join(prev_sid.sid, 'backup'))
Expand All @@ -1176,12 +1216,18 @@ def takeSnapshot(self, sid, now, include_folders):

# run rsync
proc = tools.Execute(cmd,
callback = self.rsyncCallback,
callback = self.rsyncCallback, # TODO interprets the user_data in params as: list of two bool [error, changes] but params is reused as return value of this function with [changes, error]. Use a separate variable to avoid confusion!
user_data = params,
filters = (self.filterRsyncProgress,),
parent = self)
self.snapshotLog.append('[I] ' + proc.printable_cmd, 3)
proc.run()
self.snapshotLog.append('[I] ' + proc.printable_cmd, 3) # TODO introduce centralized log msg builder to avoid spread severity level indicators like "[I]" here?
# TODO Process return value with rsync exit code to recognize errors that cannot be recognized by parsing the rsync output currently
rsync_exit_code = proc.run()
# Fix for #1491 and #489
# Note that the return value (containing the exit code) of the rsync child process
# is not the only way to detect errors (and sometimes not reliably delivers <> 0 in case of an error):
# Errors are also indicated via the pass-by-ref argument user_data="params" list
# (updated by the callback function that parses the rsync output for error message patterns).

# cleanup
try:
Expand All @@ -1192,9 +1238,16 @@ def takeSnapshot(self, sid, now, include_folders):
self)

# handle errors
has_errors = False
has_errors = False # TODO Fix inconsistent usage: Collects return value,
# but errors are also checked via params[0]

if rsync_exit_code != 0: # indicates an error
params[0] = True # HACK to fix #489 (params[0] and has_errors should be merged)
self.setTakeSnapshotMessage(1,
_("Error: 'rsync' ended with exit code {exit_code} (negative values are signal numbers, see 'kill -l')".format(
exit_code=rsync_exit_code)))

# params[0] -> error
# params[0] -> error?
if params[0]:
if not self.config.continueOnErrors():
self.remove(new_snapshot)
Expand All @@ -1203,7 +1256,7 @@ def takeSnapshot(self, sid, now, include_folders):
has_errors = True
new_snapshot.failed = True

# params[1] -> changes
# params[1] -> changes?
if not params[1] and not self.config.takeSnapshotRegardlessOfChanges():
self.remove(new_snapshot)
logger.info("Nothing changed, no new snapshot necessary", self)
Expand All @@ -1212,7 +1265,8 @@ def takeSnapshot(self, sid, now, include_folders):
prev_sid.setLastChecked()
if not has_errors and not list(self.config.anacrontabFiles()):
tools.writeTimeStamp(self.config.anacronSpoolFile())
return [False, False]
return [False, has_errors] # Part of fix for #1491: Returns "has_errors" instead of False now
# to signal rsync errors (which may have prevented processing any changes)

self.backupConfig(new_snapshot)
self.backupPermissions(new_snapshot)
Expand All @@ -1226,6 +1280,7 @@ def takeSnapshot(self, sid, now, include_folders):
logger.debug('Failed to write takeSnapshot log %s into compressed file %s: %s'
%(self.config.takeSnapshotLogFile(), new_snapshot.path(SID.LOG), str(e)),
self)
# TODO How is this error handled? Currently it looks like it is ignored (just logged)!

new_snapshot.saveToContinue = False

Expand All @@ -1248,7 +1303,7 @@ def takeSnapshot(self, sid, now, include_folders):
if not has_errors and not list(self.config.anacrontabFiles()):
tools.writeTimeStamp(self.config.anacronSpoolFile())

#create last_snapshot symlink
# create last_snapshot symlink
self.createLastSnapshotSymlink(sid)

return [True, has_errors]
Expand Down
Loading

0 comments on commit 976081b

Please sign in to comment.