local state db not kept consistent on client quit ("zombie" file if client interrupted) #1738

Closed
moscicki opened this issue May 6, 2014 · 6 comments

@moscicki
Contributor

moscicki commented May 6, 2014

If you quit the client while it is uploading a file, the local state db for that file is not updated. This has a weird side effect on the next client restart. For example, if you delete this file locally (or a directory containing it), a "zombie" file will be downloaded from the server (the file magically reappears in your local folder).

See logs below.

Steps to reproduce:

mkdir NEW_DIR
for i in {01..99}; do
  echo "this is test $i" >> NEW_DIR/test-$i.txt
done

Quit the client while uploading. Delete NEW_DIR altogether. Restart the client.

For 1.5.3 the last file will become a "zombie".

For 1.6.0-beta2 (today's git master actually) the whole batch of files will come back to life. So local state DB consistency on exit is even more of a problem in this version.

If I read the schema correctly, the uploadinfo table could be used to mark the beginning of the file transfer, with the entry removed after the upload succeeds and an entry created in the metadata table. There is still a small chance that the file is uploaded successfully but the client process gets killed before the metadata entry is created, but this should already provide enough state information to react properly on the next client restart (propagate the deletion if the local file has been deleted and still has an entry in the uploadinfo table).
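
A rough sketch of the sequence I have in mind (not the actual mirall code: the uploadinfo and metadata columns are simplified, the real schema has more fields, and the etag value is a placeholder):

#include <sqlite3.h>
#include <string>

// Run a single-parameter statement, binding the file path.
static bool execWithPath(sqlite3 *db, const char *sql, const std::string &path)
{
    sqlite3_stmt *stmt = nullptr;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK)
        return false;
    sqlite3_bind_text(stmt, 1, path.c_str(), -1, SQLITE_TRANSIENT);
    const bool ok = (sqlite3_step(stmt) == SQLITE_DONE);
    sqlite3_finalize(stmt);
    return ok;
}

bool uploadOneFile(sqlite3 *db, const std::string &path)
{
    // 1. Remember that a transfer for this path has started.
    if (!execWithPath(db, "INSERT OR REPLACE INTO uploadinfo (path) VALUES (?)", path))
        return false;

    // 2. ...PUT the file and wait for the server's confirmation (etag) here...

    // 3. Only after a confirmed upload: move the record from uploadinfo to
    //    metadata in a single transaction, so a crash leaves a recoverable state.
    sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
    execWithPath(db, "DELETE FROM uploadinfo WHERE path = ?", path);
    execWithPath(db, "INSERT OR REPLACE INTO metadata (path, md5) VALUES (?, 'etag-from-server')", path);
    sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);
    return true;
}

On restart, any path that still has an uploadinfo row but no local file would then be a candidate for propagating the deletion.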

Here are the logs - client quit while uploading test-15.txt:

05-06 16:27:52:568 About to upload  "NEW_DIR/test-14.txt"   ( "" 16  bytes ) 
05-06 16:27:52:568    hbf_transfer PUT request to /remote.php/webdav/NEW_DIR/test-14.txt 
05-06 16:27:52:568    _hbf_dav_request Block: 0 , Start: 0 and Size: 16 
05-06 16:27:53:303 void Mirall::FolderWatcherPrivate::slotReceivedNotification(int) .csync_journal.db-journal 
05-06 16:27:53:303 ignore journal 
05-06 16:27:53:303 "INSERT OR REPLACE INTO metadata (phash, pathlen, path, inode, uid, gid, mode, modtime, type, md5, fileid) VALUES ( ? , ?, ? , ? , ? , ? , ?,  ? , ? , ?, ? )" -1964102096051110385 19 "NEW_DIR/test-14.txt" 7345193 0 0 0 "1399386459" "0" "5368f1692126b" "" 
05-06 16:27:53:304 "DELETE FROM uploadinfo WHERE path=?" "NEW_DIR/test-14.txt" 
05-06 16:27:53:304 Transaction Start  "upload file start" 
05-06 16:27:53:358 void Mirall::FolderWatcherPrivate::slotReceivedNotification(int) .csync_journal.db-journal 
05-06 16:27:53:358 ignore journal 
05-06 16:27:53:358 void Mirall::CSyncThread::transferCompleted(const Mirall::SyncFileItem&) "NEW_DIR/test-14.txt" 4 "" 
05-06 16:27:53:358 void Mirall::FolderWatcherPrivate::slotReceivedNotification(int) .csync_journal.db-journal 
05-06 16:27:53:359 ignore journal 
05-06 16:27:53:359 ** PUT request to /remote.php/webdav/NEW_DIR/test-15.txt 
05-06 16:27:53:359    hbf_splitlist block_size: 10485760 threshold: 10485760 st_size: 16 
05-06 16:27:53:359    hbf_splitlist num_blocks: 1 rmainder: 16 blk_size: 10485760 
05-06 16:27:53:359    hbf_splitlist created block 0   (start: 0  size: 16) 
05-06 16:27:53:359 About to upload  "NEW_DIR/test-15.txt"   ( "" 16  bytes ) 
05-06 16:27:53:359    hbf_transfer PUT request to /remote.php/webdav/NEW_DIR/test-15.txt 
05-06 16:27:53:359    _hbf_dav_request Block: 0 , Start: 0 and Size: 16 
05-06 16:27:53:632 Saving  0  unknown certs. 
05-06 16:27:53:648 SocketApi:  dtor 

The local state db:

sqlite> select * from metadata where path like "%test-14.txt%";
-1964102096051110385|19|NEW_DIR/test-14.txt|7345193|0|0|0|1399386459|0|5368f1692126b|
sqlite> select * from metadata where path like "%test-15.txt%";
sqlite>
@ogoffart
Contributor

ogoffart commented May 7, 2014

What I believe happened:
The client uploads files to the server (sometimes in parallel), but does not get the reply from the server confirming that the file was properly uploaded. We can't save it to the database without an answer from the server, because we still don't have the etag.

On the next sync, the files are on the server. They are seen as new because we don't have their etag in the database, so we have to download them back.

We could detect that an upload was finished but not yet written to the database, but then we would still have the problem that, without the etag, we don't know whether the file has been changed on the server or not. And we don't want to just delete those files when in doubt.

I would not say it is a database inconsistency.

I think we can't fix this bug, and don't want to try.
It is equivalent to the Two Generals' Problem: http://en.wikipedia.org/wiki/Two_Generals'_Problem
In case of failure, we don't know whether the file made it to the server or not.

@moscicki
Contributor Author

moscicki commented May 7, 2014

I just re-checked the headers sent in this test scenario.

The server DOES provide an etag in the PUT response header (I see it for the files that are uploaded). It appears that when I "Quit" with the button, you don't wait for the termination of ongoing requests in order to save the etag. Hence you subsequently don't know whether the transfer completed or not.

I would certainly not expect such behaviour from a normal "Quit". I would normally expect that a user may wait a little bit for the ongoing transfers to finish, or force quit.

I could possibly agree that if I killed the process externally one might expect some glitches. Even then I think it should be possible to find out whether the file made it to the server on a subsequent restart (just stat/propfind it -- the server should behave transactionally, so it would either put the file completely or not at all if the data transfer was interrupted).
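
Something along these lines with Qt, just to illustrate the check (not the client's code; the URL and credentials are placeholders, and I assume the server answers a Depth: 0 PROPFIND with 207 if the file exists and 404 if it does not):

#include <QCoreApplication>
#include <QEventLoop>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>

// Ask the server whether a single resource exists (Depth: 0 PROPFIND).
static bool existsOnServer(QNetworkAccessManager &nam, const QUrl &url)
{
    QNetworkRequest req(url);
    req.setRawHeader("Depth", "0");
    req.setRawHeader("Authorization", "Basic " + QByteArray("user:password").toBase64());

    QNetworkReply *reply = nam.sendCustomRequest(req, "PROPFIND");
    QEventLoop loop;                      // block until the reply arrives (sketch only)
    QObject::connect(reply, &QNetworkReply::finished, &loop, &QEventLoop::quit);
    loop.exec();

    const int status = reply->attribute(QNetworkRequest::HttpStatusCodeAttribute).toInt();
    reply->deleteLater();
    return status == 207;                 // 207 Multi-Status: uploaded, 404: never made it
}

int main(int argc, char **argv)
{
    QCoreApplication app(argc, argv);
    QNetworkAccessManager nam;
    QUrl url("https://server/remote.php/webdav/NEW_DIR/test-15.txt");
    return existsOnServer(nam, url) ? 0 : 1;
}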

Of course, when in doubt it is better not to delete files, but at least for a normal "Quit" we may avoid the problem altogether if we allow the transfers to finish.

For 1.6 the problem is more pronounced because it concerns ALL files uploaded in parallel.

What do you think?

@dragotin
Contributor

Note to self: For small files, the uploadinfo table is not used. Only chunks are written to the uploadinfo table.

@ogoffart
Contributor

when I "Quit" with the button you don't wait for the termination of ongoing requests

Right, quit cancels any ongoing transfers. It is true that we could wait a couple of seconds for uploads to get an answer from the server.

@dragotin
Contributor

Yes, so on quit we will wait either for all currently running jobs to finish or for a couple of seconds, and then quit the client.

But I'll move that to a later milestone, as it is too large a change for 1.6.0 compared to the problems it can cause.
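
Roughly like this (a sketch only, not the actual mirall code; runningReplies stands in for however we end up tracking the in-flight PUT requests):

#include <QEventLoop>
#include <QList>
#include <QNetworkReply>
#include <QTimer>

// On quit, give the running PUT requests a short grace period to deliver
// their reply (and thus the etag) before shutting down.
void waitForUploadsOnQuit(const QList<QNetworkReply *> &runningReplies, int graceMs = 3000)
{
    QEventLoop loop;
    QTimer::singleShot(graceMs, &loop, &QEventLoop::quit);   // upper bound: quit anyway after graceMs

    int pending = runningReplies.size();
    for (QNetworkReply *reply : runningReplies) {
        QObject::connect(reply, &QNetworkReply::finished, &loop, [&loop, &pending]() {
            if (--pending == 0)
                loop.quit();                                   // all replies arrived before the deadline
        });
    }
    if (pending > 0)
        loop.exec();
    // ...then abort whatever is still running and continue with the normal shutdown...
}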

@ogoffart
Contributor

ogoffart commented Jul 7, 2014

We will not fix this issue:

  • The steps to reproduce the problem are quite uncommon: interrupting the sync and removing a just-created directory.
  • The bug is not so critical: no data loss, and it is easy to recover from.

Solving this bug on the other hand may cause data loss in some other corner cases.

@ogoffart ogoffart closed this as completed Jul 7, 2014