Group requests to ensure balance between bandwidth utilization and bookkeeping #5391 #5390 #4498 #1633 #4454 #5440
Conversation
src/libsync/propagateupload.h
Outdated
/**
 * @brief The PropagateUploadBundle class is a container class for upload jobs under chunking size.
 *
 * It will also ensure proper bandwidth utilization vs bookkeeping balance, and that in case no other items then under chunk uploads are available,
I don't understand the detailed meaning of:
"and that in case no other items then under chunk uploads are available,"
I made some minor edit suggestions for comments in #5441
Ok, it works awesome; I need to check it at enterprise scale. Remember, if someone wants to test it, one needs the server capability / an environment variable on the client - otherwise it will sync the old way.
@phil-davis Could you rebase and check again? I made a change to be sure that it parallelises correctly for ALL items. It was not doing so previously, so 3 big downloads would block everything if placed in alphabetical order.
Here's a detailed review.
From a high level point of view I think this kind of request prioritization is a good idea and while I'm not certain about all details, tweaking the parameters later looks relatively easy.
I'm worried about all the code duplication in PropagateFiles relative to PropagateDirectory. Is there really a compelling reason for having a second level of job-container, instead of folding this into PropagateDirectory? If there is, maybe a common base class between the two would help reduce the duplication that's going on.
Apart from that, see the detail comments.
src/libsync/capabilities.cpp
Outdated
if (bundling == "0") return false;
if (bundling == "1") return true;

return _capabilities["dav"].toMap()["bundling"].toByteArray() >= "1.0";
Like this you get the typical "2.1" > "10.1" problem. Could this just be an integer?
I just reproduced what @ogoffart did with chunking, nearly a copy-paste from ChunkingNG.
Okay, I didn't realize. That doesn't make the comparison any more correct though, even if it does set a precedent for having "N.M" versions in the capabilities (which I wasn't aware of). Then I suggest we merge as-is and fix both in a follow-up.
src/libsync/owncloudpropagator.h
Outdated
QScopedPointer<PropagateItemJob> _firstJob;

// e.g: create class which will handle bundled uploads and bandwidth utilization vs bookkeeping balance
QScopedPointer<PropagatorJob> _filesJob;
It looks like this could be typed QScopedPointer<PropagateFiles>
and you could avoid a bunch of casts.
src/libsync/owncloudpropagator.cpp
Outdated
// Ensure that only new files are inserted into PropagateFiles
if (enableBundledRequests && item->_instruction == CSYNC_INSTRUCTION_NEW) {
    // Get PropagateFiles container job
    PropagateFiles* filesJob = 0;
I'd write this as
auto& filesJob = directories.top().second->_filesJob;
if (!filesJob) {
filesJob = new PropagateFiles(this);
}
{
    // A small filesize item is a file whose transfer time
    // typically will be lower than its bookkeeping time.
    static uint smallFileSize;
I'd prefer an explicit = 0 here.
Again, copy from ChunkingNG
src/libsync/owncloudpropagator.cpp
Outdated
if (subJobsIterator.value()->_state == Finished) {
    // If this item is finished, remove it from _subJobs as it is not needed anymore
    // Note that in this case remove() from QVector will just perform memmove of pointer array items.
    PropagatorJob * job = subJobsIterator.value();
Please do PropagatorJob* job = subJobsIterator.next();
at the very beginning to clean up all the subJobsIterator.value()
calls.
} else {
    // There are no remaining or pending standard jobs in the whole sync
    // This also means that _standardJobs is empty
    Q_ASSERT(!scheduleNextJobRoutine(_standardJobs));
This assert looks dangerous because it can have side effects.
// This also means that _standardJobs is empty
Q_ASSERT(!scheduleNextJobRoutine(_standardJobs));

// Parallelise itself into more flows flows
"flows flows"
Q_ASSERT(job);

// Reduce the global counter of db or standard jobs
if (job->_item->_size <= _propagator->smallFileSize()){
This decision of whether something is standard or db is reiterated in a bunch of places. Definitely make a function for it so the conditions don't go out of sync.
return true;
} else {
    // This container does not contain any remaining dbJobs
    if(_runningNow > 1){
minor: A bunch of times in this function spacing is missing: "if(" and "){"
src/libsync/owncloudpropagator.cpp
Outdated
directories.top().second->append(current);
} else {
    // Ensure that only new files are inserted into PropagateFiles
    if (enableBundledRequests && item->_instruction == CSYNC_INSTRUCTION_NEW) {
As far as I can tell this prioritization code has nothing to do with bundling. Why is it gated by the bundling flag?
It is the basis for dispatching bundles in the future, that is why. You can call this feature bundling 1.0; I can change the name, however. I also gate it with a flag, because this is now cross-folder and some people might not like that, so I give them the chance to sync folder by folder just by changing one capability on the server.
@ckamm Because PropagateDirectory is much more complicated when thinking about the cases it has to handle, I did not want to introduce even more complexity, or we will get lost. Also, I wanted to make this "pluggable" so you can easily switch it on and off. Mind that PropagateFiles is only for new files without any "special" cases, just normal upload/download, chunked or not. No _firstJob, delete cases, WaitInDirectory etc.
Hello, I changed the structure of this a little bit and have been testing it on an enterprise setup; details can be found in this document @ogoffart @guruz @jturcotte @felixboehm, which is part of my CS3 presentation. Features:
src/libsync/propagatefiles.cpp
Outdated
finalize();
    return true;
}
_subJobs.reserve(_totalItems);
I think I need to get rid of that, since I am wasting memory here, don't I?
src/libsync/propagatefiles.cpp
Outdated
bool PropagateFiles::scheduleNewJob(QVector<SyncFileItemPtr> &syncJobs){
    // This function is used to schedule new job and lazily create job from sync items
    Q_ASSERT(!syncJobs.isEmpty());
    const SyncFileItemPtr &item = syncJobs.takeFirst();
We are planning to use only 5.8, aren't we? Or should I still keep compatibility with Qt4? (My build is failing because this is Qt5-only.)
No, the plan is to use 5.6 everywhere we can, and still use Qt4 on Linux platforms where we don't ship Qt (unless we manage to bundle Qt). Qt 5.6 is LTS and will receive patches until 2019, while 5.8 only until 2018.
Ok, checked with unit tests and they cover the code; everything is passing. Please also mind that https://github.com/owncloud/client/pull/5440/files#diff-7e5082f89a138020f2b1d37fc97d17dbR319 - this line has to be adjusted before merging, there is a TODO.
@@ -388,6 +395,9 @@ void OwncloudPropagator::start(const SyncFileItemVector& items)
            currentDirJob->append(dir);
        }
        directories.push(qMakePair(item->destination() + "/" , dir));
    } else if (enableScheduledRequests
            && (item->_instruction == CSYNC_INSTRUCTION_NEW || item->_instruction == CSYNC_INSTRUCTION_SYNC)) {
        filesJob->append(item);
This changes one purpose of the PropagateDirectory structure: update the directory's etag in the database only once child files have been synced properly.
I can't wrap my head around the conditions where this could be an issue, but @ogoffart should know if this could cause concrete problems.
See e.g. src/libsync/owncloudpropagator.cpp, line 705 in 1cec2ca:
// For new directories we always want to update the etag once
Thanks for pointing this out, I did not know about it. I don't think it is difficult to resolve, but let @ogoffart see this.
@mrow4a @jturcotte @ogoffart If I'm not mistaken the following is a situation in which that dependency structure could lead to an abort-related problem:
Initial state in db and server:
/ - "etag/1"
/A - "etagA1"
/A/F - "etagF1"
now when someone touches /A/F on the server, the server state becomes
Server state, after touching /A/F
/ - "etag/2"
/A - "etagA2"
/A/F - "etagF2"
but if I understand correctly the propagation dependency graph is
PropagateDirectory (/)
|- PropagateDirectory (/A)
|- PropagateFiles
|- PropagateItem (/A/F)
meaning that /A and /A/F run independently of each other. So it would be possible to completely finish propagating /A before the file transfer /A/F is done. Then aborting the sync run could lead to the db tree
/ - "etag/1"
/A - "etagA2"
/A/F - "etagF1"
Which means a follow up sync would probably not pick up on /A/F being out of date.
Maybe this hints at a second dependency problem: currently local and remote MkDir are FullParallelism - so isn't there a chance that with this change one could run into a case where the client wants to download or upload into a non-existent directory?
Possibly I'm missing something, feel free to point out incorrectness.
No, it won't upload to a non-existent directory, because the directory structure / conflict resolution etc. are done before any transfers, and directory deletions are done after:
https://github.com/owncloud/client/pull/5440/files#diff-20b960bb10cf3c0781a80fb3e5775241R493
https://github.com/owncloud/client/pull/5440/files#diff-7e5082f89a138020f2b1d37fc97d17dbR420
About the etag propagation: I did not look into the problem yet, because we don't yet have smashbox automation for PRs (it is nearly done) and I am busy on the server side before the release.
@mrow4a Okay, I didn't see that. (btw, _runningNow is gone, so this branch doesn't compile - probably was rebased at some point?) In that case the second problem can indeed not happen.
For LAN sync - https://central.owncloud.org/t/gsoc-2017-fast-lan-sync/6271 - we need this PR. Sync clients need to be "on the same page" and finished with any metadata logic, so that they can "safely" transfer the data, as in this PR. LAN sync has to be done only on raw PUT/GETs, and this logic has to be abstracted from the "directory sync logic" (basically what this PR is doing). @hodyroff @DeepDiver1975
@mrow4a: make sure to try interrupting syncs in the middle and then resuming them (also multiple times) and check whether this really still leads to correctly synchronized directories and not to deleted files or files that are not synced.
I removed the milestone because this would still need more work, and I don't think it is worth it at all. But I leave it open for the sake of discussion.
Closing outdated pull request |
This fixes issues:
#5391 In the mixed file size scenario bandwidth is very underutilized - enterprise setup
#5390 Broken sync time estimation - enterprise setup
#4498 sync files from smallest first to biggest last
#1633 Propagator: Balance concurrent up/down-load
#4454 Investigate HTTP Pipelining
This includes:
#5400 so if this is merged, the other can be closed
This is basic for:
#5319 Bundling
HTTP2
This can be enhanced by:
#5368 Dynamic Chunking
#5349 Sorting folders by modification time
This requires the following capability on the server side, otherwise it will sync the old way:
Idea
The idea behind the implemented algorithm is that in the scenario in which we have a lot of mixed items within the same or different folders - big files, under-chunking-size files of 100kB or 5MB, files to download, moves, deletes, directory creations etc. - we:
Separate transfer jobs from jobs for which server database interaction is the major cost factor
Reserve unconditionally 4-6 flows for "db sensitive" files like small file uploads and 2 for upload/download transfers of files >1MB
Sync normally if there are no other files
Work cross-folder, meaning it will look for "db sensitive" and "data transfer" items in all folders.
The above will ensure that there is a balance between bandwidth utilization and bookkeeping, which in turn gives faster synchronisation (for details look below)
Used algorithm:
Checkpoints
Details
Let's do some math. This will be our folder to sync:
100 files - average 100kB file size -> total 10MB to be transferred
10 files - average 10MB file size -> total 100MB to be transferred
Assume that your network is 5MB/s and one "1 Byte PUT" takes 1s (we are not in the ideal home scenario with an empty server now, guys; it is not that easy, and it could take much, much more).
The request "latency" consists of 2 components: the time it takes for bookkeeping on the server and the time it takes for data transfer.
Current case
If you do the 100 small requests in a row, your data transfer time is negligible (~0s) and your bookkeeping time is 100s. Parallelising that, maybe you can achieve 33s with 3 parallel flows.
If you do the 10 bigger file requests in a row, your bookkeeping time is 10s; parallelised, say 4s if you are lucky with 3 parallel flows. However, you cannot omit the 20s coming from transferring 100MB; it does not matter whether you have 1000 requests in parallel or 1 - your 5MB/s office network bounds you.
In total you need around 33s plus 24s from the big files, giving 57s.
Optimized case
If you do the 100 small requests in a row, your data transfer time is negligible (~0s) and your bookkeeping time is 100s. Parallelising that, maybe you can achieve 50s reserving 2 flow slots. In those 50s you used negligible bandwidth. If you use the 3rd flow to pump through the ~30s coming from the big files, you have just synced all your files in 50s, since the 30s run in parallel, filling the bandwidth.
57s vs 50s saves around 13% of the original time - in this example around 7s. 100 files is not a big deal, but it shows the bigger picture.
The more big files and small files there are, the bigger the percentage difference. Do the math for 55000 files and 55GB, where you have 40000 small files in 10GB (avg. size 250kB) and 15000 files in the remaining 45GB (avg. size 3MB). I think we won't even be talking about minutes there :>
@felixboehm @DeepDiver1975 @davidjericho @cdamken @jnweiger