Prototype: dynamic chunkingNG #5368
Conversation
src/libsync/propagateuploadng.cpp
Outdated
```cpp
// TODO: give link to documentation
if (log > 0) {
    currentChunkSize = qMin(_lastChunkSize + (qint64) log * chunkSize(), maxChunkSize());
}
```
The lines above are the core algorithm implementation (yes, it is simple). The key element is the correction parameter `NaturalLogarithm(change) - 1`.
A positive correction parameter causes the chunk size to grow, while a negative parameter (a signal that the limit was hit) causes a drop back to the default chunk size of 10 MB, and the process starts again.
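A minimal sketch of that grow-or-reset rule, with illustrative names (`targetDurationMs`, `defaultChunkSize` and the free function itself are not the PR's actual identifiers):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hedged sketch of the grow-or-reset rule described above.
// A positive correction grows the chunk size; a negative one
// (the "signal of limit") resets it to the default.
std::int64_t nextChunkSize(std::int64_t lastChunkSize,
                           double requestDurationMs,
                           double targetDurationMs,
                           std::int64_t defaultChunkSize,
                           std::int64_t maxChunkSize)
{
    const double correction = std::log(targetDurationMs / requestDurationMs) - 1.0;
    if (correction > 0) {
        // Grow by correction * defaultChunkSize, capped at the maximum.
        return std::min(
            lastChunkSize + static_cast<std::int64_t>(correction * defaultChunkSize),
            maxChunkSize);
    }
    return defaultChunkSize; // roll back and start again
}
```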
Force-pushed from 769cd9e to 0f74164.
I would love to use the same mechanism for Bundling #5319.
src/libsync/propagateuploadng.cpp
Outdated
```cpp
quint64 currentChunkSize = chunkSize();

// this will check if getRequestMaxDurationDC is set to 0 or not
double requestMaxDurationDC = (double) getRequestMaxDurationDC();
```
This actually checks whether the capability is turned on or not.
src/libsync/propagateuploadng.cpp
Outdated
```cpp
}

// prevent the situation where the chunk size is bigger than the remaining data to send
currentChunkSize = qMin(currentChunkSize, fileSize - _sent);
```
this is the main algorithm
So this needs a change on the server?
No, it does not even need the capability from the server; it is enough that you set this to 10 s on the client - the time to upload 10 MB on a 1 MB/s link. But it is better if this is under the control of the sysadmin. On the server, I just added the capability as shown at the beginning of the post.
src/libsync/propagateupload.h
Outdated
```cpp
 * Dynamic Chunking attribute: the maximum number of milliseconds that a single request below the chunk size may take.
 * This value should be based on heuristics, with a default value of 10000 ms - the time it takes to transfer a 10 MB chunk on a 1 MB/s upload link.
 *
 * The suggested solution is to evaluate max(SNR, MORD) where:
```
MORD means MURDER in German :)
src/libsync/propagateuploadng.cpp
Outdated
```cpp
// this is the first chunked request for the file, so it can start with the default size of chunkSize()
// if _lastChunkSize != 0, it means that we have already sent one request
if (_lastChunkSize != 0) {
    //TODO: this is done step by step for debugging purposes
```
Please leave it "step by step". The compiler will optimize it anyway. It is very, very important that we still understand this code in a few months.
@guruz I think we might need to do that, so the bundle adjusts to the server. Otherwise you would put the same number of files in a bundle for a Raspberry Pi as for a high-performance server or a high-scale server with long write times. The bundle needs to know how much it can fit for a specific server and bandwidth, otherwise the request will take ages, like big files on a slow network.
This looks good! I have left a bunch of detail comments.
src/libsync/propagateuploadng.cpp
Outdated
```cpp
// If the target was not exceeded, we will increase the chunk size
// motivation for the logarithm is specified in the dynamic chunking documentation
// TODO: give link to documentation
if (log > 0) {
```
Where's `log` assigned? This looks like a bug.
Sorry, I did commit --amend on the PR to make the variables more descriptive and did not notice that; will change it now. This should be correctionParameter.
src/libsync/propagateuploadng.cpp
Outdated
```cpp
// motivation for logarithm is specified in the dynamic chunking documentation
// TODO: give link to documentation
if (log > 0) {
    currentChunkSize = qMin(_lastChunkSize + (qint64) correctionParameter * chunkSize(), maxChunkSize());
```
`qBound` between `minChunkSize`, `newChunkSize` and `maxChunkSize`, otherwise this could be 0 or negative.
Is this state possible if we never go lower? We only increase, starting from the default chunk size. If the measurement is negative, we roll back to the default chunk size.
src/libsync/propagateuploadng.cpp
Outdated
```cpp
double requestDuration = (double) _stopWatch.addLapTime(QLatin1String("ChunkDuration")) - lastChunkLap;

// calculate natural logarithm
double correctionParameter = log(requestMaxDurationDC / requestDuration) - 1;
```
I don't understand the -1. Can you explain again?
To me, this looks odd. `log(a/b) - 1` is the same as `log(a/(b*e))`, so it looks like you're dividing `requestMaxDurationDC` by e. That means that if `requestMaxDuration` is 10s and the `requestDuration` is 5s, the `correctionParameter` would be negative and you'd be reducing the chunk size.
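Spelling out the identity behind this (plain logarithm rules, with $a = \texttt{requestMaxDurationDC}$ and $b = \texttt{requestDuration}$):

$$\ln\frac{a}{b} - 1 = \ln\frac{a}{b} - \ln e = \ln\frac{a}{b\,e}$$

so the correction is positive only when $b < a/e \approx 0.37\,a$, i.e. the chunk must upload in well under half the target time before the size grows.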
You never reduce the chunk size. You either increase it or roll back to the default size.
I see! With the `if (correctionParameter > 0)` check this means that you only adjust the chunk size upwards if `actual_time < target_time / e`. I guess that's okay.
The current code never adjusts the chunk size downwards, correct?
(Please also add, as code comments, the replies to the questions @ckamm asked.)
src/libsync/propagateupload.h
Outdated
```cpp
 * Dynamic chunking client algorithm is specified in the ownCloud documentation and uses <max_single_upload_request_duration_msec> to estimate if given
 * bandwidth allows higher chunk sizes (because of high goodput)
 */
quint64 _requestMaxDuration;
```
I think "requestMaxDuration" is a bit misleading; it could be interpreted as requests being aborted if they take too long.
What about "targetRequestDuration", to indicate that this is the duration that the chunk size is calibrated towards?
ok
src/libsync/propagateupload.h
Outdated
```cpp
 * > SNR - Slow network request, so time it will take to transmit default chunking sized request at specific low upload bandwidth
 * > MORD - Maximum observed request time, so double the time of maximum observed RTT of the very small PUT request (e.g. 1kB) to the system
 *
 * Exemplary, syncing 100MB files, with chunking size 10MB, will cause sync of 10 PUT requests which max evaluation was set to <max_single_upload_request_duration_msec>
```
I don't understand how this paragraph relates to the previous one. (or what it is an example of)
src/libsync/propagateupload.h
Outdated
```cpp
 * Exemplary, syncing 100MB files, with chunking size 10MB, will cause sync of 10 PUT requests which max evaluation was set to <max_single_upload_request_duration_msec>
 *
 * Dynamic chunking client algorithm is specified in the ownCloud documentation and uses <max_single_upload_request_duration_msec> to estimate if given
 * bandwidth allows higher chunk sizes (because of high goodput)
```
minor: throughput
Yes - http://ethancbanks.com/2015/03/06/what-is-the-difference-between-throughput-goodput/. It means exactly what you understood it to mean. I will change it to "throughput" to cause less confusion.
Oh! I hadn't heard of that term, thanks for the link!
src/libsync/propagateupload.h
Outdated
```cpp
@@ -290,23 +290,48 @@ class PropagateUploadFileNG : public PropagateUploadFileCommon {
    uint _transferId; /// transfer id (part of the url)
    int _currentChunk; /// Id of the next chunk that will be sent
    bool _removeJobError; /// If not null, there was an error removing the job
    quint64 _lastChunkSize; /// current chunk size
```
This is a good start, but it would be nice if this was shared between PropagateUploads, so they don't have to re-learn it for each upload.
Maybe BandwidthManager could learn the information, although the BandwidthManager is currently more for limiting.
I am not fully sure how the BandwidthManager works; I did not touch it. Of course, we can play with it and test what yields the enhancement most easily.
Don't worry about it then. I just meant it as a place where you could store a cross-file estimated value for the chunk size.
Force-pushed from 0f74164 to 54b94ff.
@ckamm Please check now. I did commit --amend on the PR again; now it should be OK. Sorry, I had changed the variables to be more descriptive and did not notice that it broke the condition check. You can verify how it works just by adding the one line to the capabilities on the server side.
Is this state possible if we never go lower? We only increase, starting from the default chunk size. If the measurement is negative, we roll back to the default chunk size.
@mrow4a Yep, the qBound is unnecessary if you only ever adjust the chunk size upwards.
I suggest this simple algorithm:
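The concrete algorithm text did not survive this page scrape; judging from the `qBound` snippet quoted further down in the thread, it was a proportional adjustment along these lines (a sketch, not the literal proposal):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of a proportional target-duration adjustment, reconstructed
// from the qBound snippet quoted later in this thread.
std::int64_t adjustChunkSize(std::int64_t previousChunkSize,
                             double uploadDurationMs,
                             double targetDurationMs,
                             std::int64_t minChunkSize,
                             std::int64_t maxChunkSize)
{
    // Scale the previous size so the next upload would take roughly
    // targetDurationMs at the throughput just observed.
    const auto correctedSize = static_cast<std::int64_t>(
        previousChunkSize * (targetDurationMs / uploadDurationMs));
    // Clamp to the allowed range.
    return std::clamp(correctedSize, minChunkSize, maxChunkSize);
}
```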
Note that (esp. for a fast network) the time is bounded by the PHP processing, and increasing the chunk size will not make the request slower. This means we will only converge asymptotically towards the target. For this reason, it is important that the target time is much bigger than the time the PHP needs to process a request. Otherwise we would always converge quite fast to the minimum value, which would be the worst outcome.
Should we allow this to work with both ChunkingNG and ChunkingV1?
No, only the new chunking. The old chunking cannot easily support dynamic chunk sizes; don't change what works :)
@mrow4a Is there something specific you need to make progress on this?
I need time, https://cs3.surfsara.nl/ (I need to do tests and measurements), and the exam session is coming :> I think I have also used up my work hours for this month.
@mrow4a Understood! All the best with exams!
src/libsync/propagateuploadng.cpp
Outdated
```cpp
// and instead move it there gradually.
_propagator->_chunkSize = qBound(
    _propagator->minChunkSize(),
    (_propagator->_chunkSize + correctedSize) / 2,
    _propagator->maxChunkSize());
```
Why take the average and not just set `correctedSize`?
The comment speaks about multiple chunk uploads going on, but I don't see how this is related.
Both work. My reasoning was this:
The whole thing is a heuristic. It's possible that the code is executed after a single chunk finishes uploading when nothing else was going on, but it could also be that several chunks were competing for bandwidth when we get here. If we wanted to target a specific duration per chunk upload, this would not give us the right result even if network throughput was completely constant.
Thus I expect `correctedSize` to over- and undershoot a good size all the time. Using an exponential moving average like this is a cheap way of smoothing chunk sizes a bit. We expect them to fluctuate around "the best" value.
I could put this reasoning into the code or just remove the averaging, both are fine with me.
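For reference, the averaging in the snippet is an exponential moving average with weight 1/2; a minimal illustration (the function name and parameters are illustrative, not the PR's code):

```cpp
#include <algorithm>
#include <cstdint>

// EMA with weight 1/2, as in the quoted snippet: each update moves only
// half-way towards the freshly corrected size, smoothing the over- and
// undershoot caused by concurrent chunk uploads.
std::int64_t smoothedChunkSize(std::int64_t currentChunkSize,
                               std::int64_t correctedSize,
                               std::int64_t minChunkSize,
                               std::int64_t maxChunkSize)
{
    return std::clamp((currentChunkSize + correctedSize) / 2,
                      minChunkSize, maxChunkSize);
}
```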
(I don't like the fact that there is a dependency between the config file and the sync engine. But since that is already the case, I can't really block this commit for it. Anyway, it would be nice if it was part of the "SyncOptions" struct instead.)
@ogoffart I had to rebase on top of master to make ...
Looks good
Merged manually as 53c5f03.
In the current implementation, big files (e.g. 100 MB) are chunked into smaller pieces, 10 MB chunks. The problem is that this is a fixed value, and it is not appropriate for all types of networks (WiFi versus fiber-to-the-home versus Ethernet LAN). For WiFi, small chunks make sense, while for fast networks very big chunks make sense - with an optimal maximum of 50 MB.
Server capability
This PR proposes an implementation which, in the first phase, downloads the following capability from the server:
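The capability listing itself did not survive the page scrape. Based on the parameter name used throughout this thread, it presumably looked something like this (the surrounding key name is a guess):

```
"dyn_chunking": {
    "max_single_upload_request_duration_msec": 10000
}
```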
The above parameter is used as a reference in the client's learning of the available host/server bandwidth and of the capabilities of the server.
How dynamic chunking works
Let's assume the client wants to synchronize a 100 MB file, and that the request RTT is 100 ms - so it takes around 100 ms to send the first chunk carrying 1 kB of data. Recall that the 1st to nth chunked PUTs do not cause any delay in the database (as single-file PUTs do); the final MOVE causes the file to be added to the ownCloud cache.
In the current implementation, the following happens, taking around 3.6 seconds:
The file is divided equally into 10 chunks of 10 MB, sent to the assembly stream, and MOVEd (with the bookkeeping operation) to ownCloud.
In the PR's implementation, the following happens, taking around 2.5 seconds (30% faster):
The 1st PUT carries 10 MB, the 2nd PUT carries 20 MB, the 3rd PUT carries 30 MB, and the 4th PUT carries 40 MB.
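In other words, the same 100 MB crosses the wire in 4 PUTs instead of 10:

$$10 + 20 + 30 + 40 = 100\ \mathrm{MB}\ \text{(4 PUTs)} \qquad \text{vs.} \qquad 10 \times 10 = 100\ \mathrm{MB}\ \text{(10 PUTs)}$$

With the assumed 100 ms per-request RTT, the six saved requests alone account for roughly 0.6 s of the difference.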
Algorithm description
The algorithm is based on the change of the request time relative to the reference value, corrected using `ln(change) - 1`. This results in the following - please mind that this does not include TCP congestion control, so in reality growth will be lower:
Three colors show 4 consecutive PUTs at a static bandwidth of 10 MB/s, with chunk sizes 10 MB -> 20 MB -> 25 MB -> 30 MB, growth slowing or resetting to 10 MB again.
The graph above presents the next chunk size values for a specific static congestion window and specific previous chunk size values.
The higher the bandwidth, the faster the chunk size approaches the maximum value of 50 MB. For lower bandwidths it will not rise at all, or will rise very slowly.
The higher the chunk size, the more bandwidth is needed to cause further growth.
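Putting the description and the code snippets above together, the update rule can be written as follows (the notation is ours, not the PR's: $s_0$ is the 10 MB default, $s_{\max}$ the 50 MB cap, $T_{\mathrm{ref}}$ the reference duration, $t_n$ the measured duration of the $n$-th chunk request):

$$c_n = \ln\frac{T_{\mathrm{ref}}}{t_n} - 1, \qquad s_{n+1} = \begin{cases} \min\left(s_n + c_n\, s_0,\; s_{\max}\right) & \text{if } c_n > 0 \\ s_0 & \text{otherwise} \end{cases}$$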
You are probably also asking yourselves: OK, but will this work for higher latencies (RTTs)?
Yes, it works, but only up to a specific limit of 1 s latency. With latency increasing above 1 s, the chunk size decreases exponentially. However, this is practically never the case, since such latencies are rare and signal an error. RTTs of this kind are also possible for the database bookkeeping operation, but that is not an issue here, since the MOVE is a separate operation and depends on the server.
Why Natural Logarithm?
The natural logarithm has the property that, for a given x: when x < 1 the logarithm is negative, when x = 1 it is 0, and when x > 1 it is positive.
If x is the ratio `ReferenceRequestDuration / CurrentRequestDuration` (the direction the code uses: `log(requestMaxDurationDC / requestDuration)`), then when CurrentRequestDuration equals ReferenceRequestDuration, the next chunk size value `_lastChunkSize + log(ReferenceRequestDuration / CurrentRequestDuration) * chunkSize()` is equal to `_lastChunkSize`. As an example, given a 1 MB/s network, the 10 MB default chunk takes 10 s; with a reference value of 10 s, the change is exactly 0.
The next property of the logarithm is that if you subtract 1 from the result of `log()`, you shift the point at which the correction becomes positive. For example, given a 10 MB/s network, the 10 MB default chunk takes 1 s; with a reference value of 10 s, the correction parameter `log() - 1` is roughly 1, so the next chunk size will be about 20 MB. The zero border is thereby shifted: a bandwidth of roughly 3 MB/s is needed before the chunk size starts increasing. The border can be shifted further by subtracting more from the log value.
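To see where that zero border sits: the correction $\ln(T_{\mathrm{ref}}/t) - 1$ is positive exactly when $t < T_{\mathrm{ref}}/e$. With the 10 s reference and the 10 MB default chunk:

$$t < \frac{10\ \mathrm{s}}{e} \approx 3.7\ \mathrm{s} \quad\Longleftrightarrow\quad \text{bandwidth} > \frac{10\ \mathrm{MB}}{3.7\ \mathrm{s}} \approx 2.7\ \mathrm{MB/s}$$

which matches the roughly 3 MB/s threshold mentioned above.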
RFC - Request for Comments
@DeepDiver1975 @ogoffart @guruz @jturcotte @hodyroff @felixboehm @butonic
Excel with calculations:
dynamic-chunking.ods.zip