Delete finished job in the PropagateDirectory scheduleNextJob. #5269 #5400

mrow4a · 2016-12-20T16:34:03Z

Delete finished job in the PropagateDerectory scheduleNextJob.

This should reduce code complexity from #5274 and solve the issue #5269

mention-bot · 2016-12-20T16:34:05Z

@mrow4a, thanks for your PR! By analyzing the history of the files in this pull request, we identified @ogoffart, @ckamm and @dragotin to be potential reviewers.

mrow4a · 2016-12-20T16:34:31Z

@felixboehm @guruz

mrow4a · 2016-12-20T17:01:45Z

src/libsync/owncloudpropagator.cpp

-    int subJobsCount = _subJobs.count();
-    while (i < subJobsCount && _subJobs.at(i)->_state == Finished) {
-      _firstUnfinishedSubJob = ++i;
-    }


This will fix this sophisticated caching. Is there any reason to keep Finished Jobs?

mrow4a · 2016-12-20T23:17:04Z

The only place client could use the "job" is in https://github.com/owncloud/client/blob/master/src/libsync/propagatedownload.cpp#L425 and https://github.com/owncloud/client/blob/master/src/libsync/owncloudpropagator.cpp#L557

However, it checks only for Running ones, so there is no problem.

guruz · 2016-12-21T13:30:31Z

src/libsync/owncloudpropagator.h

@@ -188,18 +188,18 @@ class OWNCLOUDSYNC_EXPORT PropagateDirectory : public PropagatorJob {
    QScopedPointer<PropagateItemJob>_firstJob;

    // all the sub files or sub directories.
-    QVector<PropagatorJob *> _subJobs;
+    QList<PropagatorJob *> _subJobs;


Not sure if this is good, https://marcmutz.wordpress.com/effective-qt/containers/
@ogoffart might have to say more

Interesting, didnt know that QList is Qt implementation (of array sth?). QLinkedList then? I just need to be able to delete from early begining of the list.

You can delete from the front with QVector and we might want to go for memory over speed here. But I don't have confidence of the best choice here. QVector should be fine.

ckamm

Overall this looks good to me.

ckamm · 2017-01-03T15:00:14Z

src/libsync/owncloudpropagator.cpp

+
+    while (subJobsIterator.hasNext()) {
+        subJobsIterator.next();
+        if (subJobsIterator.peekPrevious()->_state != Finished && subJobsIterator.peekPrevious()->parallelism() != FullParallelism) {
            return WaitForFinished;


minor: Would prefer const auto job = subJobsIterator.next(); and then use job over subJobsIterator.peekPrevious(). Usage of peekPrevious() makes me think that something special is happening. Same in the other loop below.

I'd even be fine calling subJobsIterator it.

ckamm · 2017-01-03T15:11:52Z

src/libsync/owncloudpropagator.h

@@ -188,18 +188,18 @@ class OWNCLOUDSYNC_EXPORT PropagateDirectory : public PropagatorJob {
    QScopedPointer<PropagateItemJob>_firstJob;

    // all the sub files or sub directories.
-    QVector<PropagatorJob *> _subJobs;
+    QList<PropagatorJob *> _subJobs;


You can delete from the front with QVector and we might want to go for memory over speed here. But I don't have confidence of the best choice here. QVector should be fine.

mrow4a · 2017-01-03T15:23:03Z

@ogoffart Do you see anything from logic perspective ? Should we check it in all possible scenarious to find out?

ckamm · 2017-01-04T13:02:38Z

@mrow4a I looked at it from a _subJobs logic point of view.

Previously jobs where deleted by qDeleteAll(_subJobs). I don't think most jobs delete themselves. Probably need explicit deletion when they are removed from the list.
It is no longer safe to call append(Job) while a directory job is running. Nothing uses that as far as I am aware, but maybe it makes sense to adjust the total job count when append is called? From a different point of view: Why do we need to track _jobsFinished and compare it to totalJobs in the first place? We can probably drop these and detect being finished in a different way.

I didn't see any other issues.

mrow4a · 2017-01-05T10:19:02Z

@ckamm I think the simplest way could be to track _jobsFinished, it is very self explaining and easy to understand for other developer. I did not really think of any other solution because that one was seeming quite nice. I agree on append to increase the counter, true. BUt I dont think it is safe to do it while iterating, maybe we should comment it instead or assert that if running append is not possible?

BTW: If it will occur that we can merge this PR, I think we can safely say that 2.3 decreases both memory and CPU usage, we get rid of unnecessary loops and in-memory objects.

guruz · 2017-01-05T10:33:04Z

@mrow4a Please change QList back to QVector.

Also add a qFatal please on add instead of assert.

(I didn't look at the code itself, just infering from you and @ckamm 's comments)

mrow4a · 2017-01-05T10:39:39Z

@guruz If we would use QVector with iterators, complexity is O(n). If so, I would rather stay with Caching as it is now and dont use this PR.

guruz · 2017-01-05T11:02:09Z

@mrow4a Why do you think it is different for QVector vs QList?

mrow4a · 2017-01-05T11:06:19Z

@guruz I think we would need to use the class which allows delete from any point in the list e.g. QLinkedList instead of QList(Qt custom implementation?) or QVector

jturcotte · 2017-01-05T11:06:52Z

@mrow4a QList behaves exactly as QVector when sizeof(T) <= sizeof(void*). In both cases deleting from the front will only involve a memmove since pointers are POD types without destructors.

A QLinkedList might scale better, but we need to be conservative with memory during sync, so as @ckamm mentioned, not sure either if it's worth it. If you can't see QVector operations in a profiler while syncing, it's definitely not worth it.

mrow4a · 2017-01-05T11:09:02Z

@jturcotte I think we would be very conservative with memory if deleting every finished items compared to current Vector implementation which stores everything in memory.

guruz · 2017-01-05T11:09:28Z

@jturcotte @mrow4a One could set the "deleted" pointer in the qlist/vector to 0 instead of using QLinkedList

(EDIT: Although I also agree with @jturcotte that if it does not show up in profiling, just leave it as qvector as it was before.. even if QList is equivalent for pointers, minimize the code changes)

mrow4a · 2017-01-05T11:10:55Z

@guruz yes this is another approach, however you would need to keep current cache implementation not to perform unnecesary loops

Hmm, I actualy not sure about that, I think memory still will be allocated to fit the previous object isnt it? EDIT: sorry, we store pointers, forget, but the previous comment is valid.

jturcotte

Yes deleting finished jobs sounds like a good idea, but QLinkedList will also be longer to iterate (more memory deferences) and cause a lot of small allocations instead of one allocation for the vector (more fragmentations). Loops don't directly matter, it's CPU cycles that matter and I think that setting the pointer to NULL could work pretty well.

jturcotte · 2017-01-05T11:27:02Z

src/libsync/owncloudpropagator.cpp

+        // peekPrevious() will directly access the item through hash in the QList at that subjob
+        if (subJobsIterator.peekPrevious()->_state == Finished) {
+            // if this items is finish, remove it from the _subJobs list as it is not needed anymore
+            subJobsIterator.remove();


Won't this leak the PropagatorJob and all its children? I'm surprised that you could just delete jobs without crashes, it's possible that you'll start facing issues if you actually delete the job. Please test thoroughly with very large syncs.

As I understand, we will now just set pointer to 0? As I understand, I should also call delete on class pointer to delete?

That sounds about right.

remove() does not delete, so setting to 0 does also not delete. You need to find out where it is deleted normally and check if you're not breaking anything by deleting here.

Yes, delete it. Currently I think it is a leak, also mentioned it in #5400 (comment)

jturcotte · 2017-01-05T11:32:41Z

src/libsync/owncloudpropagator.cpp

@@ -604,6 +604,11 @@ bool PropagateDirectory::scheduleNextJob()
    if (_state == NotYetStarted) {
        _state = Running;

+        // at the begining of the Directory Job, update expected number of Jobs to be synced


Please start your comments with a capital letter, and end them with a period.

mrow4a · 2017-01-06T12:55:22Z

Ok, corrected the required things, also did a test with 5000 files in nested directories, deleting, moving, renaming, downloading, uploading etc. Works correctly.

I also did a test syncing 1000 files in 10 directories on my local owncloud:
Client version 2.2.4: 67s and 68s
Client this branch: 49s, 47s

Of course, increasing bookkeeping time higher the timing will not be that visible, but generaly it works faster, and because we delete jobs it should be also much more memory friendly.

@felixboehm @jturcotte @guruz @cdamken @ogoffart @ckamm

mrow4a · 2017-01-06T13:53:49Z

src/libsync/owncloudpropagator.cpp

+            // Note that in this case remove() from QVector will just perform memmove of pointer array items.
+            PropagatorJob * jobPointer = subJobsIterator.value();
+            subJobsIterator.remove();
+            delete jobPointer;
            continue;


This is a moment in which I am removing the job from the queue and deleting the referenced object since it is not used anymore nowhere.

ckamm · 2017-01-09T08:49:00Z

@mrow4a I'm busy today, but will review tomorrow!

guruz · 2017-01-20T19:07:12Z

As per IRC, we want something like this in 2.3.
@jturcotte can you take over? @mrow4a is out for the month

Now using QVector::removeOne

ogoffart · 2017-01-25T18:12:38Z

src/libsync/owncloudpropagator.cpp

+        _firstJob.reset();
+    } else {
+        bool removed = _subJobs.removeOne(subJob);
+        // FIXME: Try debug build


remove the comment

ogoffart · 2017-01-25T18:13:53Z

src/libsync/owncloudpropagator.cpp

    if (status == SyncFileItem::FatalError ||
-            (sender() == _firstJob.data() && status != SyncFileItem::Success && status != SyncFileItem::Restoration)) {
+            (subJob == _firstJob.data() && status != SyncFileItem::Success && status != SyncFileItem::Restoration)) {


You just cleared _firstJob earlier, so this would always be false!

Done, added a test too.

mrow4a · 2017-01-25T22:26:16Z

Can I benchmark it before we merge? Tell me when ready. Can do it tomorrow morning.

jturcotte · 2017-01-25T22:30:26Z

@mrow4a Unless I messed up the scheduling there is no way that you will notice a difference if you involve a PHP server that takes >100ms to respond for each file. I'll merge it tomorrow, but if you think you caught some issue let me know and we can revert it (could be non-perf related too).

mrow4a · 2017-01-25T22:33:46Z

I need several minutes, to rerun the test several times for many files. I wont check it with HTTP2 though, should I?

jturcotte · 2017-01-25T22:40:01Z

@mrow4a HTTP2 is a layer under, it won't have any effect on the scheduling beside the fact that the server might take maybe 10% less time to respond and trigger the next job. I simulated it with the unit tests that don't even involve a server and I couldn't see a difference (fake HTTP response posted to the event loop), so I doubt that it would make any difference whether you involve HTTP or HTTP2 regarding this patch.

mrow4a · 2017-01-25T22:57:13Z

In HTTP2, I am talking about 100 files in parallel. In one connection

mrow4a · 2017-01-25T23:18:05Z

@jturcotte Hmm, looking at your modified changes, I think you are right and it wont make a difference, since you do it in slotSubJobFinished(). In my version I did it in scheduleNextJob(), maybe this is why I saw a difference. I think you are right, that having it in slotSubJobFinished, backed that you did a local test, should not have any influence here. Will do the tests in case, but dont expect anything wild.

👍

jturcotte · 2017-01-26T09:02:04Z

@mrow4a OK, thanks for testing!

mrow4a · 2017-01-26T12:36:28Z

@jturcotte @ogoffart For HTTP2, comparing branches 2.3 and 2.3-pre1, having sync of 1000 files of 1kB, total 1MB, 100 files in parallel to CERNBOX EOS, on WiFi and 52 ms latency, repeating 5 times, in average your PR drops upload from 19s to 20-21s.

jturcotte · 2017-01-26T12:56:56Z

@mrow4a Please allow me to be skeptical, but I don't see how this patch could lead to those results, not for a vector of less than 1000 elements. Could you paste the 5 run times of each configuration? What's the standard deviation between runs?

You can't profile CPU-bound code with network times, it's like trying to catch a flea with a fishing net.

mrow4a · 2017-01-26T13:43:19Z

https://cloudstor.aarnet.edu.au/plus/index.php/s/jWBwy6gm2ZtqSYi

But yes, it was done first one set, then the other, so it could be also variation of the system itself (which EOS has a lot during the day). I should do it interleaving runs and at night though to be sure. I also saw a run in 2.3 which took 17+s so it could be that, 100 files in parallel is probably hard to get reproducable results in production system.

mrow4a · 2017-01-26T13:52:15Z

I placed also results for HTTP1 in shared file, and I indeed can see that especially for http1 with 6 connections the veriations can be quite big, so I dont even try to deduce sth from there. However they oscillate around the one line.

Bet that this for HTTP1 does not make any difference, maybe for HTTP2 with a lot of files in parallel though

mrow4a · 2017-01-26T13:56:04Z

Anyways, I think the benefit of this is bigger than the use case of not having this PR. I just remember that during test we had a little bigger difference in the previous version of this PR in HTTP2 so it alarmed me.

Nevertheless, 👍

mrow4a commented Dec 20, 2016

View reviewed changes

felixboehm requested a review from guruz December 20, 2016 19:00

guruz reviewed Dec 21, 2016

View reviewed changes

guruz requested review from ogoffart and guruz December 21, 2016 13:31

ckamm reviewed Jan 3, 2017

View reviewed changes

ckamm changed the title ~~Delete finished job in the PropagateDerectory scheduleNextJob. #5269~~ Delete finished job in the PropagateDirectory scheduleNextJob. #5269 Jan 4, 2017

jturcotte previously requested changes Jan 5, 2017

View reviewed changes

mrow4a force-pushed the remove_finished_subjobs branch 2 times, most recently from 1b375b1 to 0b59917 Compare January 6, 2017 12:47

mrow4a force-pushed the remove_finished_subjobs branch from 0b59917 to 4a70a23 Compare January 6, 2017 13:52

mrow4a commented Jan 6, 2017

View reviewed changes

mrow4a force-pushed the remove_finished_subjobs branch from 4a70a23 to 9179ba8 Compare January 8, 2017 01:35

mrow4a mentioned this pull request Jan 8, 2017

Group requests to ensure balance between bandwidth utilization and bookkeeping #5391 #5390 #4498 #1633 #4454 #5440

Closed

4 tasks

guruz added this to the 2.3.0 milestone Jan 20, 2017

guruz assigned jturcotte Jan 20, 2017

jturcotte force-pushed the remove_finished_subjobs branch from 9179ba8 to 55dab17 Compare January 25, 2017 15:20

jturcotte removed the DO NOT MERGE YET label Jan 25, 2017

jturcotte changed the base branch from master to 2.3 January 25, 2017 15:23

jturcotte mentioned this pull request Jan 25, 2017

Delete SyncFileItems after propagation #5488

Merged

ogoffart reviewed Jan 25, 2017

View reviewed changes

ogoffart suggested changes Jan 25, 2017

View reviewed changes

jturcotte force-pushed the remove_finished_subjobs branch from 55dab17 to 434fcbb Compare January 25, 2017 18:45

Delete finished propagation jobs in PropagateDirectory #5269

7b892c5

jturcotte force-pushed the remove_finished_subjobs branch from 434fcbb to 7b892c5 Compare January 25, 2017 19:02

ogoffart approved these changes Jan 25, 2017

View reviewed changes

jturcotte merged commit c3ae512 into 2.3 Jan 26, 2017

jturcotte deleted the remove_finished_subjobs branch January 26, 2017 09:03

This was referenced Jan 26, 2017

PropagateUploadFileCommon: squeeze() the _jobs when done #5458

Closed

PropagateDirectory: Investigate if _subJobs should be pre-allocated (reserve()) or squeeze() #5454

Closed

Delete finished job in the PropagateDirectory scheduleNextJob. #5269 #5400

Delete finished job in the PropagateDirectory scheduleNextJob. #5269 #5400

Conversation

mrow4a commented Dec 20, 2016 • edited Loading

mention-bot commented Dec 20, 2016

mrow4a commented Dec 20, 2016

Choose a reason for hiding this comment

mrow4a commented Dec 20, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ckamm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mrow4a commented Jan 3, 2017

ckamm commented Jan 4, 2017

mrow4a commented Jan 5, 2017 • edited Loading

guruz commented Jan 5, 2017

mrow4a commented Jan 5, 2017

guruz commented Jan 5, 2017

mrow4a commented Jan 5, 2017

jturcotte commented Jan 5, 2017 • edited Loading

mrow4a commented Jan 5, 2017

guruz commented Jan 5, 2017 • edited Loading

mrow4a commented Jan 5, 2017 • edited Loading

jturcotte left a comment

Choose a reason for hiding this comment

jturcotte Jan 5, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ckamm Jan 5, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mrow4a commented Jan 6, 2017

Choose a reason for hiding this comment

ckamm commented Jan 9, 2017

guruz commented Jan 20, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mrow4a commented Jan 25, 2017 • edited Loading

jturcotte commented Jan 25, 2017

mrow4a commented Jan 25, 2017

jturcotte commented Jan 25, 2017

mrow4a commented Jan 25, 2017 • edited Loading

mrow4a commented Jan 25, 2017 • edited Loading

jturcotte commented Jan 26, 2017

mrow4a commented Jan 26, 2017 • edited Loading

jturcotte commented Jan 26, 2017 • edited Loading

mrow4a commented Jan 26, 2017 • edited Loading

mrow4a commented Jan 26, 2017 • edited Loading

mrow4a commented Jan 26, 2017 • edited Loading

mrow4a commented Dec 20, 2016 •

edited

Loading

mrow4a commented Jan 5, 2017 •

edited

Loading

jturcotte commented Jan 5, 2017 •

edited

Loading

guruz commented Jan 5, 2017 •

edited

Loading

mrow4a commented Jan 5, 2017 •

edited

Loading

jturcotte Jan 5, 2017 •

edited

Loading

ckamm Jan 5, 2017 •

edited

Loading

mrow4a commented Jan 25, 2017 •

edited

Loading

mrow4a commented Jan 25, 2017 •

edited

Loading

mrow4a commented Jan 25, 2017 •

edited

Loading

mrow4a commented Jan 26, 2017 •

edited

Loading

jturcotte commented Jan 26, 2017 •

edited

Loading

mrow4a commented Jan 26, 2017 •

edited

Loading

mrow4a commented Jan 26, 2017 •

edited

Loading

mrow4a commented Jan 26, 2017 •

edited

Loading