
Delete finished job in the PropagateDirectory scheduleNextJob. #5269 #5400

Merged: 1 commit merged into 2.3 on Jan 26, 2017

Conversation

@mrow4a (Contributor) commented Dec 20, 2016

Delete finished job in the PropagateDirectory scheduleNextJob.

This should reduce code complexity compared to #5274 and solve issue #5269.

@mention-bot

@mrow4a, thanks for your PR! By analyzing the history of the files in this pull request, we identified @ogoffart, @ckamm and @dragotin to be potential reviewers.

@mrow4a (Contributor Author) commented Dec 20, 2016

@felixboehm @guruz

int subJobsCount = _subJobs.count();
while (i < subJobsCount && _subJobs.at(i)->_state == Finished) {
    _firstUnfinishedSubJob = ++i;
}
@mrow4a (Contributor Author):

This removes the need for this sophisticated caching. Is there any reason to keep finished jobs around?

@felixboehm felixboehm requested a review from guruz December 20, 2016 19:00
@mrow4a
Copy link
Contributor Author

mrow4a commented Dec 20, 2016

The only places the client could use the "job" are https://github.com/owncloud/client/blob/master/src/libsync/propagatedownload.cpp#L425 and https://github.com/owncloud/client/blob/master/src/libsync/owncloudpropagator.cpp#L557

However, those check only for Running jobs, so there is no problem.

@@ -188,18 +188,18 @@ class OWNCLOUDSYNC_EXPORT PropagateDirectory : public PropagatorJob {
QScopedPointer<PropagateItemJob>_firstJob;

// all the sub files or sub directories.
QVector<PropagatorJob *> _subJobs;
QList<PropagatorJob *> _subJobs;
Contributor:

Not sure if this is a good idea, see https://marcmutz.wordpress.com/effective-qt/containers/. @ogoffart might have more to say.

@mrow4a (Contributor Author):

Interesting, I didn't know that QList is a Qt-specific implementation (of an array of sorts?). QLinkedList then? I just need to be able to delete from the beginning of the list.

Contributor:

You can delete from the front with QVector, and we might want to favor memory over speed here. I'm not confident about the best choice here, but QVector should be fine.

@guruz guruz requested review from ogoffart and guruz December 21, 2016 13:31
@ckamm (Contributor) left a comment

Overall this looks good to me.


while (subJobsIterator.hasNext()) {
    subJobsIterator.next();
    if (subJobsIterator.peekPrevious()->_state != Finished && subJobsIterator.peekPrevious()->parallelism() != FullParallelism) {
        return WaitForFinished;
Contributor:

Minor: I would prefer const auto job = subJobsIterator.next(); and then using job instead of subJobsIterator.peekPrevious(). The use of peekPrevious() makes me think that something special is happening. Same in the other loop below.

I'd even be fine with calling subJobsIterator just it.
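As an illustration of that suggestion, here is a minimal, self-contained sketch in the same spirit (stand-in Job struct and hypothetical field names, not the actual PropagatorJob code):

#include <QVector>

struct Job {
    bool finished = false;
    bool fullParallelism = false;
};

// Walk the sub jobs, binding each element once instead of repeatedly calling peekPrevious().
void walkSubJobs(QVector<Job *> &subJobs)
{
    QMutableVectorIterator<Job *> it(subJobs);
    while (it.hasNext()) {
        const auto job = it.next();
        if (!job->finished && !job->fullParallelism) {
            return; // the real code would return WaitForFinished here
        }
    }
}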

@mrow4a (Contributor Author) commented Jan 3, 2017

@ogoffart Do you see anything wrong from a logic perspective? Should we check all possible scenarios to find out?

@ckamm ckamm changed the title Delete finished job in the PropagateDerectory scheduleNextJob. #5269 Delete finished job in the PropagateDirectory scheduleNextJob. #5269 Jan 4, 2017
@ckamm (Contributor) commented Jan 4, 2017

@mrow4a I looked at it from a _subJobs logic point of view.

  • Previously jobs were deleted by qDeleteAll(_subJobs). I don't think most jobs delete themselves, so they probably need explicit deletion when they are removed from the list.
  • It is no longer safe to call append(job) while a directory job is running. Nothing uses that as far as I am aware, but maybe it makes sense to adjust the total job count when append is called? From a different point of view: why do we need to track _jobsFinished and compare it to totalJobs in the first place? We can probably drop these and detect being finished in a different way.

I didn't see any other issues.

@mrow4a (Contributor Author) commented Jan 5, 2017

@ckamm I think the simplest way is to keep tracking _jobsFinished; it is self-explanatory and easy for other developers to understand. I did not really consider another solution because this one seemed quite nice. I agree that append should increase the counter. But I don't think it is safe to append while iterating; maybe we should document that instead, or assert that append is not possible while the job is running?

BTW: if we can merge this PR, I think we can safely say that 2.3 decreases both memory and CPU usage, since we get rid of unnecessary loops and in-memory objects.

@guruz (Contributor) commented Jan 5, 2017

@mrow4a Please change QList back to QVector.

Also, please use qFatal on add instead of an assert.

(I didn't look at the code itself, just inferring from your and @ckamm's comments.)
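For illustration, a minimal sketch of such a qFatal guard on append (stand-in types and hypothetical names, not the actual client code):

#include <QVector>
#include <QtGlobal>

struct Job {};

struct DirectoryJob {
    QVector<Job *> _subJobs;
    bool _running = false;

    void append(Job *subJob)
    {
        // qFatal also aborts in release builds, unlike Q_ASSERT, which is compiled out there.
        if (_running)
            qFatal("DirectoryJob::append: appending a sub job while the job is running is not supported");
        _subJobs.append(subJob);
    }
};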

@mrow4a (Contributor Author) commented Jan 5, 2017

@guruz If we use QVector with iterators, deleting an element is O(n). If so, I would rather keep the caching as it is now and not use this PR.

@guruz (Contributor) commented Jan 5, 2017

@mrow4a Why do you think it is different for QVector vs QList?

@mrow4a (Contributor Author) commented Jan 5, 2017

@guruz I think we would need to use a class that allows deletion from any point in the list, e.g. QLinkedList, instead of QList (a custom Qt implementation?) or QVector.

@jturcotte (Member) commented Jan 5, 2017

@mrow4a QList behaves exactly like QVector when sizeof(T) <= sizeof(void*). In both cases deleting from the front only involves a memmove, since pointers are POD types without destructors.

A QLinkedList might scale better, but we need to be conservative with memory during sync, so as @ckamm mentioned, I'm not sure it's worth it either. If you can't see QVector operations in a profiler while syncing, it's definitely not worth it.
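To make that concrete, a minimal sketch with a stand-in Job struct (hypothetical names, not the real PropagatorJob): removing a pointer from the front of a QVector only shifts the remaining pointers down, so the pointed-to job still has to be deleted explicitly.

#include <QVector>

struct Job {
    bool finished = false;
};

// Drop the first job from the vector and free it.
void dropFirstJob(QVector<Job *> &subJobs)
{
    if (subJobs.isEmpty())
        return;
    Job *first = subJobs.first();
    subJobs.removeFirst(); // a memmove of the remaining pointers, no destructor calls
    delete first;          // the vector does not own the job, so it must be freed by hand
}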

@mrow4a (Contributor Author) commented Jan 5, 2017

@jturcotte I think deleting every finished item would already be very conservative with memory compared to the current QVector implementation, which keeps everything in memory.

@guruz (Contributor) commented Jan 5, 2017

@jturcotte @mrow4a One could set the "deleted" pointer in the QList/QVector to 0 instead of using QLinkedList.

(EDIT: Although I also agree with @jturcotte that if it does not show up in profiling, just leave it as QVector as it was before. Even if QList is equivalent for pointers, minimize the code changes.)
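A rough sketch of that alternative (stand-in Job struct, hypothetical names): the vector keeps its size, finished entries are freed and set to null, and any later loop has to skip the null slots.

#include <QVector>

struct Job {
    bool finished = false;
};

// Free finished jobs in place and leave null holes instead of shrinking the vector.
void reapFinishedJobs(QVector<Job *> &subJobs)
{
    for (Job *&slot : subJobs) {
        if (slot && slot->finished) {
            delete slot;
            slot = nullptr; // later iterations must check for null before dereferencing
        }
    }
}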

@mrow4a (Contributor Author) commented Jan 5, 2017

@guruz Yes, this is another approach; however, you would need to keep the current cache implementation to avoid performing unnecessary loops.

Hmm, actually I am not sure about that: I think the memory would still stay allocated for the previous object, wouldn't it? EDIT: sorry, we store pointers, forget that, but the previous comment is still valid.

@jturcotte (Member) left a comment

Yes, deleting finished jobs sounds like a good idea, but a QLinkedList will also be slower to iterate (more memory dereferences) and will cause a lot of small allocations instead of one allocation for the vector (more fragmentation). Loops don't directly matter, it's CPU cycles that matter, and I think that setting the pointer to NULL could work pretty well.

// peekPrevious() will directly access the item through hash in the QList at that subjob
if (subJobsIterator.peekPrevious()->_state == Finished) {
    // if this items is finish, remove it from the _subJobs list as it is not needed anymore
    subJobsIterator.remove();
@jturcotte (Member) commented Jan 5, 2017

Won't this leak the PropagatorJob and all its children? I'm surprised that you could just delete jobs without crashes; it's possible that you'll start facing issues if you actually delete the job. Please test thoroughly with very large syncs.

@mrow4a (Contributor Author):

As I understand it, we will now just set the pointer to 0? And I should also call delete on the object pointer to actually free it?

@jturcotte (Member):

That sounds about right.

Contributor:

remove() does not delete, and setting the pointer to 0 does not delete either. You need to find out where it is normally deleted and check that you're not breaking anything by deleting it here.

@ckamm (Contributor) commented Jan 5, 2017

Yes, delete it. Currently I think it is a leak; I also mentioned it in #5400 (comment).

@@ -604,6 +604,11 @@ bool PropagateDirectory::scheduleNextJob()
if (_state == NotYetStarted) {
_state = Running;

// at the begining of the Directory Job, update expected number of Jobs to be synced
@jturcotte (Member):

Please start your comments with a capital letter, and end them with a period.

@mrow4a mrow4a force-pushed the remove_finished_subjobs branch 2 times, most recently from 1b375b1 to 0b59917 Compare January 6, 2017 12:47
@mrow4a (Contributor Author) commented Jan 6, 2017

OK, I corrected the required things, and also did a test with 5000 files in nested directories (deleting, moving, renaming, downloading, uploading, etc.). It works correctly.

I also did a test syncing 1000 files in 10 directories on my local ownCloud:
Client version 2.2.4: 67s and 68s
Client, this branch: 49s and 47s

Of course, when bookkeeping time is higher the difference will not be that visible, but generally it works faster, and because we delete jobs it should also be much more memory friendly.

@felixboehm @jturcotte @guruz @cdamken @ogoffart @ckamm

@mrow4a mrow4a force-pushed the remove_finished_subjobs branch from 0b59917 to 4a70a23 Compare January 6, 2017 13:52
// Note that in this case remove() from QVector will just perform memmove of pointer array items.
PropagatorJob * jobPointer = subJobsIterator.value();
subJobsIterator.remove();
delete jobPointer;
continue;
@mrow4a (Contributor Author):

This is the point where I remove the job from the queue and delete the referenced object, since it is not used anywhere anymore.

@ckamm (Contributor) commented Jan 9, 2017

@mrow4a I'm busy today, but will review tomorrow!

@guruz guruz added this to the 2.3.0 milestone Jan 20, 2017
@guruz (Contributor) commented Jan 20, 2017

As per IRC, we want something like this in 2.3.
@jturcotte, can you take over? @mrow4a is out for the month.

@jturcotte jturcotte force-pushed the remove_finished_subjobs branch from 9179ba8 to 55dab17 Compare January 25, 2017 15:20
@jturcotte jturcotte dismissed their stale review January 25, 2017 15:23

Now using QVector::removeOne

@jturcotte jturcotte changed the base branch from master to 2.3 January 25, 2017 15:23
    _firstJob.reset();
} else {
    bool removed = _subJobs.removeOne(subJob);
    // FIXME: Try debug build
Contributor:

Remove the comment.

@jturcotte (Member):

Done

if (status == SyncFileItem::FatalError ||
(sender() == _firstJob.data() && status != SyncFileItem::Success && status != SyncFileItem::Restoration)) {
(subJob == _firstJob.data() && status != SyncFileItem::Success && status != SyncFileItem::Restoration)) {
Contributor:

You just cleared _firstJob earlier, so this would always be false!

@jturcotte (Member):

Done, added a test too.
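For context, a hedged sketch of one way to avoid that ordering problem (stand-in types and hypothetical names, not necessarily the committed fix): decide whether the finished job was _firstJob before resetting the pointer.

#include <QScopedPointer>
#include <QVector>

struct Job {};
enum class Status { Success, Restoration, FatalError, Other };

struct DirectoryJob {
    QScopedPointer<Job> _firstJob;
    QVector<Job *> _subJobs;

    void slotSubJobFinished(Job *subJob, Status status)
    {
        // Remember this before _firstJob is reset, otherwise the check below is always false.
        const bool wasFirstJob = (subJob == _firstJob.data());
        if (wasFirstJob) {
            _firstJob.reset();          // destroys the first job
        } else {
            _subJobs.removeOne(subJob); // drop the pointer from the vector
            delete subJob;              // the vector does not own the job
        }
        if (status == Status::FatalError
            || (wasFirstJob && status != Status::Success && status != Status::Restoration)) {
            // Abort the whole directory job here.
        }
    }
};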

@jturcotte jturcotte force-pushed the remove_finished_subjobs branch from 55dab17 to 434fcbb Compare January 25, 2017 18:45
@jturcotte jturcotte force-pushed the remove_finished_subjobs branch from 434fcbb to 7b892c5 Compare January 25, 2017 19:02
@mrow4a (Contributor Author) commented Jan 25, 2017

Can I benchmark it before we merge? Tell me when it's ready; I can do it tomorrow morning.

@jturcotte (Member):

@mrow4a Unless I messed up the scheduling, there is no way that you will notice a difference if you involve a PHP server that takes >100ms to respond for each file. I'll merge it tomorrow, but if you think you caught some issue, let me know and we can revert it (it could be non-perf related too).

@mrow4a (Contributor Author) commented Jan 25, 2017

I need several minutes to rerun the test several times with many files. I won't check it with HTTP2 though, should I?

@jturcotte (Member):

@mrow4a HTTP2 is a layer below; it won't have any effect on the scheduling besides the fact that the server might take maybe 10% less time to respond and trigger the next job. I simulated it with the unit tests, which don't even involve a server (fake HTTP response posted to the event loop), and I couldn't see a difference, so I doubt it would make any difference for this patch whether you use HTTP or HTTP2.

@mrow4a (Contributor Author) commented Jan 25, 2017

With HTTP2, I am talking about 100 files in parallel, in one connection.

@mrow4a (Contributor Author) commented Jan 25, 2017

@jturcotte Hmm, looking at your modified changes, I think you are right and it won't make a difference, since you do it in slotSubJobFinished(). In my version I did it in scheduleNextJob(), which is maybe why I saw a difference. I think you are right that having it in slotSubJobFinished, backed by your local test, should not have any influence here. I will do the tests just in case, but I don't expect anything wild.

👍

@jturcotte (Member):

@mrow4a OK, thanks for testing!

@jturcotte jturcotte merged commit c3ae512 into 2.3 Jan 26, 2017
@jturcotte jturcotte deleted the remove_finished_subjobs branch January 26, 2017 09:03
@mrow4a (Contributor Author) commented Jan 26, 2017

@jturcotte @ogoffart For HTTP2, comparing branches 2.3 and 2.3-pre1: syncing 1000 files of 1 kB (1 MB total), 100 files in parallel, to CERNBOX EOS, on WiFi with 52 ms latency, repeated 5 times. On average your PR takes the upload from 19s to 20-21s.

@jturcotte (Member) commented Jan 26, 2017

@mrow4a Please allow me to be skeptical, but I don't see how this patch could lead to those results, not for a vector of fewer than 1000 elements. Could you paste the 5 run times of each configuration? What's the standard deviation between runs?

You can't profile CPU-bound code with network times; it's like trying to catch a flea with a fishing net.

@mrow4a (Contributor Author) commented Jan 26, 2017

https://cloudstor.aarnet.edu.au/plus/index.php/s/jWBwy6gm2ZtqSYi

But yes, it was done as one set first, then the other, so it could also be variation in the system itself (which EOS has a lot of during the day). I should redo it with interleaved runs, and at night, to be sure. I also saw a 2.3 run which took 17+s, so it could be that; with 100 files in parallel it is probably hard to get reproducible results on a production system.

@mrow4a (Contributor Author) commented Jan 26, 2017

I also placed the results for HTTP1 in the shared file, and I can indeed see that, especially for HTTP1 with 6 connections, the variations can be quite big, so I don't even try to deduce anything from there. However, they oscillate around the same line.

I bet that for HTTP1 this does not make any difference; maybe for HTTP2 with a lot of files in parallel, though.

@mrow4a (Contributor Author) commented Jan 26, 2017

Anyway, I think the benefit of this outweighs the case for not having this PR. I just remember that during testing we had a slightly bigger difference with the previous version of this PR over HTTP2, so it alarmed me.

Nevertheless, 👍
