
Add sync files scheduler class #5269

Closed · wants to merge 1 commit

Conversation

mrow4a (Contributor) commented Oct 23, 2016

Hello guys,

While working on bundling I analysed the code a bit and found this little flower, which reduces the number of loops significantly: in this example from 506,000 to 3,000 for 1000 files. This implementation also has something more in mind, which I will explain later. For now, please verify and do some QA on this small solution:

MOTIVATION:

  • mkdir, move, delete and ignore jobs (especially mkdir jobs) should keep running with the status of their sub-jobs checked within the particular directory job, while NEW and SYNC jobs can run in the manner proposed in the code.

The only change is that for jobs other than NEW and SYNC we proceed in the old manner, while for NEW and SYNC we create a new job, insert it into that directory's _primaryJobs, and within that job extract jobs from the queue and execute them, without re-checking the status of jobs that have already finished (see the sketch after this list). The completion check is done at the end of each job, verifying that the queue is empty and that the number of jobs scheduled equals the number of jobs finished.

  • This will give me an easy way to insert bundling for files, or some nice schedulers that create 3 parallel pipes of files: upload, download, and mixed upload/download of small files. If this passes, I will propose the new code quite quickly; I will just apply this concept: https://en.wikipedia.org/wiki/Round-robin_scheduling
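
Here is a minimal sketch of the queue idea (illustrative names such as PropagateSyncFiles, _jobsScheduled and possiblyRunNextJob are placeholders, not the exact PR code):

    #include <QLinkedList>

    class PropagatorJob; // from the client codebase
    bool possiblyRunNextJob(PropagatorJob *job); // assumed existing dispatch helper

    // NEW/SYNC items sit in a queue; each call pops pending work instead of
    // rescanning every already-finished sub-job.
    class PropagateSyncFiles
    {
    public:
        bool scheduleNextJob()
        {
            if (!_syncJobs.isEmpty()) {
                PropagatorJob *next = _syncJobs.takeFirst();
                ++_jobsScheduled;
                return possiblyRunNextJob(next);
            }
            return false;
        }

        // O(1) completion test: the queue is drained and every scheduled
        // job has reported back.
        bool isFinished() const
        {
            return _syncJobs.isEmpty() && _jobsFinished == _jobsScheduled;
        }

        void onJobFinished() { ++_jobsFinished; }

    private:
        QLinkedList<PropagatorJob *> _syncJobs;
        int _jobsScheduled = 0;
        int _jobsFinished = 0;
    };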

Test on localhost: I added one folder and 1000 files of 1 kB each.
The old version required around half a million loop iterations to execute this code:
[screenshot: loop_half_milion]

My version needs around three thousand loop iterations for the same 1000 files of 1 kB; the sync also finished a bit faster (58 s down to 50 s on my machine) and used much less CPU:
[screenshot: looped_3000]
[screenshot: selection_049]
@dragotin @DeepDiver1975 @ogoffart @guruz @cdamken

@mention-bot commented:

@mrow4a, thanks for your PR! By analyzing the history of the files in this pull request, we identified @ogoffart, @ckamm and @dragotin to be potential reviewers.

mrow4a (Contributor, Author) commented Oct 23, 2016

Anyway, we should get rid of this quadratic loop: for 1k files it is 500k iterations, but for 10k files it would be 50 million.

EDIT:
The reason for the excessive looping is that after the ack of the first file we run this loop once, after the 2nd twice, ..., after the 500th file's ack we loop over 500 items in the list, and on the 1000th file's ack we loop over a list of 1000 files.
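Summed up, that is the arithmetic series 1 + 2 + ... + n = n(n+1)/2, so for n = 1000 acks about 500,000 iterations (matching the measurement above), and for n = 10,000 about 50 million.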

I actually tried to find a use case for a sub-directory job inside a directory job, but does that case even occur in the old implementation?

@mrow4a mrow4a force-pushed the order_files_sync branch 2 times, most recently from 7bc1ec5 to 34d54b7 Compare October 24, 2016 07:27
felixboehm (Contributor) commented:

Good catch @mrow4a.

michaelstingl (Contributor) commented:

@guruz @ogoffart Could we add this to 2.3.0 ?

@guruz guruz added this to the 2.3.0 milestone Oct 24, 2016
// all the sub files or sub directories.
QVector<PropagatorJob *> _subJobs;
// all the new and changed files without conflicts.
QLinkedList<PropagatorJob *> _syncJobs;
Contributor:

AFAIK not ideal because of heap fragmentation, but @ogoffart can say more to this.

mrow4a (Author):

It should be fine. I chose QLinkedList to be able to easily put jobs at both the end and the beginning of the list; I need that for bundling and such. You know, everything has pros and cons. Let's discuss it, but this is an implementation detail.

Contributor:

In that case, use QList, which reserves space at the beginning and at the end.
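
For illustration, a minimal sketch of the point (hypothetical usage, not code from this PR):

    #include <QList>

    class PropagatorJob; // from the client codebase

    // QList keeps spare capacity at both ends, so both operations below are
    // amortized O(1), without QLinkedList's per-node heap allocations.
    void enqueue(QList<PropagatorJob *> &jobs, PropagatorJob *urgent, PropagatorJob *normal)
    {
        jobs.prepend(urgent); // cheap insertion at the front
        jobs.append(normal);  // cheap insertion at the back
    }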

guruz (Contributor) commented Oct 24, 2016

Amazing. High CPU usage has plagued us there for quite some time!
I don't know how easy it is to break stuff here though.

How does the memory usage compare? Because @jturcotte had worked hard to try to lower this.

Can you look at this please?

The following tests FAILED:
     16 - SyncEngineTest (Failed)
     17 - SyncFileStatusTrackerTest (Failed)
Errors while running CTest

@ogoffart needs to have a look at this from the correctness perspective :)

mrow4a (Contributor, Author) commented Oct 24, 2016

I guess there are more flowers like that in the code; I will look for more later. I suspect similar friends are hiding in the discovery phase.

@guruz Hmm, I did not measure it, but basically it should be the same as before, and it decreases with each item, since the memory is freed from the linked list as each synced file completes. I remove the item from the list and pass it to the PropagateItem job, and when that job is destroyed the item is, I guess, destroyed with it, since it is no longer in the linked list.

@ogoffart Please take this as a draft; I have not looked at the unit tests yet. Do you have some suggestions already?

ogoffart (Contributor) commented:

I did not review the whole thing, but it seems you are leaking memory when the sync aborts prematurely.

There is indeed O(n^2) complexity. If I understand correctly, what you are doing is keeping two lists to reduce the n. But the real fix is what the FIXME says: cache the value so that one does not always start at 0.

@@ -608,22 +610,23 @@ bool PropagateDirectory::scheduleNextJob()

bool stopAtDirectory = false;
// FIXME: use the cached value of finished job
Contributor:

Here is what you need to do to remove the O(N^2): cache the index of the first unfinished job and start from there instead of always starting from 0.


mrow4a (Contributor, Author) commented Oct 24, 2016

@ogoffart I did not touch the start-from-0 implementation since I thought you had some idea or bug scenario behind it. I can add the caching too if you say that does not matter. I would still go with the separate class syncing files at the end, since I want to add another feature there: a job scheduler that tries to fill the upload/download bandwidth by running these jobs in parallel. In the PropagateSyncFiles job I want to create a few queues and extract from them in round-robin fashion.

EDIT: it may be that this code still has to be reviewed for aborts and other strange scenarios, you know.
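
For illustration, a minimal round-robin sketch over hypothetical upload/download/mixed queues (RoundRobinScheduler and its members are placeholder names, not this PR's code):

    #include <QList>
    #include <QQueue>

    class PropagatorJob; // from the client codebase

    // Cycle through the queues so that no single pipe (upload, download,
    // mixed small files) can starve the others.
    class RoundRobinScheduler
    {
    public:
        void addQueue(QQueue<PropagatorJob *> *queue) { _queues.append(queue); }

        PropagatorJob *nextJob()
        {
            if (_queues.isEmpty())
                return nullptr;
            // Try each queue once, starting after the last one served.
            for (int tried = 0; tried < _queues.size(); ++tried) {
                _current = (_current + 1) % _queues.size();
                if (!_queues.at(_current)->isEmpty())
                    return _queues.at(_current)->dequeue();
            }
            return nullptr; // all queues are empty
        }

    private:
        QList<QQueue<PropagatorJob *> *> _queues;
        int _current = -1;
    };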

mrow4a (Contributor, Author) commented Oct 24, 2016

OK, I pushed a new version. This should address most of the concerns, and the implementation is very simple and lightweight. It should also reduce memory usage.

Caching (and with it the CPU saving) I will add in another PR.

@guruz @jturcotte @DeepDiver1975 @ogoffart

EDIT: tested manually with all possible sync operations.

EDIT: fixed the bug behind the previously failing unit tests.

@mrow4a mrow4a force-pushed the order_files_sync branch 3 times, most recently from 01f7957 to a4039b3 Compare October 24, 2016 20:51
ckamm (Contributor) commented Oct 25, 2016

@mrow4a I've read through the patch and don't understand yet why the split into _subJobs and _syncJobs is necessary. Wouldn't removing all those finished jobs from _subJobs (or, less invasively, keeping track of the first interesting index) have the same effect?

I'm assuming the main thing you want to get rid of is iterating over all these finished jobs all the time?

To be more specific:

// member of PropagateDirectory, one per instance
int firstUnfinishedSubJob = 0;

// advance the cached index past the prefix of finished jobs
int i = firstUnfinishedSubJob;
while (i < _subJobs.count() && _subJobs.at(i)->_state == Finished) {
    firstUnfinishedSubJob = ++i;
}

// same loop as before, but skipping the finished prefix
for (int i = firstUnfinishedSubJob; i < _subJobs.count(); ++i) {
  ...
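
(Note that firstUnfinishedSubJob only ever moves forward, so over a whole sync each sub-job is skipped past at most once and the quadratic rescan disappears.)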

@ckamm ckamm self-assigned this Oct 25, 2016
mrow4a (Contributor, Author) commented Oct 25, 2016

Caching is yet another problem, for a separate PR. In this PR the loop reduction was just "by the way"; what I actually want is a clear separation of download/upload jobs from the other jobs, which will ease the pain of adding new features. I want another component with separate queues of download/upload/chunk/update jobs for bundling, schedulers, delta sync and more. I don't want to use PropagateDirectory, which serves there only as a job dispatcher.

EDIT: currently jobs are taken nearly literally from the filesystem as they are seen and sent without considering what they really are. There is no real sync protocol there.

@ckamm Please add a separate PR with your solution. -> #5274

EDIT:
@ogoffart @ckamm From my understanding we have DirectoryJob, which can contain other DirectoryJobs, IgnoreJobs and other PropagateItemJobs, and which is then multithreaded into some maxParallelThreads, calling scheduleNextJob concurrently after each item completes. In that case, if we have some shared "_lastRunningJob" variable there, up to 3-6 threads could access the same value at the same time, possibly overwriting each other and seeing an outdated "_state", isn't it? Or is that somehow serialized?

ckamm (Contributor) commented Oct 25, 2016

@mrow4a Thanks for explaining, I thought reducing the pointless looping was your main objective.

About concurrency: only the GUI thread calls scheduleNextJob. All threaded concurrency happens purely inside QNetworkAccessManager while propagation jobs run network tasks.
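
For context, a minimal illustration of the Qt pattern described here (hypothetical snippet, not code from this client): the reply's signals are delivered through the receiver's event loop, so the finished-handler, and with it any call into scheduleNextJob, always runs on the GUI thread.

    #include <QNetworkAccessManager>
    #include <QNetworkReply>
    #include <QNetworkRequest>
    #include <QUrl>

    // QNetworkAccessManager performs the transfer asynchronously, but the
    // connected lambda runs in the thread that owns the receiver object --
    // here the GUI thread -- so job completions never race each other.
    void startJob(QNetworkAccessManager *nam, QObject *guiReceiver)
    {
        QNetworkReply *reply = nam->get(QNetworkRequest(QUrl("https://example.com")));
        QObject::connect(reply, &QNetworkReply::finished, guiReceiver, [reply]() {
            reply->deleteLater();
            // ... mark this job finished, then schedule the next one
        });
    }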

@mrow4a mrow4a changed the title Remove excesive loops and add sync files scheduler class - 1000 files, reduction from half milion to three thousand loops Add sync files scheduler class Oct 25, 2016
michaelstingl (Contributor) commented:

@mrow4a Could you build a Windows test client with your improvements on rotor? A user has offered to help test-drive this.

ogoffart (Contributor) commented:

Is this still relevant now that #5274 was integrated?

mrow4a (Contributor, Author) commented Nov 16, 2016

@ogoffart You suggested doing everything in PropagateDirectory, so I will try to fit everything there, along with bundling, and see how it works. If PropagateDirectory turns out to be too big a mess, it will end up with something like the solution here. That class is already complicated enough as it is.

mrow4a (Contributor, Author) commented Nov 17, 2016

Outdated by #5440
