VINE_WORKER_SOURCE_MAX_TRANSFERS defines the maximum number of ports on a worker that can be used to transfer temp files to other workers. The default value is 3.
If each temp file is set to be replicated 5 times, this is what happens:
Initially, when a file is created, all workers are free to transfer, so the file can immediately be replicated to 4 other workers.
As more files are gradually created, most workers become busy transferring and most files remain unreplicated, because the number of tasks is substantially larger than the number of workers.
The number of ports needed is very large. Say we have n workers, each worker is allowed to open k ports for replication, and we have m tasks. The total number of ports required over the lifetime of the workflow is (n-1)*k*m.
There is a delay between the moment a file is successfully replicated and the moment the cache update message arrives at the manager, which makes the replication process inefficient.
It seems that when a file is successfully replicated, the ports on that worker are not freed? (Maybe I just didn't find the code.)
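To get a feel for how much the cache-update delay in the list above could cost, here is a rough capacity estimate. It is a back-of-the-envelope sketch, not TaskVine code: it assumes a transfer slot stays occupied from the start of the transfer until the manager has processed the cache update, and the function name and the timing values are hypothetical.

```python
def max_transfers_per_second(num_workers, slots_per_worker,
                             transfer_seconds, update_delay_seconds):
    """Upper bound on cluster-wide replication throughput.

    Assumption (not taken from the TaskVine source): each slot is held for
    the transfer itself plus the time the cache update needs to reach the
    manager, and only then becomes available again.
    """
    busy_seconds = transfer_seconds + update_delay_seconds
    return num_workers * slots_per_worker / busy_seconds


# Hypothetical numbers: 10 workers, 3 slots each (the default limit),
# 1 s per transfer, with and without a 1 s cache-update delay.
no_delay = max_transfers_per_second(10, 3, 1.0, 0.0)
with_delay = max_transfers_per_second(10, 3, 1.0, 1.0)
print(no_delay, with_delay)
```

Under these assumptions, a delay as long as the transfer itself halves the achievable replication rate, so the delay matters most when transfers are short.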
I posted the following figure two weeks ago and thought the cause was that the manager prioritizes task dispatching and output retrieval, so most files never reach their replication threshold. But after I prioritized the replication step, I got almost the same distribution. Things are more complex than I thought.
JinZhou5042 changed the title from "vine: too strict worker p2p transfer limitation" to "vine: most temp files do not reach their replication threshold" on Nov 27, 2024
I tried extending the execution time of each task from 1s to 40s to give the manager some idle time to replicate, but the replication pattern didn't change. So I suppose the cause is not that the tasks run so fast that the workflow ends before more replications can happen.
In another test I set q->worker_source_max_transfers = 10000, and every file was replicated 5 times.
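The two observations so far (most files stay unreplicated with the default limit, and all files replicate with a limit of 10000) are consistent with the suspicion that transfer slots are not released after a successful replication. The toy model below illustrates that hypothesis only. It is not TaskVine's scheduler: `simulate` and all of its parameters are hypothetical, transfers are treated as instantaneous and sequential, and the worker and file counts are made up.

```python
def simulate(num_workers, num_files, target_replicas,
             max_source_transfers, release_slots):
    """Return the number of replicas each file ends up with.

    Toy assumptions: file f is produced on worker f % num_workers, any
    worker holding a copy may act as a source, and a source can hold at
    most max_source_transfers outgoing slots. If release_slots is False,
    a slot is never given back once it has been used.
    """
    slots_in_use = [0] * num_workers
    replica_counts = []
    for f in range(num_files):
        holders = [f % num_workers]  # the worker that produced the file
        for dest in range(num_workers):
            if len(holders) >= target_replicas:
                break
            if dest in holders:
                continue
            # Pick the first holder that still has a free outgoing slot.
            source = next((w for w in holders
                           if slots_in_use[w] < max_source_transfers), None)
            if source is None:
                break  # every holder is saturated, the file is stuck
            slots_in_use[source] += 1
            holders.append(dest)
            if release_slots:
                slots_in_use[source] -= 1  # slot returned after the copy
        replica_counts.append(len(holders))
    return replica_counts


released = simulate(10, 100, 5, 3, release_slots=True)
leaked = simulate(10, 100, 5, 3, release_slots=False)
leaked_high_limit = simulate(10, 100, 5, 10000, release_slots=False)
print(min(released), min(leaked), min(leaked_high_limit))
```

In this model, leaking slots caps the total number of transfers at num_workers * max_source_transfers, so only the earliest files are replicated, while a very large limit hides the leak. Whether the real code behaves this way still has to be checked against the source.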