Expensive operations in updateL!
on large models
#649
Replies: 2 comments 1 reply
-
Until Julia's threading scheduler is fully aware of BLAS threads, I'm not sure there's much to gain from threading code that's calling a lot of BLAS things. That warning aside, I probably wouldn't do locking here, but rather spawn tasks everytime we get to a "parallel" point and then synchronize on them, before moving on to the next one. It's not the richest way to handle the data dependencies, but it avoids locks and shouldn't be perceptively slower than the current single-threaded case, no matter how bad scheduling contention is. |
Beta Was this translation helpful? Give feedback.
-
The I had an idea today while out on a walk that it may be possible to use Cartesian indexing for the majority of the operations. We are updating the lower triangle of the symmetric matrix stored in column-major rectangular full packed format. That means that 3/4 of the elements are stored in their original positions in the parent array when the size is odd, or shifted down by one row when the size is even. If the levels of the second grouping factor are sorted by decreasing number of observations then the majority of the update operations will be in this fast update group. I'll experiment with some code. |
Beta Was this translation helpful? Give feedback.
-
I created
timedupdateL!
to report on the time spent in each step ofupdateL!
, and applied it to the model of the movielens.org data (ml-latest.zip) with the constraints that the number of ratings per user and per movie both were required to be greater than or equal to 20The model is
providing timings of
That is, the
rankUpdate!
ofL[2,2]
fromL[2,1]
is the elephant in the room. And the timings of that operation are much worse whenL[2,2]
is stored in rectangular full-block (RFP) format. (The Cholesky factorization of the diagonal block after therankUpdate!
is comparable in speed under RFP.)@palday What are the limitations on locking storage in a multi-threaded situation? The way that this update of
C
, the symmetric block on the diagonal, fromA
, a sparse block to the left of it, is performed is to iterate over the columns of A then a non-zero in that column, saya
at rowr
, then the non-zeros at and belowa
. The updates inC
from non-zeros at and belowa
occur in ther
th column ofC
. So it is feasible to use multiple threads as long as clashes caused by two threads both hitting ther
th row are avoided. I'm not sure how to put a lock on an index though. Do you think it is feasible?Beta Was this translation helpful? Give feedback.
All reactions