Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling #4

Open
gdalle opened this issue Jun 6, 2023 · 16 comments
Open

Profiling #4

gdalle opened this issue Jun 6, 2023 · 16 comments
Assignees

Comments

@gdalle
Copy link

gdalle commented Jun 6, 2023

While the issues I raised for memory allocations are good training, their solutions don't seem to bring large gains.
To see where our efforts should concentrate, you will need to profile the code. Care to share the results?

@borisblagov borisblagov self-assigned this Jun 7, 2023
@borisblagov
Copy link
Owner

Okay, so it took me a while and I ran into this bug , which has the following temporary fix but I finally managed to run the profiler and honestly, I am not really sure how to work with it yet. After reading this I understand that the lack of orange and red colors is overall a good thing.

I added the profile.html file. I am not sure whether I can embed it here in some form.

If I understand it correctly, I spend about 70% of the time in genBeta! but I am not sure what to make of it.

@gdalle
Copy link
Author

gdalle commented Jun 17, 2023

Note that the VSCode extension is probably a nicer interface to profiling, at least in my opinion

@gdalle
Copy link
Author

gdalle commented Jun 17, 2023

After reading this I understand that the lack of orange and red colors is overall a good thing.

Indeed but you do have a lot of orange/yellow, which denotes memory allocations. In particular, it seems genBeta! calls inv on a matrix, which is very expensive. Can you get away with solving a linear system instead?

@borisblagov
Copy link
Owner

You mean this?
image

I thought I was using the VScode extention, am I not?

@gdalle
Copy link
Author

gdalle commented Jun 17, 2023

I thought I was using the VScode extention, am I not?

Apparently you are! But in that case you don't need all the funky business with ProfileView.jl, you only need the @profview macro

@gdalle
Copy link
Author

gdalle commented Jun 17, 2023

You mean this?

Yes, that is an inverse computation, which is costly and not always needed. For instance, if you compute $x = A^{-1} b$, it's often better to solve $Ax=b$

@borisblagov
Copy link
Owner

borisblagov commented Jun 17, 2023

I thought I was using the VScode extention, am I not?

Apparently you are! But in that case you don't need all the funky business with ProfileView.jl, you only need the @profview macro

Now I am utterly confused. I am getting this graph by typing in @profview include("MainNewB.jl") 😄 but for that to work (otherwise I got a white screen), I did "import ProfileView.

@borisblagov
Copy link
Owner

borisblagov commented Jun 17, 2023

You mean this?

Yes, that is an inverse computation, which is costly and not always needed. For instance, if you compute x=A−1b, it's often better to solve Ax=b

Yes, I have inv here

    invSig = Sigma_prior^-1
    V = (invSig + sig2_d_vec[1]^(-1)*(X'*X))^-1
    C = cholesky(Hermitian(V)) 

I can think about dealing with this. Sigma_prior is a fixed matrix and never changes, so this inversion (this is currently a scalar) can be saved all the 7000 times. Regarding V, I need the precision matrix here only for its factorization later. There were some trick where I didn't need the inverse for that but I have to check.

In real code (this is a teaching example), in the sig2_d_vec[1]^(-1)*(X'*X) part, sig2_d_vec is, in fact, a huge matrix, say Sigma, so I have Sigma\(X'*X). Btw, X'*X doesn't change so this multiplication can be just done once an fed into the function.

@borisblagov
Copy link
Owner

I am a bit confused how to read the flame graph. Say this part:
image

If the lines underneath are "one level down", I spent 70% in gibbs and 9 % in gibbs (the two bars on the top row correspond to 70 and 9), then out of the 70% in gibbs I spent 25+26+9 times in genBeta! and 9% in genSigma!. Is that reading of the profiler correct?

@gdalle
Copy link
Author

gdalle commented Jun 17, 2023

Now I am utterly confused. I am getting this graph by typing in @profview include("MainNewB.jl") smile but for that to work (otherwise I got a white screen), I did "import ProfileView.

The import shouldn't be necessary. And indeed I have recently been confused by this blank screen, but if you run the line with @profview a second time I found it usually works.
https://discourse.julialang.org/t/profiling-in-vscode-shows-blank-tab-on-first-run/100492

@gdalle
Copy link
Author

gdalle commented Jun 17, 2023

If the lines underneath are "one level down", I spent 70% in gibbs and 9 % in gibbs

No, you spent 70% in a certain line of gibbs and 9% in another line of gibbs. You can see the line numbers by hovering with your mouse over the tiles, and if you do Ctrl+click it takes you to the file in question

@borisblagov
Copy link
Owner

I see, thanks!

I removed unnecessary repeated computations, namely X'*X, X'*Y, invSig = Sigma_prior^-1 and invSig*Beta_prior, which reduced the time by 30%!

new genBeta!

    V = (invSig + sig2_d_vec[1]^(-1)*Xprim)^-1
    C = cholesky(Hermitian(V)) 
    Beta1 =  V*(invSBetaPr + sig2_d_vec[1]^(-1)*XprimY)
    beta_d .= Beta1 .+ C.L*randn(5,1)

new time:

julia> @btime include("MainNewB.jl");
  27.985 ms (139970 allocations: 49.22 MiB)

old time:

julia> @btime include("MainNewB.jl")
  41.813 ms (160966 allocations: 52.21 MiB)

I am unsure I can reduce this further.

@gdalle
Copy link
Author

gdalle commented Jun 18, 2023

I am unsure I can reduce this further.

Hard to say without profiling myself, but here are some things I noticed:

V = (X + Y^(-1) * Z)^-1

When you do this, you allocate a new matrix for Y^(-1), then a second one for Y^(-1) * Z, and a third one for X + Y^(-1) * Z, and then finally a fourth one for (X + Y^(-1) * Z)^-1.
You could for instance keep temporary storage and use mul! + inv! every time, but that would make the code much harder to read.

sig2_d_vec[1]^(-1)

You compute this twice.

Beta1 = V*(invSBetaPr + sig2_d_vec[1]^(-1)*XprimY)

This is an instance of doing $x = A^{-1}b$ instead of solving $Ax = b$. Not sure there is a way to exploit it in your case because you seem to need the Cholesky factor of $A^{-1}$.

C = cholesky(Hermitian(V))

Apparently there is a way tu mutualize computations between Cholesky decomposition and matrix inversion, but I'm not well-versed in linear algebra to help

https://en.wikipedia.org/wiki/Cholesky_decomposition#Matrix_inversion

@borisblagov
Copy link
Owner

borisblagov commented Jun 19, 2023

Apparently there is a way tu mutualize computations between Cholesky decomposition and matrix inversion, but I'm not well-versed in linear algebra to help

Hmm, I know of a method, where given a normal distribution with mean $\mu$ and variance $K^{-1}$ you may draw from it by using the Cholesky factor of $K$, so that you don't have to invert $K$. You do have to invert solve with the Cholesky factor, which is cheaper (step 2. of the algorithm-the backward substitution). The original paper is this one
image

The source of the linke in wikipedia your provided sounds similar. This part is just after equation 24 on the second page
image

I never thought about using those algorithms for small problems, I can surely give it a try.

Not sure there is a way to exploit it in your case because you seem to need the Cholesky factor of $А^-1$

If I use the above trick to not having to compute Vinv, I could also solve $Ax =b$ instead of taking the inverse...

@borisblagov
Copy link
Owner

borisblagov commented Jun 19, 2023

Well, that did improve!

julia> @btime include("MainNewB.jl")
  22.101 ms (132948 allocations: 35.15 MiB)

versus

julia> @btime include("MainNewB.jl")
  29.781 ms (153948 allocations: 54.81 MiB)

We started at 80ms, then went to 40ms, then to 30ms, now this.

Here is what I changed

function genBeta2!(beta_d,invSig,sig2_d_vec,Xprim,XprimY,invSBetaPr)
    sig2inv = sig2_d_vec[1]^(-1)
    V = invSig + sig2inv*Xprim
    cholV = cholesky(Hermitian(V))
    beta_d .= V\(invSBetaPr + sig2inv*XprimY) + cholV.L\randn(5,1)
#    Vinv = (V)^-1
 #   C = cholesky(Hermitian(Vinv)) 
#    Beta1 =  Vinv*(invSBetaPr + sig2inv*XprimY)
#    beta_d .= Beta1 .+ C.L*randn(5,1)
end

I am using the aforementioned algorithm of Chan and Jeliazkov (2009) to draw $\beta_d$ using $V$ instead of calculating first the inverse, $V^{-1}$. Calculating the mean Beta1, is now done with backward substitution (if the \ operator does the same thing in Julia as it does in Matlab) as in V\(invSBetaPr + sig2inv*XprimY). I am unsure if V\ is avoidable/unavoidable.

@gdalle
Copy link
Author

gdalle commented Jun 19, 2023

See, we're getting somewhere :) Time to profile again, what's the most costly part?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants