Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Even faster storage handling #212

Merged
merged 41 commits into from
Feb 21, 2023
Merged

Even faster storage handling #212

merged 41 commits into from
Feb 21, 2023

Conversation

mateuszbaran
Copy link
Member

These are the changes we've discussed yesterday. Right now a bit WIP still but with the fused retractions my CG example is down to about 6 us (from about 80 us IIRC) and looks relatively solid now.

@mateuszbaran mateuszbaran added enhancement WIP Work in Progress (for a pull request) labels Jan 31, 2023
src/plans/solver_state.jl Outdated Show resolved Hide resolved
src/plans/solver_state.jl Outdated Show resolved Hide resolved
Copy link
Member

@kellertuer kellertuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks super!

I notier after our discussion that this way the CG coefficients are even more similar to the QuasiNewton rules we have, which I think is super nice.

I just have a few first comments, but mainly usability ideas.

src/plans/conjugate_gradient_plan.jl Outdated Show resolved Hide resolved
src/plans/conjugate_gradient_plan.jl Outdated Show resolved Hide resolved
src/plans/conjugate_gradient_plan.jl Outdated Show resolved Hide resolved
src/plans/conjugate_gradient_plan.jl Outdated Show resolved Hide resolved
src/plans/conjugate_gradient_plan.jl Outdated Show resolved Hide resolved
src/plans/conjugate_gradient_plan.jl Outdated Show resolved Hide resolved
src/plans/conjugate_gradient_plan.jl Show resolved Hide resolved
src/plans/solver_state.jl Show resolved Hide resolved
src/plans/solver_state.jl Show resolved Hide resolved
src/solvers/conjugate_gradient_descent.jl Outdated Show resolved Hide resolved
@kellertuer
Copy link
Member

Oh, and one thing that I did notice, the storage until now was also used in DebugActions if I remember correctly so is it possible to keep the storage that general? If it is just a storage for CG coefficient-usage then we can also store 2 variables in the coefficients, since we will always only have one coefficients and there is no need for a shared storage. But maybe this is also just a Naming-Thing for the new storage and it can already be used in the other places?

@mateuszbaran
Copy link
Member Author

I've made the storage changes reusable in other parts such as debug actions, so feel free to use these capabilities elsewhere as well. Just storing two or three variables in coefficient storage would be much easier to code but IIRC you wanted to avoid that.

@kellertuer
Copy link
Member

Maybe I am mainly arguing about the name the storage then currently has? For now it implements the general (i.e. hopefully reusable) case but is called directionUpdateStorage which (in name at least, not sure for the rest) seems to restrict the storage to CG and maybe average/Nesterov

@mateuszbaran
Copy link
Member Author

Yes, DirectionUpdateStorage is CG-specific but holds very little logic in itself. All new reusable parts are in StoreStateAction.

@mateuszbaran
Copy link
Member Author

Maybe we could extend DirectionUpdateStorage to something else but I haven't checked that yet. Anyway that looks like an internal change that could also be done later.

@kellertuer
Copy link
Member

Yes, DirectionUpdateStorage is CG-specific but holds very little logic in itself. All new reusable parts are in StoreStateAction.

Ah, now I see, then maybe I misunderstood the code this morning. You are right, the DUS is just a nice encapsulation to keep the “oldCG-types” simple (and now even without own storage) and the main change is in the storage. I somehow read that differently this morning. Then all is fine :)

@mateuszbaran
Copy link
Member Author

Ah, now I see, then maybe I misunderstood the code this morning. You are right, the DUS is just a nice encapsulation to keep the “oldCG-types” simple (and now even without own storage) and the main change is in the storage. I somehow read that differently this morning. Then all is fine :)

Yes, this is correct 👍 .

@codecov
Copy link

codecov bot commented Feb 1, 2023

Codecov Report

Merging #212 (f4172a0) into master (abedeed) will decrease coverage by 0.20%.
The diff coverage is 99.31%.

@@            Coverage Diff             @@
##           master     #212      +/-   ##
==========================================
- Coverage   99.91%   99.72%   -0.20%     
==========================================
  Files          61       61              
  Lines        4960     5107     +147     
==========================================
+ Hits         4956     5093     +137     
- Misses          4       14      +10     
Impacted Files Coverage Δ
src/Manopt.jl 100.00% <ø> (ø)
src/plans/primal_dual_plan.jl 100.00% <ø> (ø)
...rc/solvers/truncated_conjugate_gradient_descent.jl 100.00% <ø> (ø)
src/plans/stepsize.jl 95.39% <89.47%> (-4.61%) ⬇️
src/plans/conjugate_gradient_plan.jl 100.00% <100.00%> (ø)
src/plans/debug.jl 100.00% <100.00%> (ø)
src/plans/gradient_plan.jl 100.00% <100.00%> (ø)
src/plans/quasi_newton_plan.jl 100.00% <100.00%> (ø)
src/plans/record.jl 100.00% <100.00%> (ø)
src/plans/solver_state.jl 100.00% <100.00%> (ø)
... and 6 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mateuszbaran
Copy link
Member Author

I think this is more or less ready. This work could be continued for other parts of Manopt but for this PR I think these changes are more or less sufficient.

My last measurement of Rosenbrock benchmark is below 6 us and about half of all time is performing some actual computations. There is still at least one significant avoidable allocation (in line search) and some potentially avoidable copying but we are entering diminishing returns territory here. I'm quite happy with the overall improvement.

BTW, I've quickly checked peformence of quasi-Newton for Rosenbrock and the largest issue is too many allocations. There should be buffers that persist between iterations instead of constantly allocating for everything.

@kellertuer
Copy link
Member

kellertuer commented Feb 1, 2023

I think this is more or less ready. This work could be continued for other parts of Manopt but for this PR I think these changes are more or less sufficient.

That reads for me a bit like this is half a PR, since it leave the rest – probably to me – somewhen after I both have understood how this works and I find time to work through all other storages.
I would prefer to be sure that this last point actually worked and we did not miss anything. And also deprecate older constructors might help.

My last measurement of Rosenbrock benchmark is below 6 us and about half of all time is performing some actual computations. There is still at least one significant avoidable allocation (in line search) and some potentially avoidable copying but we are entering diminishing returns territory here. I'm quite happy with the overall improvement.

line_search Is luckily just one function, but I had designed it to have as less as possible allocations already. So input on which is avoidable and how would be great, if you have spotted it already.

BTW, I've quickly checked peformence of quasi-Newton for Rosenbrock and the largest issue is too many allocations. There should be buffers that persist between iterations instead of constantly allocating for everything.

Again, more detail here would be great. I am not an expert on allocations, so just saying – there is something wrong/ inefficient – sure can be done, but then it stays inefficient until someone more clever than me comes along. Also it does definelty not allocate “everything”, that reads like we never reuse anything, which is at least exaggerating.

@mateuszbaran
Copy link
Member Author

That reads for me a bit like this is half a PR, since it leave the rest – probably to me – somewhen after I both have understood how this works and I find time to work through all other storages.

This PR fully handles CG which I originally intended but then I noticed that similar changes could be applied in many other places. This is probably more like a quarter of what could be improved about storage. I can go back to that later but I think finishing the fused retraction rework may be more important.

I would prefer to be sure that this last point actually worked and we did not miss anything. And also deprecate older constructors might help.

Which last point do you mean?
I will take a look at what needs a deprecation.

line_search Is luckily just one function, but I had designed it to have as less as possible allocations already. So input on which is avoidable and how would be great, if you have spotted it already.

This is a topic worth a separate issue and also involves functions like

(a::WolfePowellLinesearch)(
    mp::AbstractManoptProblem,
    ams::AbstractManoptSolverState,
    ::Int,
    η=-get_gradient(mp, get_iterate(ams));
    kwargs...,
)

Again, more detail here would be great. I am not an expert on allocations, so just saying – there is something wrong/ inefficient – sure can be done, but then it stays inefficient until someone more clever than me comes along. Also it does definelty not allocate “everything”, that reads like we never reuse anything, which is at least exaggerating.

Yes, sorry, I was exaggerating. Look at the flame graph below though:
image
The ten orange blocks in red curves correspond to allocations performed by quasi_Newton solver each step that could be preallocated, so there is still quite a bit to do. I can point at specific lines that cause these allocations but maybe let's not put too much unrelated things here.

Maybe the thing to consider would be making a single storage for a solver that could be shared by perform_step!, line searches, coefficient calculations, stopping criteria, etc.

@kellertuer
Copy link
Member

Yes, I am sure you did not mean to exaggerate, I was also a bit tired yesterday and maybe overinterpretet.

For an action plan. What about

  1. We put the Armijo-Issue into a separate Issue. I have an idea where that is from, the linesearch_backtrack that a few linesearches share might allocate
  2. We put quasi Newton into its own issue. That sub solver does not yet align with the new subsolver-model/structure we developed last autumn, especially to set parameters in the sub solver is something that currently might allocate.
  3. You could first finish the fused retractions, that leaves this PR open for a while and I could
  4. check on this PR that the new storage works in the other places as well, after I understood how the fields work, but I then can also help documenting that. If it does not, we would probably need to do changes in this PR anyways. If it does, we can also check the other places and change the storage usage to the more efficient one.

But I not only have the other two PR I am currently not finding much time for, I also have at least 3 additional meetings the next few weeks within a collegial pedagogical mandatory coaching all “new” associate professorsI have to do (besides the language courses). So it might be that step 4 only happens somewhen in March, since currently I only find time to code late in the evenings due to that.

@mateuszbaran
Copy link
Member Author

Yes, I am sure you did not mean to exaggerate, I was also a bit tired yesterday and maybe overinterpretet.

Yes, sorry, I just meant that there are many places where avoidable allocations happen.

  1. We put the Armijo-Issue into a separate Issue. I have an idea where that is from, the linesearch_backtrack that a few linesearches share might allocate

I can make a list of all lines that cause allocation so that's easier to track.

2. We put quasi Newton into its own issue. That sub solver does not yet align with the new subsolver-model/structure we developed last autumn, especially to set parameters in the sub solver is something that currently might allocate.

Sure, I will open an issue.

3. You could first finish the fused retractions, that leaves this PR open for a while and I could

OK, this PR is just an internal change that doesn't block any other work so it can wait.

4. check on this PR that the new storage works in the other places as well, after I understood how the fields work, but I then can also help documenting that. If it does not, we would probably need to do changes in this PR anyways. If it does, we can also check the other places and change the storage usage to the more efficient one.

OK, I think the main issue I need your help with is that each place that wants to store points in the fast storage needs a point as a "template", and similarly for tangent vectors. This requires some redesign unfortunately. Once I have a point or vector at the place creating a storage, I can easily convert it to the fast variant.

But I not only have the other two PR I am currently not finding much time for, I also have at least 3 additional meetings the next few weeks within a collegial pedagogical mandatory coaching all “new” associate professorsI have to do (besides the language courses). So it might be that step 4 only happens somewhen in March, since currently I only find time to code late in the evenings due to that.

No problem, not having this PR merged doesn't block any other work I'm going to do over the next couple of months so that's fine.

@kellertuer
Copy link
Member

Cool, that sounds like a plan.
For the records that should be easy, the debugs, e.g. DebugChange or so might need an update for worst of cases a breaking one, but I think one can trick that, keeping the old slow Dictionary one as a fallback with a warning/deprecation.

@mateuszbaran
Copy link
Member Author

Yes, I will add a test for that line. I also need to take a closer look at performance again.

@kellertuer
Copy link
Member

Perfect, then we just have to add a small comment in the (now new) Changeling as well.

@kellertuer
Copy link
Member

kellertuer commented Feb 19, 2023

I had to add my two PRs of the next version anyways, so we just have to add the date when we merge.

edit: Oh I just saw we are a little bit inconsistent and sometimes use vector and sometimes tangent for tangent vectors in the storage now. Could we unify that to vector?

@mateuszbaran
Copy link
Member Author

OK, I will change tangent to vector. The new changelog looks nice 🙂 .

@kellertuer
Copy link
Member

Besides the name (you called it News.md) it is heavily inspired by your approach. I think for breaking changes especially it is important to document what broke and how to fix it.
Same for our other packages, let‘s introduce that step by step.

@mateuszbaran mateuszbaran marked this pull request as ready for review February 19, 2023 19:59
@mateuszbaran mateuszbaran removed the WIP Work in Progress (for a pull request) label Feb 19, 2023
@kellertuer
Copy link
Member

Hm, the cleanup might have caused a bit of code coverage loss. Maybe the Nonmonotonelinesearch does not have a proper storage for the case that is not covered (and hence stops early with no change)? I will try to check your recent changes for that maybe the early evening (sorry busy until then).

@mateuszbaran
Copy link
Member Author

I think I know what might be the problem.

@kellertuer
Copy link
Member

Nice fix! I am not 100% sure we use PointStorageKey and VectorStorageKey already in all places where they are necessary.

@mateuszbaran
Copy link
Member Author

Thanks! Indeed, they are not used in all places where they should be. I will adapt all places except primal_dual_plan.jl where I'm not sure about it.

@kellertuer
Copy link
Member

Sure, primal-dual plan can stay “slow” for now, since it is quite a bit of work required there which we can do later.

Copy link
Member

@kellertuer kellertuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this looks nice now, thanks for all the work.

Before merging, just write a date behind 0.4.8 in the Changelog.

I will write a tutorial that hopefully illustrates the benefits of using storage and even the fast storage as well as how to use them.

@mateuszbaran
Copy link
Member Author

Thanks, I'm still going to test this storage a bit more and make an example that illustrates its benefits.

@kellertuer kellertuer merged commit 64075b4 into master Feb 21, 2023
@kellertuer kellertuer deleted the mbaran/even-faster-storage branch December 16, 2023 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants