Even faster storage handling #212

mateuszbaran · 2023-01-31T21:16:34Z

These are the changes we've discussed yesterday. Right now a bit WIP still but with the fused retractions my CG example is down to about 6 us (from about 80 us IIRC) and looks relatively solid now.

src/plans/solver_state.jl

kellertuer

This looks super!

I notier after our discussion that this way the CG coefficients are even more similar to the QuasiNewton rules we have, which I think is super nice.

I just have a few first comments, but mainly usability ideas.

src/plans/conjugate_gradient_plan.jl

src/plans/solver_state.jl

src/solvers/conjugate_gradient_descent.jl

kellertuer · 2023-02-01T06:30:53Z

Oh, and one thing that I did notice, the storage until now was also used in DebugActions if I remember correctly so is it possible to keep the storage that general? If it is just a storage for CG coefficient-usage then we can also store 2 variables in the coefficients, since we will always only have one coefficients and there is no need for a shared storage. But maybe this is also just a Naming-Thing for the new storage and it can already be used in the other places?

mateuszbaran · 2023-02-01T14:15:19Z

I've made the storage changes reusable in other parts such as debug actions, so feel free to use these capabilities elsewhere as well. Just storing two or three variables in coefficient storage would be much easier to code but IIRC you wanted to avoid that.

kellertuer · 2023-02-01T14:17:32Z

Maybe I am mainly arguing about the name the storage then currently has? For now it implements the general (i.e. hopefully reusable) case but is called directionUpdateStorage which (in name at least, not sure for the rest) seems to restrict the storage to CG and maybe average/Nesterov

mateuszbaran · 2023-02-01T14:20:27Z

Yes, DirectionUpdateStorage is CG-specific but holds very little logic in itself. All new reusable parts are in StoreStateAction.

mateuszbaran · 2023-02-01T14:21:38Z

Maybe we could extend DirectionUpdateStorage to something else but I haven't checked that yet. Anyway that looks like an internal change that could also be done later.

kellertuer · 2023-02-01T14:24:47Z

Yes, DirectionUpdateStorage is CG-specific but holds very little logic in itself. All new reusable parts are in StoreStateAction.

Ah, now I see, then maybe I misunderstood the code this morning. You are right, the DUS is just a nice encapsulation to keep the “oldCG-types” simple (and now even without own storage) and the main change is in the storage. I somehow read that differently this morning. Then all is fine :)

mateuszbaran · 2023-02-01T14:27:51Z

Ah, now I see, then maybe I misunderstood the code this morning. You are right, the DUS is just a nice encapsulation to keep the “oldCG-types” simple (and now even without own storage) and the main change is in the storage. I somehow read that differently this morning. Then all is fine :)

Yes, this is correct 👍 .

codecov · 2023-02-01T15:23:08Z

Codecov Report

Merging #212 (f4172a0) into master (abedeed) will decrease coverage by 0.20%.
The diff coverage is 99.31%.

@@            Coverage Diff             @@
##           master     #212      +/-   ##
==========================================
- Coverage   99.91%   99.72%   -0.20%     
==========================================
  Files          61       61              
  Lines        4960     5107     +147     
==========================================
+ Hits         4956     5093     +137     
- Misses          4       14      +10

Impacted Files	Coverage Δ
src/Manopt.jl	`100.00% <ø> (ø)`
src/plans/primal_dual_plan.jl	`100.00% <ø> (ø)`
...rc/solvers/truncated_conjugate_gradient_descent.jl	`100.00% <ø> (ø)`
src/plans/stepsize.jl	`95.39% <89.47%> (-4.61%)`	⬇️
src/plans/conjugate_gradient_plan.jl	`100.00% <100.00%> (ø)`
src/plans/debug.jl	`100.00% <100.00%> (ø)`
src/plans/gradient_plan.jl	`100.00% <100.00%> (ø)`
src/plans/quasi_newton_plan.jl	`100.00% <100.00%> (ø)`
src/plans/record.jl	`100.00% <100.00%> (ø)`
src/plans/solver_state.jl	`100.00% <100.00%> (ø)`
... and 6 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

mateuszbaran · 2023-02-01T21:06:33Z

I think this is more or less ready. This work could be continued for other parts of Manopt but for this PR I think these changes are more or less sufficient.

My last measurement of Rosenbrock benchmark is below 6 us and about half of all time is performing some actual computations. There is still at least one significant avoidable allocation (in line search) and some potentially avoidable copying but we are entering diminishing returns territory here. I'm quite happy with the overall improvement.

BTW, I've quickly checked peformence of quasi-Newton for Rosenbrock and the largest issue is too many allocations. There should be buffers that persist between iterations instead of constantly allocating for everything.

kellertuer · 2023-02-01T21:31:23Z

I think this is more or less ready. This work could be continued for other parts of Manopt but for this PR I think these changes are more or less sufficient.

That reads for me a bit like this is half a PR, since it leave the rest – probably to me – somewhen after I both have understood how this works and I find time to work through all other storages.
I would prefer to be sure that this last point actually worked and we did not miss anything. And also deprecate older constructors might help.

My last measurement of Rosenbrock benchmark is below 6 us and about half of all time is performing some actual computations. There is still at least one significant avoidable allocation (in line search) and some potentially avoidable copying but we are entering diminishing returns territory here. I'm quite happy with the overall improvement.

line_search Is luckily just one function, but I had designed it to have as less as possible allocations already. So input on which is avoidable and how would be great, if you have spotted it already.

BTW, I've quickly checked peformence of quasi-Newton for Rosenbrock and the largest issue is too many allocations. There should be buffers that persist between iterations instead of constantly allocating for everything.

Again, more detail here would be great. I am not an expert on allocations, so just saying – there is something wrong/ inefficient – sure can be done, but then it stays inefficient until someone more clever than me comes along. Also it does definelty not allocate “everything”, that reads like we never reuse anything, which is at least exaggerating.

mateuszbaran · 2023-02-01T22:20:11Z

That reads for me a bit like this is half a PR, since it leave the rest – probably to me – somewhen after I both have understood how this works and I find time to work through all other storages.

This PR fully handles CG which I originally intended but then I noticed that similar changes could be applied in many other places. This is probably more like a quarter of what could be improved about storage. I can go back to that later but I think finishing the fused retraction rework may be more important.

I would prefer to be sure that this last point actually worked and we did not miss anything. And also deprecate older constructors might help.

Which last point do you mean?
I will take a look at what needs a deprecation.

line_search Is luckily just one function, but I had designed it to have as less as possible allocations already. So input on which is avoidable and how would be great, if you have spotted it already.

This is a topic worth a separate issue and also involves functions like

(a::WolfePowellLinesearch)(
    mp::AbstractManoptProblem,
    ams::AbstractManoptSolverState,
    ::Int,
    η=-get_gradient(mp, get_iterate(ams));
    kwargs...,
)

Again, more detail here would be great. I am not an expert on allocations, so just saying – there is something wrong/ inefficient – sure can be done, but then it stays inefficient until someone more clever than me comes along. Also it does definelty not allocate “everything”, that reads like we never reuse anything, which is at least exaggerating.

Yes, sorry, I was exaggerating. Look at the flame graph below though:

The ten orange blocks in red curves correspond to allocations performed by quasi_Newton solver each step that could be preallocated, so there is still quite a bit to do. I can point at specific lines that cause these allocations but maybe let's not put too much unrelated things here.

Maybe the thing to consider would be making a single storage for a solver that could be shared by perform_step!, line searches, coefficient calculations, stopping criteria, etc.

kellertuer · 2023-02-02T05:31:25Z

Yes, I am sure you did not mean to exaggerate, I was also a bit tired yesterday and maybe overinterpretet.

For an action plan. What about

We put the Armijo-Issue into a separate Issue. I have an idea where that is from, the linesearch_backtrack that a few linesearches share might allocate
We put quasi Newton into its own issue. That sub solver does not yet align with the new subsolver-model/structure we developed last autumn, especially to set parameters in the sub solver is something that currently might allocate.
You could first finish the fused retractions, that leaves this PR open for a while and I could
check on this PR that the new storage works in the other places as well, after I understood how the fields work, but I then can also help documenting that. If it does not, we would probably need to do changes in this PR anyways. If it does, we can also check the other places and change the storage usage to the more efficient one.

But I not only have the other two PR I am currently not finding much time for, I also have at least 3 additional meetings the next few weeks within a collegial pedagogical mandatory coaching all “new” associate professorsI have to do (besides the language courses). So it might be that step 4 only happens somewhen in March, since currently I only find time to code late in the evenings due to that.

mateuszbaran · 2023-02-02T07:55:05Z

Yes, I am sure you did not mean to exaggerate, I was also a bit tired yesterday and maybe overinterpretet.

Yes, sorry, I just meant that there are many places where avoidable allocations happen.

We put the Armijo-Issue into a separate Issue. I have an idea where that is from, the linesearch_backtrack that a few linesearches share might allocate

I can make a list of all lines that cause allocation so that's easier to track.

2. We put quasi Newton into its own issue. That sub solver does not yet align with the new subsolver-model/structure we developed last autumn, especially to set parameters in the sub solver is something that currently might allocate.

Sure, I will open an issue.

3. You could first finish the fused retractions, that leaves this PR open for a while and I could

OK, this PR is just an internal change that doesn't block any other work so it can wait.

4. check on this PR that the new storage works in the other places as well, after I understood how the fields work, but I then can also help documenting that. If it does not, we would probably need to do changes in this PR anyways. If it does, we can also check the other places and change the storage usage to the more efficient one.

OK, I think the main issue I need your help with is that each place that wants to store points in the fast storage needs a point as a "template", and similarly for tangent vectors. This requires some redesign unfortunately. Once I have a point or vector at the place creating a storage, I can easily convert it to the fast variant.

But I not only have the other two PR I am currently not finding much time for, I also have at least 3 additional meetings the next few weeks within a collegial pedagogical mandatory coaching all “new” associate professorsI have to do (besides the language courses). So it might be that step 4 only happens somewhen in March, since currently I only find time to code late in the evenings due to that.

No problem, not having this PR merged doesn't block any other work I'm going to do over the next couple of months so that's fine.

kellertuer · 2023-02-02T08:53:11Z

Cool, that sounds like a plan.
For the records that should be easy, the debugs, e.g. DebugChange or so might need an update for worst of cases a breaking one, but I think one can trick that, keeping the old slow Dictionary one as a fallback with a warning/deprecation.

src/plans/solver_state.jl

src/plans/conjugate_gradient_plan.jl

mateuszbaran · 2023-02-19T18:02:42Z

Yes, I will add a test for that line. I also need to take a closer look at performance again.

kellertuer · 2023-02-19T18:07:58Z

Perfect, then we just have to add a small comment in the (now new) Changeling as well.

…s/Manopt.jl into mbaran/even-faster-storage

kellertuer · 2023-02-19T18:27:37Z

I had to add my two PRs of the next version anyways, so we just have to add the date when we merge.

edit: Oh I just saw we are a little bit inconsistent and sometimes use vector and sometimes tangent for tangent vectors in the storage now. Could we unify that to vector?

mateuszbaran · 2023-02-19T18:31:24Z

OK, I will change tangent to vector. The new changelog looks nice 🙂 .

kellertuer · 2023-02-19T18:41:15Z

Besides the name (you called it News.md) it is heavily inspired by your approach. I think for breaking changes especially it is important to document what broke and how to fix it.
Same for our other packages, let‘s introduce that step by step.

kellertuer · 2023-02-20T05:59:04Z

Hm, the cleanup might have caused a bit of code coverage loss. Maybe the Nonmonotonelinesearch does not have a proper storage for the case that is not covered (and hence stops early with no change)? I will try to check your recent changes for that maybe the early evening (sorry busy until then).

mateuszbaran · 2023-02-20T07:46:44Z

I think I know what might be the problem.

kellertuer · 2023-02-20T10:43:51Z

Nice fix! I am not 100% sure we use PointStorageKey and VectorStorageKey already in all places where they are necessary.

mateuszbaran · 2023-02-20T10:51:13Z

Thanks! Indeed, they are not used in all places where they should be. I will adapt all places except primal_dual_plan.jl where I'm not sure about it.

kellertuer · 2023-02-20T10:52:29Z

Sure, primal-dual plan can stay “slow” for now, since it is quite a bit of work required there which we can do later.

kellertuer

For me this looks nice now, thanks for all the work.

Before merging, just write a date behind 0.4.8 in the Changelog.

I will write a tutorial that hopefully illustrates the benefits of using storage and even the fast storage as well as how to use them.

mateuszbaran · 2023-02-20T11:30:42Z

Thanks, I'm still going to test this storage a bit more and make an example that illustrates its benefits.

src/solvers/quasi_Newton.jl

src/plans/quasi_newton_plan.jl

mateuszbaran added 3 commits January 31, 2023 22:01

Even faster storage handling

b1c53a1

minor improvement

cf192bd

cleanup and asserts

2a319d8

mateuszbaran added enhancement WIP Work in Progress (for a pull request) labels Jan 31, 2023

github-actions bot reviewed Jan 31, 2023

View reviewed changes

src/plans/solver_state.jl Outdated Show resolved Hide resolved

src/plans/solver_state.jl Outdated Show resolved Hide resolved

kellertuer reviewed Feb 1, 2023

View reviewed changes

mateuszbaran mentioned this pull request Feb 1, 2023

Convert storage-using stopping criteria to faster storage handling #214

Closed

fixing things and addressing code review

dba15cb

mateuszbaran added 3 commits February 1, 2023 21:37

some general improvements and docs

284782a

a bit more docs and type bounds

e9b3d17

unify constructors for CG coefficients

85dc5f8

kellertuer reviewed Feb 3, 2023

View reviewed changes

src/plans/solver_state.jl Outdated Show resolved Hide resolved

kellertuer reviewed Feb 3, 2023

View reviewed changes

src/plans/solver_state.jl Outdated Show resolved Hide resolved

kellertuer reviewed Feb 3, 2023

View reviewed changes

src/plans/solver_state.jl Outdated Show resolved Hide resolved

kellertuer reviewed Feb 3, 2023

View reviewed changes

src/plans/conjugate_gradient_plan.jl Outdated Show resolved Hide resolved

Adds a changelog.

136b0a0

mateuszbaran and others added 3 commits February 19, 2023 19:22

faster storage update; test one edge case

2fb8a32

Update Changelog.

9341816

Merge branch 'mbaran/even-faster-storage' of github.com:JuliaManifold…

703acf3

…s/Manopt.jl into mbaran/even-faster-storage

small cleanup

568c51f

mateuszbaran marked this pull request as ready for review February 19, 2023 19:59

mateuszbaran removed the WIP Work in Progress (for a pull request) label Feb 19, 2023

fix nonmutating storage

ede0d6c

use more storage keys

6062387

kellertuer approved these changes Feb 20, 2023

View reviewed changes

kellertuer mentioned this pull request Feb 20, 2023

Unify ConjugateGradient update rule constructors #207

Closed

kellertuer and others added 2 commits February 20, 2023 19:25

add a few stati I missed in the last PR (Sorrey).

5f3e98d

a few small quasi Newton optimizations

eba6978

kellertuer reviewed Feb 21, 2023

View reviewed changes

src/solvers/quasi_Newton.jl Outdated Show resolved Hide resolved

kellertuer reviewed Feb 21, 2023

View reviewed changes

src/plans/quasi_newton_plan.jl Show resolved Hide resolved

mateuszbaran and others added 2 commits February 21, 2023 17:43

revert p_old storage in quasi Newton

a0c5bbb

Finalize 0.4.8

f4172a0

kellertuer merged commit 64075b4 into master Feb 21, 2023

kellertuer deleted the mbaran/even-faster-storage branch December 16, 2023 07:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Even faster storage handling #212

Even faster storage handling #212

mateuszbaran commented Jan 31, 2023

kellertuer left a comment

kellertuer commented Feb 1, 2023

mateuszbaran commented Feb 1, 2023

kellertuer commented Feb 1, 2023

mateuszbaran commented Feb 1, 2023

mateuszbaran commented Feb 1, 2023

kellertuer commented Feb 1, 2023

mateuszbaran commented Feb 1, 2023

codecov bot commented Feb 1, 2023 •

edited

Loading

mateuszbaran commented Feb 1, 2023

kellertuer commented Feb 1, 2023 •

edited

Loading

mateuszbaran commented Feb 1, 2023

kellertuer commented Feb 2, 2023

mateuszbaran commented Feb 2, 2023

kellertuer commented Feb 2, 2023

mateuszbaran commented Feb 19, 2023

kellertuer commented Feb 19, 2023

kellertuer commented Feb 19, 2023 •

edited

Loading

mateuszbaran commented Feb 19, 2023

kellertuer commented Feb 19, 2023

kellertuer commented Feb 20, 2023

mateuszbaran commented Feb 20, 2023

kellertuer commented Feb 20, 2023

mateuszbaran commented Feb 20, 2023

kellertuer commented Feb 20, 2023

kellertuer left a comment

mateuszbaran commented Feb 20, 2023

Even faster storage handling #212

Even faster storage handling #212

Conversation

mateuszbaran commented Jan 31, 2023

kellertuer left a comment

Choose a reason for hiding this comment

kellertuer commented Feb 1, 2023

mateuszbaran commented Feb 1, 2023

kellertuer commented Feb 1, 2023

mateuszbaran commented Feb 1, 2023

mateuszbaran commented Feb 1, 2023

kellertuer commented Feb 1, 2023

mateuszbaran commented Feb 1, 2023

codecov bot commented Feb 1, 2023 • edited Loading

Codecov Report

mateuszbaran commented Feb 1, 2023

kellertuer commented Feb 1, 2023 • edited Loading

mateuszbaran commented Feb 1, 2023

kellertuer commented Feb 2, 2023

mateuszbaran commented Feb 2, 2023

kellertuer commented Feb 2, 2023

mateuszbaran commented Feb 19, 2023

kellertuer commented Feb 19, 2023

kellertuer commented Feb 19, 2023 • edited Loading

mateuszbaran commented Feb 19, 2023

kellertuer commented Feb 19, 2023

kellertuer commented Feb 20, 2023

mateuszbaran commented Feb 20, 2023

kellertuer commented Feb 20, 2023

mateuszbaran commented Feb 20, 2023

kellertuer commented Feb 20, 2023

kellertuer left a comment

Choose a reason for hiding this comment

mateuszbaran commented Feb 20, 2023

codecov bot commented Feb 1, 2023 •

edited

Loading

kellertuer commented Feb 1, 2023 •

edited

Loading

kellertuer commented Feb 19, 2023 •

edited

Loading