fix Pose2 preambleCache bug #562

dehann · 2022-03-23T07:36:21Z

The idea here is to try reduce hot loop memory consumption.

So just quickly trying on hexagonal example...

Once upon a time, hex would take at best around 30s to solve... After this PR, and on a computer thermally throttling to around 2.8GHz, hex solves in less than 8 seconds. Usual caveats of all processes having JIT compiled etc. (that's using 5 processes using Distributed.jl). Haven't had time to test properly (i've been lazy with fg in global scope).

TLDR; i expect this PR to reduce the hot loop memory consumption. Proper testing will tell how much this PR helps, but it seems to be helping somewhere between a little towards a lot.

dehann · 2022-03-23T07:46:46Z

ah, should put q_hat in the cache as well... will try that tomorrow quick if i get a moment spare

codecov · 2022-03-23T08:10:04Z

Codecov Report

Merging #562 (840c3fe) into master (f9cc6e9) will decrease coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #562      +/-   ##
==========================================
- Coverage   32.84%   32.78%   -0.07%     
==========================================
  Files          48       48              
  Lines        1860     1870      +10     
==========================================
+ Hits          611      613       +2     
- Misses       1249     1257       +8

Impacted Files	Coverage Δ
src/factors/Pose2D.jl	`100.00% <100.00%> (ø)`
src/canonical/GenerateCircular.jl	`83.33% <0.00%> (-3.34%)`	⬇️
src/canonical/GenerateCommon.jl	`95.12% <0.00%> (-2.44%)`	⬇️
src/OdometryUtils.jl	`34.84% <0.00%> (-1.52%)`	⬇️
src/RobotUtils.jl	`6.27% <0.00%> (-0.12%)`	⬇️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

jim-hill-r · 2022-03-23T13:56:10Z

Excited for the performance gains! This looks like a side-by-side caching opposed to in-line caching. Just curious about the design choice there.

dehann · 2022-03-23T15:24:33Z

I need to learn more about side-by-side (in-place) vs in-line. I was searching online but didn't find a good reference yet. Do you perhaps have a good reference handy?

I think it's in-place (side-by-side) model, if i'm reading your input correctly, because CalcFactor.cache is likely going to do two related jobs. One is a strong in-place memory case. Maybe cache should be split for two cases, I started somewhere which seems like a sensible design.

Currently only PosePose2 factor is using preambleCache so far (this PR).

The word cache might be a bit confusing, since it will definitely be doing "in-place" memory work for hot loop computations. While the same architecture can also be used for caching important data at addFactor time.

The Marine Demo is a good example. I'm looking to find a good way of not serializing the radar sweeps into the PackedFactor data, and duplicating that for each factor. My thinking is the preambleCache step (which ScatterAlignPose2 will overload) can fetch big blobs from the data store instead.

Similarly, if there is a global calibration, we can load that kind of stuff for each of the factors when they get added to the graph.

EDIT: oh, i also like the simplicity of adding this for users. See the code changes for this PR -- it's really easy for a user. In-line model might also be easy, but i'd need to learn more first.

EDIT2: Also, i have multithreading in mind. That design is still a work in progress, but so far the leading candidate for
me is that a fresh CalcFactor object should be created for each new thread computation on a factor (at least one per particle/kernel) -- but at the same time, as many fields possible inside each new CalcFactor (CF) should come from existing memory inside the CCW or FMd designs. There is a whole other story on boiling down all the operational memory objects to just CCW and CF. There is actually a project board up for that already.

jim-hill-r · 2022-03-23T15:34:35Z

The difference is primarily on who is coordinating the cache. Client or cache. I am not sure the ubiquitous terms. I made those up. Concrete explanation of the two options:

Client asks cache for an object. If cache miss, then client asks primary resource. This is what I am calling "side-by-side". This requires every client to handle it. This is what it appears you are doing. Every factor needs to be "aware" of the cache. I do believe this pattern is how CPUs work with L1, L2, RAM, Disk, but I suspect that is really low level.
Client asks "cache" for an object. If cache miss, the cache knows where the origin resource is and just fetches it. The client doesn't need to know anything. This is how cache works in most web tech. It allows for multi-level caching without any client changes. This pattern also supports background hot-loading and cache-clearing. It's also very generic so new factors would benefit without even being aware.

dehann · 2022-03-23T16:43:22Z

Ah, got it thanks. So predominant use case here will be low level to give user maximum control. Each factor "cache" should be maximally flexible at this level. Users will want to do a whole gamut of tricks and performance tweaks. E.g. a requirement on the factor cache is to be able to leverage GPU.

My sense is that the factor cache will develop with both designs being used in different parts. In-line caching makes a lot of sense for the data stores.

I'm fairly sure CF.cache starting out as an in-place design is the right way to go. But then inside each factor's preambleCache function, it's likely that the user will be leveraging in-line caching features.

dehann · 2022-03-23T17:45:03Z

PS, I added this to the CJL collection of high level design decisions: https://github.com/JuliaRobotics/Caesar.jl/wiki/High-Level-Requirements

I'm now starting to think we should consolidate that with the RequirementsManagement system.

dehann added 2 commits March 15, 2022 20:20

CF.cache not being set?

d4e8141

fix Pose2 cache problem

f4d7610

dehann requested review from Affie, GearsAD and jim-hill-r March 23, 2022 07:36

dehann self-assigned this Mar 23, 2022

dehann added factors performance cache labels Mar 23, 2022

dehann added this to the v0.18.2 milestone Mar 23, 2022

dehann added the inference label Mar 23, 2022

jim-hill-r approved these changes Mar 23, 2022

View reviewed changes

dehann added the design label Mar 23, 2022

add q_hat/

840c3fe

dehann merged commit 8dbf19b into master Mar 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix Pose2 preambleCache bug #562

fix Pose2 preambleCache bug #562

dehann commented Mar 23, 2022 •

edited

Loading

dehann commented Mar 23, 2022 •

edited

Loading

codecov bot commented Mar 23, 2022 •

edited

Loading

jim-hill-r commented Mar 23, 2022

dehann commented Mar 23, 2022 •

edited

Loading

jim-hill-r commented Mar 23, 2022

dehann commented Mar 23, 2022

dehann commented Mar 23, 2022

fix Pose2 preambleCache bug #562

fix Pose2 preambleCache bug #562

Conversation

dehann commented Mar 23, 2022 • edited Loading

dehann commented Mar 23, 2022 • edited Loading

codecov bot commented Mar 23, 2022 • edited Loading

Codecov Report

jim-hill-r commented Mar 23, 2022

dehann commented Mar 23, 2022 • edited Loading

jim-hill-r commented Mar 23, 2022

dehann commented Mar 23, 2022

dehann commented Mar 23, 2022

dehann commented Mar 23, 2022 •

edited

Loading

dehann commented Mar 23, 2022 •

edited

Loading

codecov bot commented Mar 23, 2022 •

edited

Loading

dehann commented Mar 23, 2022 •

edited

Loading