
fix Pose2 preambleCache bug #562

Merged
merged 3 commits into master on Mar 24, 2022

Conversation

@dehann (Member) commented Mar 23, 2022

The idea here is to try to reduce hot-loop memory consumption.

So, just quickly trying the hexagonal example...

Once upon a time, hex would take at best around 30s to solve. After this PR, and on a computer thermally throttling to around 2.8 GHz, hex solves in less than 8 seconds (that's using 5 processes via Distributed.jl). Usual caveats apply, e.g. all processes having JIT compiled first. I haven't had time to test properly (I've been lazy with fg in global scope).

TL;DR: I expect this PR to reduce hot-loop memory consumption. Proper testing will tell how much it helps, but it seems to be helping somewhere between a little and a lot.
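For illustration, here is a minimal self-contained sketch of the principle (plain Julia; not the actual PR diff): preallocate a scratch buffer once, so the hot loop itself makes no allocations.

# Minimal sketch of the principle (plain Julia, not the actual PR diff):
# preallocate a scratch buffer once so the hot loop makes zero allocations.
function residual!(q_hat::Vector{Float64}, p::Vector{Float64}, z::Vector{Float64})
    @. q_hat = p + z            # write the "predicted" value in place
    return sum(abs2, q_hat)     # toy residual norm, allocation-free
end

cache = (; q_hat = zeros(3))    # built once, analogous to what preambleCache returns
p, z = randn(3), randn(3)
for _ in 1:1_000_000            # hot loop: reuses cache.q_hat every iteration
    residual!(cache.q_hat, p, z)
end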

@dehann (Member, Author) commented Mar 23, 2022

Ah, I should put q_hat in the cache as well... will give that a quick try tomorrow if I get a spare moment.
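Roughly what that would look like, as a structural sketch only (MyOdo is a hypothetical factor, with the rest of its definition, e.g. getManifold and getSample, omitted; this assumes the preambleCache signature documented in IncrementalInference.jl):

# Structural sketch only: MyOdo is hypothetical and the rest of the factor
# interface is omitted. Assumes the documented IncrementalInference.jl API.
import IncrementalInference: preambleCache

# called once per factor at addFactor! time; the returned object becomes cf.cache
function preambleCache(dfg::AbstractDFG, vars::AbstractVector{<:DFGVariable}, ::MyOdo)
  return (; q_hat = zeros(3))   # scratch buffer for the predicted pose
end

# hot-loop residual: reuse the preallocated q_hat instead of allocating
function (cf::CalcFactor{<:MyOdo})(z, p, q)
  q_hat = cf.cache.q_hat
  @. q_hat = p + z              # toy prediction written in place
  return q .- q_hat
end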

@codecov (bot) commented Mar 23, 2022

Codecov Report

Merging #562 (840c3fe) into master (f9cc6e9) will decrease coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #562      +/-   ##
==========================================
- Coverage   32.84%   32.78%   -0.07%     
==========================================
  Files          48       48              
  Lines        1860     1870      +10     
==========================================
+ Hits          611      613       +2     
- Misses       1249     1257       +8     
Impacted Files                       Coverage Δ
src/factors/Pose2D.jl                100.00% <100.00%> (ø)
src/canonical/GenerateCircular.jl     83.33% <0.00%> (-3.34%) ⬇️
src/canonical/GenerateCommon.jl       95.12% <0.00%> (-2.44%) ⬇️
src/OdometryUtils.jl                  34.84% <0.00%> (-1.52%) ⬇️
src/RobotUtils.jl                      6.27% <0.00%> (-0.12%) ⬇️


@jim-hill-r (Collaborator) commented:
Excited for the performance gains! This looks like side-by-side caching as opposed to in-line caching. Just curious about the design choice there.

@dehann (Member, Author) commented Mar 23, 2022

I need to learn more about side-by-side (in-place) vs. in-line caching. I was searching online but haven't found a good reference yet. Do you perhaps have one handy?

I think it's the in-place (side-by-side) model, if I'm reading your input correctly, because CalcFactor.cache is likely going to do two related jobs. One is a strong in-place memory case. Maybe the cache should be split for the two cases; I've started with what seems like a sensible design.

Currently only the Pose2Pose2 factor uses preambleCache (as of this PR).

The word "cache" might be a bit confusing, since it will definitely be doing "in-place" memory work for hot-loop computations, while the same architecture can also be used for caching important data at addFactor time.

The Marine Demo is a good example. I'm looking for a good way of not serializing the radar sweeps into the PackedFactor data and duplicating that for each factor. My thinking is that the preambleCache step (which ScatterAlignPose2 will overload) can fetch big blobs from the data store instead (sketched below).

Similarly, if there is a global calibration, we can load that kind of data for each of the factors when they get added to the graph.
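A hedged sketch of that blob idea (MyScatterAlign, the blob label, and decodeSweep are all hypothetical; this assumes a getData-style blob accessor from DistributedFactorGraphs):

# Hedged sketch: hypothetical factor, blob label, and decoder; assumes a
# getData-style blob accessor from DistributedFactorGraphs.
function preambleCache(dfg::AbstractDFG, vars::AbstractVector{<:DFGVariable}, ::MyScatterAlign)
  # fetch the big blob from the data store once, at addFactor time, instead
  # of serializing it into every PackedFactor
  entry, blob = getData(dfg, getLabel(vars[1]), :radar_sweep)
  sweep = decodeSweep(blob)     # hypothetical decoder for the raw bytes
  return (; sweep)
end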

EDIT: Oh, I also like the simplicity of adding this for users. See the code changes in this PR -- it's really easy on the user side. The in-line model might also be easy, but I'd need to learn more first.

EDIT 2: Also, I have multithreading in mind. That design is still a work in progress, but so far the leading candidate for me is that a fresh CalcFactor object should be created for each new thread computation on a factor (at least one per particle/kernel) -- but, at the same time, as many fields as possible inside each new CalcFactor (CF) should come from existing memory inside the CCW or FMd designs. There is a whole other story about boiling all the operational memory objects down to just CCW and CF; there is actually a project board up for that already.
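A generic illustration of that per-thread shape (plain Julia with hypothetical names; not the actual CCW/FMd code): a fresh, cheap wrapper per task holding private scratch memory, while the heavy read-only data stays shared.

# Generic sketch, hypothetical names (not the actual CCW/FMd code).
struct SharedMem                  # stands in for memory owned by the CCW
    measurements::Matrix{Float64}
end

struct TaskCF                     # stands in for a fresh per-thread CalcFactor
    shared::SharedMem             # borrowed, read-only
    scratch::Vector{Float64}      # private, so tasks never race on it
end

function eval_particle!(cf::TaskCF, i::Int)
    cf.scratch .= @view cf.shared.measurements[:, i]
    return sum(abs2, cf.scratch)
end

shared = SharedMem(randn(3, 100))
out = zeros(100)
Threads.@threads for i in 1:100
    cf = TaskCF(shared, zeros(3)) # fresh wrapper per particle computation
    out[i] = eval_particle!(cf, i)
end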

@jim-hill-r (Collaborator) commented:
The difference is primarily in who coordinates the cache: the client or the cache. I'm not sure of the ubiquitous terms; I made these up. A concrete explanation of the two options, with a small sketch after the list:

  1. The client asks the cache for an object. On a cache miss, the client then asks the primary resource. This is what I'm calling "side-by-side", and it requires every client to handle the miss path. It's what you appear to be doing: every factor needs to be "aware" of the cache. I believe this pattern is how CPUs work with L1, L2, RAM, and disk, but I suspect that is really low level.
  2. The client asks the "cache" for an object. On a cache miss, the cache knows where the origin resource is and just fetches it; the client doesn't need to know anything. This is how caching works in most web tech. It allows multi-level caching without any client changes, and the pattern also supports background hot-loading and cache clearing. It's also very generic, so new factors would benefit without even being aware of it.
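A small plain-Julia sketch of the two shapes (all names hypothetical):

# 1. side-by-side: the client coordinates the miss path itself
function get_side_by_side(cache::Dict, key, fetch_primary)
    haskey(cache, key) && return cache[key]
    val = fetch_primary(key)      # client talks to the primary resource
    cache[key] = val
    return val
end

# 2. in-line (read-through): the cache owns the miss path; clients only see the cache
struct ReadThroughCache{F}
    store::Dict{Any,Any}
    fetch::F                      # the cache knows where the origin resource is
end
ReadThroughCache(fetch) = ReadThroughCache(Dict{Any,Any}(), fetch)
Base.getindex(c::ReadThroughCache, key) = get!(() -> c.fetch(key), c.store, key)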

@dehann (Member, Author) commented Mar 23, 2022

Ah, got it, thanks. So the predominant use case here will be low level, to give the user maximum control. Each factor's "cache" should be maximally flexible at this level, since users will want to do a whole gamut of tricks and performance tweaks. E.g. one requirement on the factor cache is being able to leverage the GPU.

My sense is that the factor cache will develop with both designs used in different parts. In-line caching makes a lot of sense for the data stores.

I'm fairly sure that CF.cache starting out as an in-place design is the right way to go, but inside each factor's preambleCache function it's likely that the user will be leveraging in-line caching features.

@dehann added the design label Mar 23, 2022
@dehann (Member, Author) commented Mar 23, 2022

PS, I added this to the CJL collection of high-level design decisions: https://github.com/JuliaRobotics/Caesar.jl/wiki/High-Level-Requirements

I'm now starting to think we should consolidate that with the RequirementsManagement system.

@dehann merged commit 8dbf19b into master Mar 24, 2022