Shared storage cache #4989
@oxidase I didn't make the cache a singleton since all the accesses to the cache are through the SearchEngine. Instead I got rid of the |
After talking with @danpat, I converted the cache pointer into a cache object: f39e37a. But the tests were still failing non-deterministically. Then I talked to @miccolis and put a lock in front of the call to the clear function from within the SearchEngine: 01e6562. The tests have stopped failing, but this probably makes it slower. Some conversation came up around whether this shared storage cache should be inter-thread shared storage or inter-process shared storage. As of now, this implementation is inter-thread shared storage. Decisions will be made after performance measurements comparing threadlocal and inter-thread shared storage. Hopefully we don't have to venture into inter-process shared storage 😬 🤞
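To make the locking discussion above concrete, here is a minimal sketch of an inter-thread shared cache guarded by a shared lock for reads and a unique lock for writes (including the clear call). It is not the code from this PR: `SharedDurationCache`, its method names, and the integer key and value types are all hypothetical, and a plain `std::unordered_map` stands in for the LRU cache. Note that a real LRU cache usually updates its recency order on lookup, so its reads may need the unique lock as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical inter-thread shared cache; names and types are illustrative, not OSRM's.
class SharedDurationCache
{
  public:
    // Readers take a shared lock, so lookups from different threads can overlap.
    std::optional<std::int32_t> Get(std::uint64_t key) const
    {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        const auto it = entries_.find(key);
        if (it == entries_.end())
            return std::nullopt;
        return it->second;
    }

    // Writers take a unique lock, which excludes all readers and other writers.
    void Put(std::uint64_t key, std::int32_t duration)
    {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        entries_[key] = duration;
    }

    // Clearing is a write as well, so it goes through the same unique lock.
    void Clear()
    {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        entries_.clear();
    }

    std::size_t Size() const
    {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return entries_.size();
    }

  private:
    mutable std::shared_mutex mutex_; // mutable so const readers can lock it
    std::unordered_map<std::uint64_t, std::int32_t> entries_;
};
```

Every access pays for the lock, which is one reason the shared variant could turn out slower than a thread-local cache; that is what the measurements are meant to settle.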
One consideration for using shared storage is that there can be multiple datasets (usually two) in flight at the same time. This is not a problem for thread-local caching, since a single thread will only ever use either the new or the old data. For a shared cache we need to keep track of the dataset used to compute every entry in the cache, which means extending the cache line to include the timestamp. The full timestamp used by OSRM is a … The cache design would need to be changed in the following way:
@TheMarex the steps above are really clever and adding |
You can do it the same way you currently handle the full timestamp value by making it part of the key. The generation logic can live in the interface that you build around the LRU cache. Seems like boost automatically generates a hash function for a |
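As a rough illustration of making a generation part of the key, the sketch below maps each dataset timestamp to a small generation number inside a wrapper and includes that number in the cache key. Everything here is an assumption for illustration: `CacheKey`, `CacheKeyHash`, `GenerationCache`, the field widths, and the timestamp strings are hypothetical, a plain `std::unordered_map` stands in for the LRU cache, and the hash is a hand-written `std::hash` over the packed fields rather than anything boost generates. It is also not thread-safe on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical key layout; field names and widths are illustrative only.
struct CacheKey
{
    std::uint32_t edge_id;
    std::uint8_t exclude_index;
    std::uint8_t generation; // small stand-in for the full dataset timestamp

    bool operator==(const CacheKey &other) const
    {
        return edge_id == other.edge_id && exclude_index == other.exclude_index &&
               generation == other.generation;
    }
};

struct CacheKeyHash
{
    std::size_t operator()(const CacheKey &key) const
    {
        // Pack the three fields into one 64-bit value and hash that.
        const std::uint64_t packed = (static_cast<std::uint64_t>(key.edge_id) << 16) |
                                     (static_cast<std::uint64_t>(key.exclude_index) << 8) |
                                     static_cast<std::uint64_t>(key.generation);
        return std::hash<std::uint64_t>{}(packed);
    }
};

// Wrapper that owns the generation logic, so callers only pass the timestamp.
class GenerationCache
{
  public:
    void Put(const std::string &timestamp, std::uint32_t edge_id,
             std::uint8_t exclude_index, std::int32_t duration)
    {
        entries_[CacheKey{edge_id, exclude_index, GenerationFor(timestamp)}] = duration;
    }

    // Hits only if the entry was computed from the dataset with this timestamp.
    std::optional<std::int32_t> Get(const std::string &timestamp, std::uint32_t edge_id,
                                    std::uint8_t exclude_index)
    {
        const auto it = entries_.find(CacheKey{edge_id, exclude_index, GenerationFor(timestamp)});
        if (it == entries_.end())
            return std::nullopt;
        return it->second;
    }

  private:
    // Returns the generation for a timestamp, assigning the next free one if unseen.
    std::uint8_t GenerationFor(const std::string &timestamp)
    {
        const auto it = generations_.find(timestamp);
        if (it != generations_.end())
            return it->second;
        if (generations_.size() == 256)
        {
            // All 8-bit generations are used: drop everything rather than reuse one.
            entries_.clear();
            generations_.clear();
        }
        const auto generation = static_cast<std::uint8_t>(generations_.size());
        generations_.emplace(timestamp, generation);
        return generation;
    }

    std::unordered_map<std::string, std::uint8_t> generations_;
    std::unordered_map<CacheKey, std::int32_t, CacheKeyHash> entries_;
};
```

Because the generation is part of the key, entries computed from the old and the new dataset can coexist, and a lookup made against one dataset never returns a value computed from the other.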
Running osrm-runner with this command, adjusting the length of the request and the number of requests as appropriate:
For threads, using the |
copy dummy cache over; implement retrievePackedPathFromSearchSpace; calculate packed_path_from_source_to_middle; debugging the retrievePackedPathFromSearchSpace function implementation; adding in packed_path_from_source_to_middle; cache is partway working; unpack path and get duration that way; the computeDurationForEdge method; comment out cache; clean up the code; move vector creation and allocation to outside of loop; hack to not return vectors on facade.GetUncompressedForwardDurations and facade.GetUncompressedReverseDurations; clean up hack; add exclude_index to cache key; clearing cache with timestamp; rebase against vectors->range pr; swapped out unordered_map cache with a boost_lru implementation; calculation for cache size; cleaned up comment about cache size calculations; unit tests; cache uses unsigned char for exclude index; clean up cache and unit tests
500 mb threadlocal 2 t
shared lock for reads and unique lock for writes; declare cache as object and not pointer in search engine data to simulate singleton declaration; put a lock in front of the clear function to make it threadsafe; remove clear function from cache because cache will never get dropped; unit tests; unit tests and timestamp as part of key; cache generations; hash the key; 500 mb; 1000 mb; 250 mb; rebase against implement-cache
Issue
Use a shared storage cache to compare against the performance of the threadlocal cache in #4876. This PR tests which of the two is faster so that we can keep the faster implementation. Related to cache considerations detailed here.
Tasklist
Requirements / Relations
#4876