-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use tar-format to encapsulate data #4960
Conversation
c72d85a
to
24dc369
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I have some comments to the last commit
include/storage/serialization.hpp
Outdated
|
||
namespace detail | ||
{ | ||
template <typename T, typename BlockT = unsigned char> | ||
inline unsigned char packBits(const T &data, std::size_t index, std::size_t count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return value must be BlockT
include/storage/serialization.hpp
Outdated
{ | ||
static_assert(std::is_same<typename T::value_type, bool>::value, "value_type is not bool"); | ||
const unsigned char mask = 1 << (count - 1); | ||
const BlockT mask = 1 << (count - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here must be BlockT{1} << (count - 1)
include/storage/serialization.hpp
Outdated
|
||
const auto decode = [&](const std::uint64_t block) { | ||
auto read_size = std::min<std::size_t>(count - index, WORD_BITS); | ||
unpackBits(data, index, read_size, block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here unpackBits<VectorT, std::uint64_t>
include/storage/serialization.hpp
Outdated
|
||
// FIXME on old boost version the function_input_iterator does not work with lambdas | ||
// so we need to wrap it in a function here. | ||
const std::function<std::uint64_t()> encode_function = [&]() -> char { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/-> char/-> std::uint64_t/
include/storage/serialization.hpp
Outdated
return packed; | ||
}; | ||
|
||
std::uint64_t number_of_blocks = std::ceil((double)count / WORD_BITS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(count + WORD_BITS - 1) / WORD_BITS
would be better to avoid floating-point arithmetic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice trick. I know a few places were this would come in useful.
3aa22d7
to
2707a2b
Compare
git-subtree-dir: third_party/microtar git-subtree-split: 956791770defa4d06696b30db276e88a09ad3538
2707a2b
to
8f75d0e
Compare
8f75d0e
to
3ee8a96
Compare
@TheMarex i pushed fixes for msvc to trigger CI builds ("fixes" are reducing c++ syntactic sugar for msvc). Feel free to kick-out or squash the last commit. |
Some more problems with windows. I suspect this is because I had to change the r-tree to allow writing to an mmap files. Maybe I can work around this. |
Wuhu finally! It compiles. 🚀 I need to merge this because we add the new dependency tree which doesn't work with rebase + merge. |
Follow-up to @TheMarex -s #2242 (comment) here One of #2242 requirements was "Only one final file". Does this implement it too. It's unclear form the description. I've failed to fish out from the commits if it is about tar-ifying individual files only or also about packaging into single master file. Will the docs or wiki:Processing Flow wiki updated on how to use the new format? |
@mloskot you are right, this doesn't meet the single-file requirement yet. However, since we now use named identifiers for data, |
@TheMarex It makes sense, thanks for the clarification. |
Issue
This PR is another step towards implementing #4952. The goal is to make any
.osrm.*
file atar
file internally and address data by (file)name. As example the.mldgr
file now looks like this:Where every file contains raw binary data for data structures in OSRM that we can deserialize or read directly to memory. The
.meta
files contain additional information for setting up the data layout, e.g. the number of values for serializedstd::vector
, or the fingerprint.Tasklist