-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Towards splitting coordinates in separate buffers and files #1454
Conversation
06f4a05
to
534050a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a partial review of the header files and small changes in .cc files.
tiledb/sm/misc/utils.cc
Outdated
@@ -675,6 +675,20 @@ void expand_mbr(T* mbr, const T* coords, unsigned int dim_num) { | |||
} | |||
} | |||
|
|||
template <class T> | |||
void expand_mbr(T* mbr, const std::vector<T*>& coords, uint64_t pos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll only leave one of these comments because it's a little pedantic, but we should value-const function parameters in the definitions (but not declarations) whenever possible for const correctness. So in this example, I would do T* const mbr
and const uint64_t pos
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mbr
is modified by the function, perhaps I should move it to the end of the list (that's the convention we've been following). Good point on pos
although every other place in the code must be changed similarly. We can start now :).
uint64_t original_buffer_var_size_; | ||
|
||
/** Constructor. */ | ||
CoordBuffer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's changed in recent years, but list-initializing these class members has been classically faster.
tiledb/sm/misc/comparators.h
Outdated
* Constructor. | ||
* | ||
* @param domain The array domain. | ||
* @param buff The coordinate buffers, one per dimension, containing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename buff
to coord_buffs
to match the declaration.
*/ | ||
GlobalCmp2(const Domain* domain, const std::vector<const void*>& coord_buffs) | ||
: domain_(domain) | ||
, coord_buffs_(coord_buffs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsafe because coord_buffs
is a temporary bound to a class member reference (coord_buffs_
). The temporary coord_buffs
is only guaranteed to persist for the lifetime of this constructor. In other words: coord_buffs_
will NOT extend the lifetime of coord_buffs
. When coord_buffs_
is dereferenced outside of the constructor, for example: within GlobalCmp2::operator()
, it may be invalid.
From the C++03 standard:
The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is the complete object to a subobject of which the temporary is bound persists for the lifetime of the reference except as specified below. A temporary bound to a reference member in a constructor’s ctor-initializer (§12.6.2 [class.base.init]) persists until the constructor exits.
So we must ensure that the original coord_buffs
outlives this GlobalCmp2
.
I think the right thing to do here is to change coord_buffs_
from a const-ref to a const raw ptr. The contract is the same, but if the object is freed before we access it, we'll get a segfault instead of a garbage object.
Or, we could copy/move it, but I don't recommend that.
* - +1 if the first coordinates succeed the second on the cell order | ||
*/ | ||
template <class T> | ||
static int cell_order_cmp( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for making this static? Each cpp file that includes this header will get its own text copy of this routine. We should make this non-static to avoid binary bloat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to be able to store it in cell_order_cmp_func_
, this function needs to be static, otherwise the compiler complains.
* - +1 if the first coordinates succeed the second on the tile order | ||
*/ | ||
template <class T> | ||
static int tile_order_cmp( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above.
* - d: The dimension index to compare on. | ||
*/ | ||
std::vector<int (*)( | ||
const Dimension* dim, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about typedef
or using =
either the vector its element template? Just an easy way to re-use it with the tile_order_cmp_func_
declaration and any other place we may need to reuse it.
@@ -880,6 +1037,29 @@ class Writer { | |||
const std::string& attribute, | |||
FragmentMetadata* frag_meta, | |||
const std::vector<Tile>& tiles) const; | |||
|
|||
// TODO: remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is intended to be checked in, could you expand this comment to explain why it should be removed in a future patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be removed in my next PR over the next couple of days, but I will add a comment.
534050a
to
ef02c45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just a few nitpick comments that you can address or not. Thanks!
tiledb/sm/query/writer.cc
Outdated
@@ -438,15 +446,16 @@ Status Writer::check_buffer_sizes() const { | |||
|
|||
auto cell_num = array_schema_->domain()->cell_num(subarray_); | |||
uint64_t expected_cell_num = 0; | |||
for (const auto& attr : attributes_) { | |||
for (const auto& it : attr_buffers_) { | |||
auto attr = it.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const-ref here to avoid unnecessary string copy
} | ||
coord_buffer_sizes_.clear(); | ||
coord_buffers_.clear(); | ||
coord_buffers_alloced_ = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is coord_buffers_alloced_
necessary? Couldn't we determine if we allocated the buffers just by checking if coord_buffers_
is or is not empty?
Does this instead indicate ownership of the buffers? Is there a scenario where coord_buffers_
is non-empty but this path wouldn't be responsible for freeing the buffers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is. In the very next PR, the user will be able to set the coordinate buffers separately for each dimension (whereas currently, they must zip all coordinates in a single buffer). We need to have a way to differentiate, which is accomplished by checking that member.
tiledb/sm/query/writer.cc
Outdated
case Datatype::UINT16: | ||
return compute_coords_metadata<uint16_t>(tiles, meta); | ||
case Datatype::INT32: | ||
return compute_coords_metadata<int>(tiles, meta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extreme nitpick: let's use int32_t
here instead of int
. same for unsigned
->uint32
below.
tiledb/sm/query/writer.cc
Outdated
for (const auto& it : attr_buffers_) { | ||
auto attr = it.first; | ||
if (global_write_state_->attr_cells_written_[attr] != cell_num) { | ||
std::cout << global_write_state_->attr_cells_written_[attr] << "\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a debug statement? we should remove this
tiledb/sm/query/writer.cc
Outdated
std::string ret; | ||
auto dim_num = array_schema_->dim_num(); | ||
|
||
ret = "("; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this will be called frequently, consider a stringstream
here to avoid a bunch of string copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it isn't, but I will change anyway.
…on coordinates are split in different buffers and handled separately.
ef02c45
to
cada687
Compare
Towards addressing #93. Split writer algorithms such that the dimension coordinates are split in different buffers and handled separately.