-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove serialization non C.41 constructors from Subarray. #4685
base: main
Are you sure you want to change the base?
Conversation
This pull request has been linked to Shortcut Story #40000: Remove serialization non C.41 constructors from subarray.. |
37069b2
to
89f6aec
Compare
With this change in, PR #4685 and others to follow should: a) remove Subarray::set_stats and respective call from serialization/query.cc b) where Subarray is constructed in serialization/query.cc, change constructor call with the following code snippet: auto &stats_data = stats_from_capnp(reader.getStats()); Subarray subarray(array, layout, query_stats, stats_data, dummy_logger, true); c) The constructor calls parent_stats->create_child(prefix, stats_data); d) When all migrations are done, make Stats::populate_with_data private
89f6aec
to
13b5c1c
Compare
With this change in, PR #4685 and others to follow should: a) remove `Subarray::set_stats` and respective call from `serialization/query.cc` b) where `Subarray is constructed in `serialization/query.cc`, change constructor call with the following code snippet: ``` auto &stats_data = stats_from_capnp(reader.getStats()); Subarray subarray(array, layout, query_stats, stats_data, dummy_logger, true); ``` c) The constructor calls parent_stats->create_child(prefix, stats_data); d) When all migrations are done, make `Stats::populate_with_data private` --- TYPE: NO_HISTORY DESC: Remove serialization non C41 constructors from stats --------- Co-authored-by: KiterLuc <[email protected]>
13b5c1c
to
bbb8ac3
Compare
bbb8ac3
to
47dcb23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some comments... I'll need to review this more in depth later.
tiledb/sm/serialization/query.cc
Outdated
Range range(data_ptr.begin(), data.size()); | ||
RETURN_NOT_OK(subarray->set_ranges_for_dim(i, {range})); | ||
subarray->set_is_default(i, range_reader.getHasDefaultRange()); | ||
throw_if_not_ok(range_subset[i].add_range_unrestricted(range)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not use add_range_unrestricted here and use a constructor for RangeSetAndSuperset. That way we can use emplace_back here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think it will be possible to make add_range_unsafe private for Subarray as part of this change. The test code that uses this API can be changed I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I updated the unit test calling add_range_unsafe
I noticed Subarray::add_range
using read_range_oob_error=false
on a dimension with a domain of [0, 3] would crop a range of [10, 20] to [10, 3]. Instead of using std::min/max in tiledb::type::crop_range
I updated to use std::clamp and now the range is cropped to [3, 3]. This is also on a separate commit if we want to revert.
tiledb/sm/serialization/query.cc
Outdated
} | ||
} | ||
|
||
std::vector<optional<Subarray::LabelRangeSubset>> label_range_subset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here for using reserve and emplace back if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the nullopts to fill in here for the dimensions that don't have label ranges set, or we segfault in checking for the optional values in checks like Subarray::has_label_ranges. I went ahead and did this but if you're not a fan of the logic I can revert to using the move assignment. I just had to make sure we fill in with nullopts for all dimension indices without label ranges.
RETURN_NOT_OK(query->set_subarray_unsafe(subarray)); | ||
Subarray subarray = subarray_from_capnp( | ||
subarray_reader, array, layout, query->stats(), dummy_logger); | ||
query->set_subarray(subarray); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self that I'll need to review this again very carefully.
47dcb23
to
a70f757
Compare
a70f757
to
f8da3e9
Compare
f8da3e9
to
86a4c1d
Compare
Removes Status from datatype_enum, returning `Datatype` instead. This also relocates `Subarray::LabelRangeSubset` to be public for use in #4685. --- TYPE: NO_HISTORY DESC: Remove Status from datatype_enum.
While working on #4685 we found it was possible for a cropped range to fall outside of the given domain range. This updates crop_range to call `std::clamp` instead of min/max to ensure the resulting cropped range is within the domain. --- TYPE: BUG DESC: Update crop_range to clamp to domain range.
Removes Status from datatype_enum, returning `Datatype` instead. This also relocates `Subarray::LabelRangeSubset` to be public for use in TileDB-Inc#4685. --- TYPE: NO_HISTORY DESC: Remove Status from datatype_enum.
While working on TileDB-Inc#4685 we found it was possible for a cropped range to fall outside of the given domain range. This updates crop_range to call `std::clamp` instead of min/max to ensure the resulting cropped range is within the domain. --- TYPE: BUG DESC: Update crop_range to clamp to domain range.
+ Unstatus subarray_from_capnp
c5fbf45
to
c9e4726
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The elimination of most of the *_unsafe
functions is greatly to be welcomed. add_range_unsafe
is still present, but we don't have a class for only a single dimension of a subarray as opposed to all of them at once.
The test changes seem unrelated to the C.41 constructor issue. Since they need work, it might be better to split them into a separate PR.
There's a fatal defect with accessing the array schema object. It should be correctable, but I haven't worked out exactly what would replace it. There are a couple of other, smaller defects, but this is the big one.
If this review process drags on, it should be straightforward to change the signature of subarray_from_capnp' only as a first PR and then change its implementation with a new
Subarray` constructor in a second PR. The signature changes are straightforward.
@@ -463,50 +463,119 @@ TEST_CASE( | |||
Array array_r(ctx, array_name, TILEDB_READ); | |||
Subarray subarray(ctx, array_r); | |||
|
|||
// If read_range_oob_error is false, the range will be cropped with a | |||
// warning and the query will succeed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is too short to be easily comprehensible. It requires knowing what the badly-named-from-the-beginning parameter read_range_oob_error
does, which is only changeable through a configuration variable. It's worth expanding on that here.
expected_data[0] = 1; | ||
expected_data[1] = 2; | ||
expected_data[4] = 3; | ||
expected_data[5] = 4; | ||
SECTION("- Upper bound OOB") { | ||
int range[] = {0, 100}; | ||
auto r = Range(&range[0], &range[1], sizeof(int)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a Range
constructor that allows more direct initialization from constants. It's relatively recent.
Range r{Tag<int>, 0, 100};
This constructor allows Range
objects to statically initialize member variables, as needed for tables of test vectors.
@@ -463,50 +463,119 @@ TEST_CASE( | |||
Array array_r(ctx, array_name, TILEDB_READ); | |||
Subarray subarray(ctx, array_r); | |||
|
|||
// If read_range_oob_error is false, the range will be cropped with a | |||
// warning and the query will succeed. | |||
auto read_range_oob_error = GENERATE(true, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure of these out-of-bounds tests is overly complicated.
- They're buried inside a serialization test, but the
add_range
tests could and should be tested on a plainSubarray
object without using a query to do it. The test vectors (subarray definition, whether it's out of bounds, expected results, etc.) can be defined in tabular form outside the definition of any test case. - Common test vectors should run slightly differently. These distinctions follow the principle that (1) test setup code should throw and (2) tests themselves should use
CHECK
etc.- For testing
Subarray
, all the vectors get used.CHECK
should be used onadd_range
. - For testing queries, only the test vectors that don't fail
add_range
should be run. Failures ofadd_range
should throw.CHECK
should only be used for the query operations.
- For testing
- The OOB sections have a whole lot of copypasta. Using a loop over test vectors with
DYNAMIC_SECTION
would be much clearer. - When
add_range
converts away fromStatus
, it'll throw instead. It might be better to usethrow_if_not_ok
in these tests. The occurrences ofCHECK_FALSE
can be replaced withCHECK_THROWS
if testingclass Subarray
directly
if (expected == TILEDB_OK) { | ||
CHECK(query.query_status() == tiledb::Query::Status::COMPLETE); | ||
CHECK(a == std::vector<int>{1, 2, 3, 4}); | ||
// If the Subarray threw an exception when adding OOB ranges it will be unset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? There's no exception handling in the code as written.
REQUIRE(tiledb::sm::serialization::subarray_from_capnp( | ||
builder, deserialized_subarray->subarray_) | ||
.ok()); | ||
*deserialized_subarray->subarray_ = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pointer should be check against nullptr
before dereferencing. It shouldn't be null, of course, but this is test code. If it's null, it's fine to throw a runtime error, since this is test setup code.
|
||
// Set default indicator | ||
subarray->set_is_default(i, range_reader.getHasDefaultRange()); | ||
Datatype type = datatype_enum(range_reader.getType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should be calling something like subarray_ranges_from_capnp
. It would be passed range_reader
and return an object suitable for initializing with the new C.41 construction, which in this case should be RangeSetAndSuperset
. While this isn't strictly necessary, it would make the code easier to read by delineating dependent lists etc. more clearly.
auto label_ranges_reader = reader.getLabelRanges(); | ||
uint32_t label_num = label_ranges_reader.size(); | ||
for (uint32_t i = 0; i < label_num; i++) { | ||
auto label_range_reader = label_ranges_reader[i]; | ||
auto dim_id = label_range_reader.getDimensionId(); | ||
auto dim_index = label_range_reader.getDimensionId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here. It would be better to have label_subarray_ranges_from_capnp
.
Subarray subarray_from_capnp( | ||
const capnp::Subarray::Reader& reader, | ||
const Array* array, | ||
Layout layout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there was a preexisting defect that the layout was being written in subarray_to_capnp
but not read back in this function. It shouldn't be an external argument, certainly.
array, layout, query_stats, dummy_logger, true); | ||
Subarray& subarray_ = state->single_range_.back(); | ||
RETURN_NOT_OK(subarray_from_capnp(subarray_reader_, &subarray_)); | ||
auto subarray_reader = sr_reader[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subarray_reader
is already declared. This may generate a warning on some future compiler. Better just to give it it's own name. Admittedly subarray_reader_
was bad, since it looks like a member variable.
std::vector<Range>&& subset, | ||
bool coalesce_ranges) | ||
: impl_(range_subset_internals(datatype, superset, coalesce_ranges)) | ||
, is_implicitly_initialized_(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this line is correct. Implicit initialization is used IIRC to mean "the whole range", which is the case if the vector of subset ranges is empty.
This updates
subarray_from_capnp
to use a C.41 Subarray constructor.TYPE: NO_HISTORY
DESC: Remove serialization non C.41 constructors from Subarray.