Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove serialization non C.41 constructors from Subarray. #4685

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

shaunrd0
Copy link
Contributor

@shaunrd0 shaunrd0 commented Jan 30, 2024

This updates subarray_from_capnp to use a C.41 Subarray constructor.


TYPE: NO_HISTORY
DESC: Remove serialization non C.41 constructors from Subarray.

Copy link

This pull request has been linked to Shortcut Story #40000: Remove serialization non C.41 constructors from subarray..

@shaunrd0 shaunrd0 force-pushed the smr/sc-40000/c41-subarray-deserialization branch from 37069b2 to 89f6aec Compare January 31, 2024 21:49
robertbindar added a commit that referenced this pull request Feb 8, 2024
With this change in, PR #4685 and others to follow should:

a) remove Subarray::set_stats and respective call from serialization/query.cc
b) where Subarray is constructed in serialization/query.cc, change constructor
call with the following code snippet:
 auto &stats_data = stats_from_capnp(reader.getStats());
 Subarray subarray(array, layout, query_stats, stats_data, dummy_logger, true);
c) The constructor calls parent_stats->create_child(prefix, stats_data);
d) When all migrations are done, make Stats::populate_with_data private
@shaunrd0 shaunrd0 force-pushed the smr/sc-40000/c41-subarray-deserialization branch from 89f6aec to 13b5c1c Compare February 9, 2024 20:03
KiterLuc added a commit that referenced this pull request Feb 12, 2024
With this change in, PR #4685 and others to follow should:

a) remove `Subarray::set_stats` and respective call from
`serialization/query.cc` b) where `Subarray is constructed in
`serialization/query.cc`, change constructor call with the following
code snippet:
 ```
 auto &stats_data = stats_from_capnp(reader.getStats());
Subarray subarray(array, layout, query_stats, stats_data, dummy_logger,
true);
```
c) The constructor calls parent_stats->create_child(prefix, stats_data); d) When all migrations are done, make `Stats::populate_with_data private`

---
TYPE: NO_HISTORY
DESC: Remove serialization non C41 constructors from stats

---------

Co-authored-by: KiterLuc <[email protected]>
@shaunrd0 shaunrd0 force-pushed the smr/sc-40000/c41-subarray-deserialization branch from 13b5c1c to bbb8ac3 Compare February 12, 2024 21:49
@shaunrd0 shaunrd0 marked this pull request as ready for review February 13, 2024 14:27
@KiterLuc KiterLuc force-pushed the smr/sc-40000/c41-subarray-deserialization branch from bbb8ac3 to 47dcb23 Compare February 15, 2024 07:24
Copy link
Contributor

@KiterLuc KiterLuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some comments... I'll need to review this more in depth later.

tiledb/sm/serialization/query.cc Outdated Show resolved Hide resolved
tiledb/sm/serialization/query.cc Outdated Show resolved Hide resolved
tiledb/sm/serialization/query.cc Outdated Show resolved Hide resolved
Range range(data_ptr.begin(), data.size());
RETURN_NOT_OK(subarray->set_ranges_for_dim(i, {range}));
subarray->set_is_default(i, range_reader.getHasDefaultRange());
throw_if_not_ok(range_subset[i].add_range_unrestricted(range));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not use add_range_unrestricted here and use a constructor for RangeSetAndSuperset. That way we can use emplace_back here too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think it will be possible to make add_range_unsafe private for Subarray as part of this change. The test code that uses this API can be changed I believe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I updated the unit test calling add_range_unsafe I noticed Subarray::add_range using read_range_oob_error=false on a dimension with a domain of [0, 3] would crop a range of [10, 20] to [10, 3]. Instead of using std::min/max in tiledb::type::crop_range I updated to use std::clamp and now the range is cropped to [3, 3]. This is also on a separate commit if we want to revert.

}
}

std::vector<optional<Subarray::LabelRangeSubset>> label_range_subset(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here for using reserve and emplace back if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the nullopts to fill in here for the dimensions that don't have label ranges set, or we segfault in checking for the optional values in checks like Subarray::has_label_ranges. I went ahead and did this but if you're not a fan of the logic I can revert to using the move assignment. I just had to make sure we fill in with nullopts for all dimension indices without label ranges.

RETURN_NOT_OK(query->set_subarray_unsafe(subarray));
Subarray subarray = subarray_from_capnp(
subarray_reader, array, layout, query->stats(), dummy_logger);
query->set_subarray(subarray);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self that I'll need to review this again very carefully.

tiledb/sm/subarray/range_subset.cc Outdated Show resolved Hide resolved
@shaunrd0 shaunrd0 force-pushed the smr/sc-40000/c41-subarray-deserialization branch from 47dcb23 to a70f757 Compare February 16, 2024 18:37
@shaunrd0 shaunrd0 requested a review from KiterLuc February 16, 2024 18:47
@shaunrd0 shaunrd0 force-pushed the smr/sc-40000/c41-subarray-deserialization branch from a70f757 to f8da3e9 Compare February 16, 2024 19:45
@shaunrd0 shaunrd0 force-pushed the smr/sc-40000/c41-subarray-deserialization branch from f8da3e9 to 86a4c1d Compare February 28, 2024 17:51
KiterLuc pushed a commit that referenced this pull request Mar 2, 2024
Removes Status from datatype_enum, returning `Datatype` instead. This
also relocates `Subarray::LabelRangeSubset` to be public for use in
#4685.

---
TYPE: NO_HISTORY
DESC: Remove Status from datatype_enum.
KiterLuc pushed a commit that referenced this pull request Mar 2, 2024
While working on #4685 we found it was possible for a cropped range to
fall outside of the given domain range. This updates crop_range to call
`std::clamp` instead of min/max to ensure the resulting cropped range is
within the domain.

---
TYPE: BUG
DESC: Update crop_range to clamp to domain range.
davisp pushed a commit to davisp/TileDB that referenced this pull request Mar 3, 2024
Removes Status from datatype_enum, returning `Datatype` instead. This
also relocates `Subarray::LabelRangeSubset` to be public for use in
TileDB-Inc#4685.

---
TYPE: NO_HISTORY
DESC: Remove Status from datatype_enum.
davisp pushed a commit to davisp/TileDB that referenced this pull request Mar 3, 2024
While working on TileDB-Inc#4685 we found it was possible for a cropped range to
fall outside of the given domain range. This updates crop_range to call
`std::clamp` instead of min/max to ensure the resulting cropped range is
within the domain.

---
TYPE: BUG
DESC: Update crop_range to clamp to domain range.
@shaunrd0 shaunrd0 force-pushed the smr/sc-40000/c41-subarray-deserialization branch from c5fbf45 to c9e4726 Compare March 4, 2024 14:57
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elimination of most of the *_unsafe functions is greatly to be welcomed. add_range_unsafe is still present, but we don't have a class for only a single dimension of a subarray as opposed to all of them at once.

The test changes seem unrelated to the C.41 constructor issue. Since they need work, it might be better to split them into a separate PR.

There's a fatal defect with accessing the array schema object. It should be correctable, but I haven't worked out exactly what would replace it. There are a couple of other, smaller defects, but this is the big one.

If this review process drags on, it should be straightforward to change the signature of subarray_from_capnp' only as a first PR and then change its implementation with a new Subarray` constructor in a second PR. The signature changes are straightforward.

@@ -463,50 +463,119 @@ TEST_CASE(
Array array_r(ctx, array_name, TILEDB_READ);
Subarray subarray(ctx, array_r);

// If read_range_oob_error is false, the range will be cropped with a
// warning and the query will succeed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is too short to be easily comprehensible. It requires knowing what the badly-named-from-the-beginning parameter read_range_oob_error does, which is only changeable through a configuration variable. It's worth expanding on that here.

expected_data[0] = 1;
expected_data[1] = 2;
expected_data[4] = 3;
expected_data[5] = 4;
SECTION("- Upper bound OOB") {
int range[] = {0, 100};
auto r = Range(&range[0], &range[1], sizeof(int));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a Range constructor that allows more direct initialization from constants. It's relatively recent.

  Range r{Tag<int>, 0, 100};

This constructor allows Range objects to statically initialize member variables, as needed for tables of test vectors.

@@ -463,50 +463,119 @@ TEST_CASE(
Array array_r(ctx, array_name, TILEDB_READ);
Subarray subarray(ctx, array_r);

// If read_range_oob_error is false, the range will be cropped with a
// warning and the query will succeed.
auto read_range_oob_error = GENERATE(true, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of these out-of-bounds tests is overly complicated.

  • They're buried inside a serialization test, but the add_range tests could and should be tested on a plain Subarray object without using a query to do it. The test vectors (subarray definition, whether it's out of bounds, expected results, etc.) can be defined in tabular form outside the definition of any test case.
  • Common test vectors should run slightly differently. These distinctions follow the principle that (1) test setup code should throw and (2) tests themselves should use CHECK etc.
    • For testing Subarray, all the vectors get used. CHECK should be used on add_range.
    • For testing queries, only the test vectors that don't fail add_range should be run. Failures of add_range should throw. CHECK should only be used for the query operations.
  • The OOB sections have a whole lot of copypasta. Using a loop over test vectors with DYNAMIC_SECTION would be much clearer.
  • When add_range converts away from Status, it'll throw instead. It might be better to use throw_if_not_ok in these tests. The occurrences of CHECK_FALSE can be replaced with CHECK_THROWS if testing class Subarray directly

if (expected == TILEDB_OK) {
CHECK(query.query_status() == tiledb::Query::Status::COMPLETE);
CHECK(a == std::vector<int>{1, 2, 3, 4});
// If the Subarray threw an exception when adding OOB ranges it will be unset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? There's no exception handling in the code as written.

REQUIRE(tiledb::sm::serialization::subarray_from_capnp(
builder, deserialized_subarray->subarray_)
.ok());
*deserialized_subarray->subarray_ =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pointer should be check against nullptr before dereferencing. It shouldn't be null, of course, but this is test code. If it's null, it's fine to throw a runtime error, since this is test setup code.


// Set default indicator
subarray->set_is_default(i, range_reader.getHasDefaultRange());
Datatype type = datatype_enum(range_reader.getType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should be calling something like subarray_ranges_from_capnp. It would be passed range_reader and return an object suitable for initializing with the new C.41 construction, which in this case should be RangeSetAndSuperset. While this isn't strictly necessary, it would make the code easier to read by delineating dependent lists etc. more clearly.

auto label_ranges_reader = reader.getLabelRanges();
uint32_t label_num = label_ranges_reader.size();
for (uint32_t i = 0; i < label_num; i++) {
auto label_range_reader = label_ranges_reader[i];
auto dim_id = label_range_reader.getDimensionId();
auto dim_index = label_range_reader.getDimensionId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here. It would be better to have label_subarray_ranges_from_capnp.

Subarray subarray_from_capnp(
const capnp::Subarray::Reader& reader,
const Array* array,
Layout layout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there was a preexisting defect that the layout was being written in subarray_to_capnp but not read back in this function. It shouldn't be an external argument, certainly.

array, layout, query_stats, dummy_logger, true);
Subarray& subarray_ = state->single_range_.back();
RETURN_NOT_OK(subarray_from_capnp(subarray_reader_, &subarray_));
auto subarray_reader = sr_reader[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subarray_reader is already declared. This may generate a warning on some future compiler. Better just to give it it's own name. Admittedly subarray_reader_ was bad, since it looks like a member variable.

std::vector<Range>&& subset,
bool coalesce_ranges)
: impl_(range_subset_internals(datatype, superset, coalesce_ranges))
, is_implicitly_initialized_(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this line is correct. Implicit initialization is used IIRC to mean "the whole range", which is the case if the vector of subset ranges is empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants