-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
throw error on dimension drop in schema evolution. #4958
throw error on dimension drop in schema evolution. #4958
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new validation check happens too late. This code has it happening when the operation runs, not when it's first specified.
Doing this correctly means implementing validation checks in the two places that the attributes_to_drop_
can gain new members:
- In
drop_attribute
, which is connected to the C API. - In
attribute_names_to_drop
, which is used only by capnp serialization.
In addition, the validation check itself isn't quite right. See drop_attribute
for a check that's not made in the new code. Furthermore, the validations for incremental changes through the API are not identical to those made with serialization, where we can validate everything together more clearly and efficiently than making the same checks incrementally.
I would be satisfied if we only tightened up drop_attribute
in this PR. We can leave validation for capnp for later. That means at minimum three things:
- A story to add the requisite validation checks for capnp.
- Documentation of the invariants in the class members. The descriptions in the current docs for the relevant member variables are inadequate. Here are two of them; neither is currently documented explicitly in the code.
- Each element in
attributes_to_drop_
must be an attribute of the array schema attributes_to_drop_
and the attributes to add must be disjoint
- Each element in
- Validation in
drop_attribute
that the invariants are satisfied. This is where the validation check that this PR seeks to add belongs.
Note the ordering. Getting the code in place is lower priority than the others. If you have the others, the code will eventually get right, but not vice-versa.
auto evolution = ArraySchemaEvolution(ctx); | ||
|
||
// try to drop dimension d1 | ||
evolution.drop_attribute("d1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the statement that should fail. Avoiding a failure here only defers detection of the problem until later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments Eric. I agree that the check should happen further up in drop_attribute()
however the ArraySchema object (which is required for the use of has_attribute()
) is not available in drop_attribute()
. It is only passed as a parameter in ArraySchemaEvolution::array_evolve()
in the form of the array_uri
which is then used to create the schema object
Here is the current order:
Array::create(array_uri, schema);
auto evolution = ArraySchemaEvolution(ctx);
evolution.drop_attribute("d1");
evolution.array_evolve(array_uri));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ArraySchema object [...] is not available in drop_attribute()
Yes, I see that now. I didn't realize the whole design was broken in this particular way.
@@ -152,9 +152,12 @@ shared_ptr<ArraySchema> ArraySchemaEvolution::evolve_schema( | |||
for (auto& attr_name : attributes_to_drop_) { | |||
bool has_attr = false; | |||
throw_if_not_ok(schema->has_attribute(attr_name, &has_attr)); | |||
if (has_attr) { | |||
throw_if_not_ok(schema->drop_attribute(attr_name)); | |||
if (!has_attr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is not quite right. The issue is with capnp deserialization. As with all input that comes in from the outside, here from the network, we need to validate inputs. We may not assume that a capnp message was generated by libtiledb
necessarily; it might have been formed by other, likely malicious, software. We don't need to care why it was created, only that it's detected and rejected.
There's a check in drop_attribute
to see if the attribute is the name of an attribute that was previously added. In this case, there's no check to see if the dropped attribute here also appears in the list of added attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving for expediency.
This code does not correct the validation issues with capnp.
5d3b2b4
to
856329e
Compare
Adding an error message when trying to drop a non existent attribute or a dimension with the Schema Evolution API. + Test
[sc-30774]
TYPE: BUG
DESC: Throw error on dimension drop in schema evolution.