-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load enumerations for all array schemas in a single request. #5349
Conversation
227549e
to
5285bb7
Compare
} | ||
} | ||
|
||
return ret; | ||
} | ||
|
||
std::vector<shared_ptr<const Enumeration>> Array::get_enumerations( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted changes to this function from #5237.
get_enumerations_all_schemas() { | ||
return array_->get_enumerations_all_schemas(); | ||
} | ||
|
||
std::vector<shared_ptr<const tiledb::sm::Enumeration>> get_enumerations( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted changes to this function from #5237.
@@ -663,6 +663,13 @@ capi_return_t tiledb_array_load_all_enumerations(const tiledb_array_t* array) { | |||
return TILEDB_OK; | |||
} | |||
|
|||
capi_return_t tiledb_array_load_enumerations_all_schemas( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this API instead of changing parameters of tiledb_array_load_all_enumerations allows this PR to fix the regression without changes to REST or other repositories. We will have a new binding to wrap but that's just for completeness, this will only be used by the client to submit the request to REST. Had we changed the parameters like I did initially REST would fail to link against this branch until we updated the modified binding for the APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, we can't really change a C API after we have released, can we? Or that's some exception for experimentals? What we can do if it's incorrect though is mark it for deprecation. But rather than incorrect, is tiledb_array_load_all_enumerations
just fetching the enumerations for the latest schema only?
Then IMO it's good choice to add another API anyway since it's serving a different purpose and has different performance impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC for experimental APIs we can change them since they are experimental, but it's a breaking change and I was trying to avoid doing that. Since tiledb_array_load_all_enumerations
only loads for the latest schema I thought a new API would not be out of the question.
Yeah I think the name tiledb_array_load_all_enumerations
is a little misleading as you probably noticed but I updated those docs to clarify and if we want to rename or restructure those methods I think we could do that separately and go through a deprecation cycle.
tiledb/sm/array/array.cc
Outdated
|
||
// For now, reopen the array if it's found to be using array open v1. | ||
// Once we update to use post_array_schema_from_rest we will always have | ||
// array_schemas_all initialized and this could go away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to point these comments out. As I mentioned in the comment I could serialize allArraySchemas with the enumerations intact and that would be fine, but that would only be required until post_array_schema_from_rest is finished. Array open v3 is now the default so I thought this was better than locking us into serializing full schemas here.
If we want to just serialize all the schemas instead I can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not do array operations under the hood. I have a more radical idea. How about returning a message at the C API level that this operation is unsupported for array v1 opened arrays and redirect the user to use v2 instead?
@ihnorton Are we ok to introduce new C APIs assuming array v2/query v3 now that it's default, and just be backwards compatible forever for the pre-existing ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I added exceptions in both the C API and in tiledb::sm::Array::load_all_enumerations
in case it is called internally like we do in unit-enumerations.cc
. I modified the existing tests to check for both exceptions if we are not using array open v2. The exception could be removed and tests updated as part of #5181.
Just marked as a draft until 59c529a is reverted, this is ready for review. There is a passing run with Query / Array open v2 here: https://gitlab.com/TileDB-Inc/TileDB-Internal/-/jobs/8093304584 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a few comments and we're good. Thanks for that complex fix!
throw ArrayException("Unable to load enumerations; Array is not open."); | ||
} | ||
|
||
std::unordered_map<std::string, std::vector<shared_ptr<const Enumeration>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: a general comment, those complex types are easier to read if modeled as structs so that we avoid .second
type of names that are hard to remember what they refer to. No need to change it here, just stating my preference :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! I did not change it here but I'll keep it in mind for the future.. I have been confused by .first
and .second
in the past so I agree, a structure would have been better.
@@ -663,6 +663,13 @@ capi_return_t tiledb_array_load_all_enumerations(const tiledb_array_t* array) { | |||
return TILEDB_OK; | |||
} | |||
|
|||
capi_return_t tiledb_array_load_enumerations_all_schemas( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, we can't really change a C API after we have released, can we? Or that's some exception for experimentals? What we can do if it's incorrect though is mark it for deprecation. But rather than incorrect, is tiledb_array_load_all_enumerations
just fetching the enumerations for the latest schema only?
Then IMO it's good choice to add another API anyway since it's serving a different purpose and has different performance impact.
* @param ctx The context to use. | ||
* @param array The array to load enumerations for. | ||
*/ | ||
static void load_enumerations_all_schemas( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be experimental for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this in experimental because the other enumerations APIs were still experimental and it felt weird to add a new API that was so similar outside of experimental
tiledb/sm/array/array.cc
Outdated
|
||
// For now, reopen the array if it's found to be using array open v1. | ||
// Once we update to use post_array_schema_from_rest we will always have | ||
// array_schemas_all initialized and this could go away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not do array operations under the hood. I have a more radical idea. How about returning a message at the C API level that this operation is unsupported for array v1 opened arrays and redirect the user to use v2 instead?
@ihnorton Are we ok to introduce new C APIs assuming array v2/query v3 now that it's default, and just be backwards compatible forever for the pre-existing ones?
5dcc0cd
to
3e1787d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
+ Should fix TileDB-Py nightly failure.
Co-authored-by: Ypatia Tsavliri <[email protected]>
This reverts commit b0e2da4.
093c2d7
to
8ec2827
Compare
This fixes a performance regression added in #5291 while loading enumerations for all array schemas. The regression impacts arrays with multiple evolved schemas. Previously #5291 introduced a loop over REST requests for-each schema, this PR loads all enumerations for all schemas in a single request.
TYPE: C_API
DESC: Introduce tiledb_array_load_enumerations_all_schemas
TYPE: CPP_API
DESC: Introduce ArrayExperimental::load_enumerations_all_schemas