Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[c++] Column abstraction: SOMADimension, part 1 #3425

Open
wants to merge 16 commits into
base: xan/sc-59427/arrow-helpers
Choose a base branch
from

Conversation

XanthosXanthopoulos
Copy link
Collaborator

@XanthosXanthopoulos XanthosXanthopoulos commented Dec 12, 2024

SOMAColumn provides an abstraction over TileDB attributes and dimensions and exposes a common interface for all columns regardless of type. Subclasses of SOMAColumn can implement complex indexing mechanism through additional dimensions and encapsulate all that logic in one place and make it modular.

Subsequent PRs will add implementation for dimension and attributes.

Throughout this PR there is extensive use of std::any to enable polymorphism with the different SOMAColumn types while maintaining a templated interface at the abstract SOMAColumn.

Notes for Reviewer:
This PR introduces the abstract SOMAColumn class and the SOMADimension concrete class with basic unit tests.

@XanthosXanthopoulos XanthosXanthopoulos changed the title SOMADimension, part 1 [c++] Column abstraction: SOMADimension, part 1 Dec 12, 2024
Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only partway through this PR. I'm still struggling with this (now) 1,875-line PR -- it's doing a lot of things. I'll need to pause and work on some other things for a while, and come back to this. (The other 3 PRs you split out that were smaller were nicely self-contained and self-descriptive and I was able to understand and review them earlier today.)

current_domain = std::any_cast<std::pair<std::string, std::string>>(
_core_current_domain_slot(ctx, array));

if (current_domain.first == "" && (current_domain.second == "\x7f" ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a comment here about why both \x7f and \xff -- it needs to be retained. This is very confusing without that comment. (Which is why I put the comment in.)

// Here is an intersection of a few oddities:
//
// * Core domain for string dims must be a nullptr pair; it cannot be
// anything else.
// * TileDB-Py shows this by using an empty-string pair, which we
// imitate.
// * Core current domain for string dims must _not_ be a nullptr pair.
// * In TileDB-SOMA, unless the user specifies otherwise, we use "" for
// min and "\x7f" for max. (We could use "\x7f" but that causes
// display problems in Python.)
//
// To work with all these factors, if the current domain is the default
// "" to "\7f", return an empty-string pair just as we do for domain.
// (There was some pre-1.15 software using "\xff" and it's super-cheap
// to check for that as well.)
if (arr[0] == "" && (arr[1] == "\x7f" || arr[1] == "\xff")) {
return std::pair<std::string, std::string>("", "");
} else {
return std::pair<std::string, std::string>(arr[0], arr[1]);
}

@jp-dark jp-dark self-requested a review December 23, 2024 16:10
Copy link
Collaborator

@jp-dark jp-dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good for the most part, but it's so big it's hard to follow everything. Let's do a code walk through together next time we are both working.

I also added a couple small comments, and John's comment about the missing code comments still needs to be addressed.


using namespace tiledb;

class SOMADimension : public virtual SOMAColumn {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why virtual inheritance? As far as I can tell, nothing inherits from SOMADimension.

Comment on lines +52 to +53
#include "soma/soma_column.h"
#include "soma/soma_dimension.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you remove these in a future PR. Might as well just not add them in to begin with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[c++] Add an abstraction layer between SOMA columns and TileDB dimensions and attributes
3 participants