Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Composite Arrays #122

Merged
merged 15 commits into from
Mar 20, 2024
Merged

Composite Arrays #122

merged 15 commits into from
Mar 20, 2024

Conversation

gatesn
Copy link
Contributor

@gatesn gatesn commented Mar 19, 2024

Exploring an idea around composite arrays. Essentially this,

  • CompositeArray an encoding that holds a CompositeDType (to be renamed to ExtensionDType?), some metadata, and an underlying array.
  • A TypedCompositeArray is a generic implementation of the Array trait. This allows us to implement ArrayCompute specifically for the composite dtype, e.g. impl ArrayCompute for TypedCompositeArray<LocalDateTimeMetadata>
  • The regular CompositeArray handles a default compression impl, serde, stats, and anything that can be generically computed over the underlying array.
  • Typed composite arrays can be "flattened" into untyped composite arrays (this is the slightly janky bit?)

@gatesn gatesn requested a review from robert3005 March 19, 2024 19:23
@@ -14,7 +14,11 @@ impl EncodingCompression for CompositeEncoding {
array: &dyn Array,
_config: &CompressConfig,
) -> Option<&dyn EncodingCompression> {
(array.encoding().id() == &Self::ID).then_some(self)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe we could flatten in this compressor instead of doing it in compress.rs

@@ -53,6 +61,7 @@ impl FlattenFn for CompositeArray {

impl ScalarAtFn for CompositeArray {
fn scalar_at(&self, index: usize) -> VortexResult<Scalar> {
// TODO(ngates): this seems wrong...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think scalars make it obvious that every dtype should have a canonical physical representation. Albeit where some dtype parameters are pluggable, e.g. int width?

@gatesn gatesn marked this pull request as ready for review March 20, 2024 09:50
@gatesn gatesn changed the title [wip] Composite Arrays Composite Arrays Mar 20, 2024
@gatesn gatesn enabled auto-merge (squash) March 20, 2024 10:08
Copy link
Member

@robert3005 robert3005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this gets us to a reasonable place. I wonder how often outside of time types will we use it.

@gatesn gatesn merged commit 0a8ba7e into develop Mar 20, 2024
2 checks passed
@gatesn gatesn deleted the ngates/composite-arrays branch March 20, 2024 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants