Implement logical comparison for run encoded array #3747

Open
askoa opened this issue Feb 22, 2023 · 1 comment
Labels
enhancement: Any new improvement worthy of an entry in the changelog

Comments

@askoa
Contributor

askoa commented Feb 22, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Part of #3520

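The equality check for run encoded arrays implemented in PR #3662 is incomplete: it compares only the underlying physical arrays, so arrays that hold the same logical values but are encoded with different runs are not treated as equal.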
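To illustrate the gap, here is a minimal sketch that does not use the arrow-rs API. The helpers `decode` and `physical_equal` are hypothetical, values are plain `i32` without nulls, and it assumes adjacent runs are allowed to hold the same value. It shows two encodings that differ physically but expand to the same logical sequence.

```rust
/// Hypothetical helper: expand a run-encoded sequence into its logical values.
/// `run_ends[i]` is the exclusive logical end index of run `i`.
fn decode(run_ends: &[usize], values: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    let mut start = 0;
    for (&end, &v) in run_ends.iter().zip(values) {
        // Repeat the run's value once per logical position it covers.
        out.extend(std::iter::repeat(v).take(end - start));
        start = end;
    }
    out
}

/// Hypothetical helper: physical comparison, where both the run ends
/// and the values must match element for element.
fn physical_equal(l: (&[usize], &[i32]), r: (&[usize], &[i32])) -> bool {
    l.0 == r.0 && l.1 == r.1
}
```

With run ends `[2, 5]` and values `[7, 9]` on one side and run ends `[1, 2, 5]` and values `[7, 7, 9]` on the other, `physical_equal` returns `false` even though both decode to `7, 7, 9, 9, 9`.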

Describe the solution you'd like
Implement a logical comparison for run encoded arrays. Update the run_equal function below to do a full logical comparison.

/// The current implementation of comparison of run array support physical comparison.
/// Comparing run encoded array based on logical indices (`lhs_start`, `rhs_start`) will
/// be time consuming as converting from logical index to physical index cannot be done
/// in constant time. The current comparison compares the underlying physical arrays.
pub(super) fn run_equal(
    lhs: &ArrayData,
    rhs: &ArrayData,
    lhs_start: usize,
    rhs_start: usize,
    len: usize,
) -> bool {
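One possible shape for a logical comparison is sketched below. It is not the arrow-rs implementation: it works on plain slices instead of `ArrayData`, ignores nulls and array offsets, and the names `physical_index` and `logical_run_equal` are hypothetical. It assumes `run_ends` is strictly ascending and holds exclusive logical end indices. The logical-to-physical conversion is a binary search, so it costs O(log n) per lookup instead of constant time, and the comparison then walks both run sequences in lockstep.

```rust
/// Hypothetical helper: index of the run that contains logical position `logical`.
/// Binary search for the first run end strictly greater than `logical`.
fn physical_index(run_ends: &[usize], logical: usize) -> usize {
    run_ends.partition_point(|&end| end <= logical)
}

/// Hypothetical sketch: compare `len` logical values starting at `lhs_start`
/// and `rhs_start`. Returns false if either range is out of bounds.
fn logical_run_equal(
    lhs_run_ends: &[usize],
    lhs_values: &[i32],
    rhs_run_ends: &[usize],
    rhs_values: &[i32],
    lhs_start: usize,
    rhs_start: usize,
    len: usize,
) -> bool {
    if len == 0 {
        return true;
    }
    // The last run end is the logical length of the array.
    let in_bounds =
        |ends: &[usize], start: usize| ends.last().map_or(false, |&e| start + len <= e);
    if !in_bounds(lhs_run_ends, lhs_start) || !in_bounds(rhs_run_ends, rhs_start) {
        return false;
    }
    let mut li = physical_index(lhs_run_ends, lhs_start);
    let mut ri = physical_index(rhs_run_ends, rhs_start);
    let mut done = 0; // logical positions compared so far
    while done < len {
        if lhs_values[li] != rhs_values[ri] {
            return false;
        }
        // Logical positions left in the current run on each side.
        let l_left = lhs_run_ends[li] - (lhs_start + done);
        let r_left = rhs_run_ends[ri] - (rhs_start + done);
        // Skip ahead to the nearest run boundary (or the end of the range).
        let step = l_left.min(r_left).min(len - done);
        done += step;
        if step == l_left {
            li += 1;
        }
        if step == r_left {
            ri += 1;
        }
    }
    true
}
```

After the two initial binary searches, the loop visits each run at most once, so the cost is proportional to the number of runs overlapping the compared range, not to `len`.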

Describe alternatives you've considered
We cannot do it.

Additional context
Implementing a full logical comparison in the arrow-data crate would be somewhat challenging because the crate does not depend on arrow-array, which has the functions needed to parse run_ends in a run encoded array. Either arrow-array has to be added as a dependency of arrow-data, or some of the code in arrow-array has to be duplicated in arrow-data.

@askoa added the enhancement label Feb 22, 2023
@tustvold
Contributor

I suspect #1799 may provide a solution to this, as it will push the "array" logic lower into arrow-data.
