Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ignored limit on lexsort_to_indices #2991

Merged
merged 3 commits into from
Oct 31, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Oct 31, 2022

Which issue does this PR close?

Closes #2990

Rationale for this change

Regresssion was introduced in #2929 by https://github.com/apache/arrow-rs/pull/2929/files#r1005140128 and there was no test coverage 😭

What changes are included in this PR?

  1. Fix bug
  2. Add test coverage

Are there any user-facing changes?

Not really as we haven't released this code yet

cc @isidentical

@@ -950,7 +950,7 @@ pub fn lexsort_to_indices(
});

Ok(UInt32Array::from_iter_values(
value_indices.iter().map(|i| *i as u32),
value_indices.iter().take(len).map(|i| *i as u32),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the bugfix

@@ -3439,7 +3451,8 @@ mod tests {
Some(2),
Some(17),
])) as ArrayRef];
test_lex_sort_arrays(input.clone(), expected, None);
test_lex_sort_arrays(input.clone(), expected.clone(), None);
test_lex_sort_arrays(input.clone(), slice_arrays(expected, 0, 2), Some(2));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place on master that a limit is passed to lexsort_to_indices in the tests is immediately below here. However, very sadly, there is a special case code path for single arrays that doesn't hit the bug path

        let expected = vec![Arc::new(PrimitiveArray::<Int64Type>::from(vec![
            Some(-1),
            Some(0),
            Some(2),
        ])) as ArrayRef];
        test_lex_sort_arrays(input, expected, Some(3));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addition is strictly unnecessary from a coverage perspective (it was already covered), but I wanted to make the test_lex_sort_arrays based tests all consistently patterned so it was easier to reason about coverage

@@ -3519,7 +3532,8 @@ mod tests {
Some(-2),
])) as ArrayRef,
];
test_lex_sort_arrays(input, expected, None);
test_lex_sort_arrays(input.clone(), expected.clone(), None);
test_lex_sort_arrays(input, slice_arrays(expected, 0, 2), Some(2));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails immediately without the fix -- the output is too big!)

Copy link
Contributor

@isidentical isidentical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a fast fix ❤️ Looks great to me!

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 31, 2022
@alamb alamb merged commit 66c9636 into apache:master Oct 31, 2022
@alamb
Copy link
Contributor Author

alamb commented Oct 31, 2022

I plan to create 26.0.0 RC2 with this fix

@alamb alamb deleted the alamb/lexsort_to_indices_limit branch October 31, 2022 17:26
alamb added a commit to alamb/arrow-rs that referenced this pull request Oct 31, 2022
* Fix ignored limit on lexsort_to_indices

* Update comments

* Update arrow/src/compute/kernels/sort.rs

Co-authored-by: Batuhan Taskaya <[email protected]>

Co-authored-by: Batuhan Taskaya <[email protected]>
@ursabot
Copy link

ursabot commented Oct 31, 2022

Benchmark runs are scheduled for baseline = 40d61ec and contender = 66c9636. 66c9636 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression in 26.0.0 RC1: limit is not applied for multi-column sorts lexsort_to_indices
5 participants