Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-34690: [Go]: Add sort indices function #34719

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

hermanschaaf
Copy link
Contributor

@hermanschaaf hermanschaaf commented Mar 24, 2023

Just opening an early draft for now, still trying to figure things out.

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@hermanschaaf hermanschaaf changed the title Draft: sort indices GH-34690: [Go]: Add sort indices function Mar 24, 2023
@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue apache/arrow-go#66 has been automatically assigned in GitHub to PR creator.

Comment on lines +65 to +72
// Sort the indices slice based on the values in the input array
sort.Slice(indices, func(i, j int) bool {
// TODO: not sure what to do here?
// compare using scalar comparison?
a := inArr.Buffers[1].Buf[indices[i]*sz : (indices[i]+1)*sz]
b := inArr.Buffers[1].Buf[indices[j]*sz : (indices[j]+1)*sz]
return bytes.Compare(a, b) < 0
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeroshade So I've gotten this far, but the bytes.Compare comparison here was just to get something compiling. 😃 In reality we need to compare the scalar values represented by those bytes, I believe. (I've also only tried to make it work for int32 so far.)

I'm not sure how to compare scalar values inside the array with one another? Are there existing comparators that can be used her? And is there a way to access individual items in an array, other than how I'm doing it here?

Copy link
Member

@zeroshade zeroshade Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to have a look at https://github.com/apache/arrow/blob/main/go/arrow/compute/internal/kernels/scalar_comparisons.go (which implements the less than/equals/greater than/etc.) operations at a scalar level. I'll try to take another look at this early next week but you might be able to get an idea from over there on how to tackle this.

A tricky piece to think about though is that we'd want to be able to manage passing an option to allow sorting a record batch / table by multiple columns. In that scenario I think we can learn from https://arrow.apache.org/blog/2022/11/07/multi-column-sorts-in-arrow-rust-part-1/ and the corresponding part 2 and potentially leverage similar techniques to implement the sorting here.

expect []int64
valueType arrow.DataType
}{
{"simple int32", []string{`[1, 1, 0, -5, -5, -5, 255, 255]`}, []int64{3, 4, 5, 2, 0, 1, 6, 7}, arrow.PrimitiveTypes.Int32},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to get a single simple test case to pass for now

@thorfour
Copy link
Contributor

@hermanschaaf Is this issue still actively being worked on? We'd love to be able to use a sorting function but if it's not being pursued currently I would be interested in picking it up.

@hermanschaaf
Copy link
Contributor Author

@thorfour Yes, I'm not actively working on this right now. If you'd like to pick it up, please do!

Copy link

⚠️ GitHub issue #34690 has no components, please add labels for components.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Go][Table] Implement sort function
3 participants