Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-41730: [Java] Adding variadicBufferCounts to RecordBatch #41732

Merged
merged 16 commits into from
May 24, 2024

Conversation

vibhatha
Copy link
Collaborator

@vibhatha vibhatha commented May 20, 2024

Rationale for this change

This PR adds the variadicBufferCounts attribute to ArrowRecordBatch in Java module. Furthermore, it also updates the TypeLayout functions getTypeBufferCount and getTypeLayout functions along with the corresponding test cases. Previously these changes were listed as issues #40934, #40935 and #40931. These two tickets will also be closed by this PR.

What changes are included in this PR?

The introduced two functions to TypeLayout is deprecating the old API and adds a new API. In this PR we are updating a few modules to use the new API. Corresponding tests for the changed functions have also been added.

This also updates the usage of ArrowRecordBatch across other modules and TypeLayout usage across a few modules. Some modules were excluded as mentioned in the issues non-goals section to be completed in a follow up effort as the scope and required tasks remain at large. These modules will still use the deprecated API for TypeLayouts, but documented in the code for updating to the new API in a follow up effort.

Closing Subtasks

Are these changes tested?

The changes are tested using existing tests and new tests

Are there any user-facing changes?

Yes

This PR includes breaking changes to public APIs.

@vibhatha vibhatha changed the title GH-41370: [Java] Adding variadicBufferCounts to RecordBatch GH-41730: [Java] Adding variadicBufferCounts to RecordBatch May 20, 2024
@vibhatha vibhatha marked this pull request as ready for review May 21, 2024 02:33
@vibhatha vibhatha requested a review from lidavidm as a code owner May 21, 2024 02:33
@vibhatha
Copy link
Collaborator Author

@lidavidm to fully evaluate the change we have done here, the documented components needs to be further tested as mentioned here #41729.

But I think we need to make more follow up PRs at least per module or better if we can go entity wise to make sure things are functional for StringView.

@vibhatha
Copy link
Collaborator Author

@github-actions crossbow submit -g java

Copy link

Revision: 5e91dab

Submitted crossbow builds: ursacomputing/crossbow @ actions-ffc3f5185b

Task Status
java-jars GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels May 22, 2024
@vibhatha
Copy link
Collaborator Author

@github-actions crossbow submit -g java

Copy link

Revision: 67fc7ce

Submitted crossbow builds: ursacomputing/crossbow @ actions-312ec2ec22

Task Status
java-jars GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

}

private void appendNodes(FieldVector vector, List<ArrowFieldNode> nodes, List<ArrowBuf> buffers) {
private long getVariadicBufferCount(FieldVector vector) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a slight issue in adding an interface for this. The VectorUnloader and StructVectorUnloader are in vector and c modules respectively. So we cannot have a common interface for both. So would it be fine if we add BaseVectorUnloader interface for both modules? I didn't add it yet.

@vibhatha
Copy link
Collaborator Author

@github-actions crossbow submit -g java

Copy link

Revision: eb2f473

Submitted crossbow builds: ursacomputing/crossbow @ actions-c98dc1f751

Task Status
java-jars GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@vibhatha vibhatha requested a review from lidavidm May 22, 2024 07:34
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we enable the integration tests now?

int length, List<ArrowFieldNode> nodes, List<ArrowBuf> buffers,
ArrowBodyCompression bodyCompression, List<Long> variadicBufferCounts, boolean alignBuffers,
boolean retainBuffers) {
super();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't the other constructors delegate here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated, is it fine?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 23, 2024
@vibhatha
Copy link
Collaborator Author

Can we enable the integration tests now?

If you mean C data interface things, not yet, still need some work.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 23, 2024
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels May 24, 2024
@lidavidm lidavidm merged commit 0c96be3 into apache:main May 24, 2024
17 checks passed
@lidavidm lidavidm removed the awaiting merge Awaiting merge label May 24, 2024
vibhatha added a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…ache#41732)

### Rationale for this change

This PR adds the `variadicBufferCounts` attribute to `ArrowRecordBatch` in Java module. Furthermore, it also updates the `TypeLayout` functions `getTypeBufferCount` and `getTypeLayout` functions along with the corresponding test cases. Previously these changes were listed as issues apache#40934, apache#40935 and apache#40931. These two tickets will also be closed by this PR. 

### What changes are included in this PR?

The introduced two functions to `TypeLayout` is deprecating the old API and adds a new API. In this PR we are updating a few modules to use the new API.  Corresponding tests for the changed functions have also been added. 

This also updates the usage of `ArrowRecordBatch` across other modules and `TypeLayout` usage across a few modules. Some modules were excluded as mentioned in the issues non-goals section to be completed in a follow up effort as the scope and required tasks remain at large.  These modules will still use the deprecated API for TypeLayouts, but documented in the code for updating to the new API in a follow up effort. 

### Closing Subtasks 

- [X] apache#40934
- [X] apache#40935
- [X] apache#40931

### Are these changes tested?

The changes are tested using existing tests and new tests

### Are there any user-facing changes?

Yes

**This PR includes breaking changes to public APIs.**
* GitHub Issue: apache#41730

Lead-authored-by: Vibhatha Lakmal Abeykoon <[email protected]>
Co-authored-by: Vibhatha Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 0c96be3.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants