Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Adding variadicBufferCounts to RecordBatch #41730

Closed
vibhatha opened this issue May 20, 2024 · 1 comment
Closed

[Java] Adding variadicBufferCounts to RecordBatch #41730

vibhatha opened this issue May 20, 2024 · 1 comment

Comments

@vibhatha
Copy link
Collaborator

vibhatha commented May 20, 2024

Describe the enhancement requested

The main integration effort including tests has been documented #41729.
As the first step introducing the variadicBufferCounts must be completed including some tests to evaluate it with view vectors.

Goals

  1. Adding variadicBufferCounts to ArrowRecordBatch Java interface.
  2. Updating the classes which uses the ArrowRecordBatch interface to include the variadicBufferCounts attribute. The interfaces are StructVectorLoader, StructVectorUnloader, VectorLoader, VectorUnloader and corresponding test cases.
  3. TypeLayout.getTypeBufferCount and TypeLayout.getTypeLayout must be also updated to include the FieldVector usage and use IPC metadata from ArrowRecordBatch to obtain the accurate number of buffers. This steps proposes a new API which considers the vector as an input to the said methods. Thus we are adding a depcreation warning to the older API. This is required as we have to support variable width views with the new specification.

Non Goals

  1. Note that the usage of TypeLayout function is in some other components like JsonFileReader and JsonFileWriter. So to get an accurate representation of buffers require concrete work on those two components. I propose to conduct that work separately.

  2. Also the ValidateVectorBufferVisitor needs to update to the updated new API, but it also requires some concrete work in updating corresponding test cases.

Component(s)

Java

@vibhatha vibhatha self-assigned this May 20, 2024
lidavidm pushed a commit that referenced this issue May 24, 2024
### Rationale for this change

This PR adds the `variadicBufferCounts` attribute to `ArrowRecordBatch` in Java module. Furthermore, it also updates the `TypeLayout` functions `getTypeBufferCount` and `getTypeLayout` functions along with the corresponding test cases. Previously these changes were listed as issues #40934, #40935 and #40931. These two tickets will also be closed by this PR. 

### What changes are included in this PR?

The introduced two functions to `TypeLayout` is deprecating the old API and adds a new API. In this PR we are updating a few modules to use the new API.  Corresponding tests for the changed functions have also been added. 

This also updates the usage of `ArrowRecordBatch` across other modules and `TypeLayout` usage across a few modules. Some modules were excluded as mentioned in the issues non-goals section to be completed in a follow up effort as the scope and required tasks remain at large.  These modules will still use the deprecated API for TypeLayouts, but documented in the code for updating to the new API in a follow up effort. 

### Closing Subtasks 

- [X] #40934
- [X] #40935
- [X] #40931

### Are these changes tested?

The changes are tested using existing tests and new tests

### Are there any user-facing changes?

Yes

**This PR includes breaking changes to public APIs.**
* GitHub Issue: #41730

Lead-authored-by: Vibhatha Lakmal Abeykoon <[email protected]>
Co-authored-by: Vibhatha Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
@lidavidm
Copy link
Member

Issue resolved by pull request 41732
#41732

@lidavidm lidavidm added this to the 17.0.0 milestone May 24, 2024
vibhatha added a commit to vibhatha/arrow that referenced this issue May 25, 2024
…ache#41732)

### Rationale for this change

This PR adds the `variadicBufferCounts` attribute to `ArrowRecordBatch` in Java module. Furthermore, it also updates the `TypeLayout` functions `getTypeBufferCount` and `getTypeLayout` functions along with the corresponding test cases. Previously these changes were listed as issues apache#40934, apache#40935 and apache#40931. These two tickets will also be closed by this PR. 

### What changes are included in this PR?

The introduced two functions to `TypeLayout` is deprecating the old API and adds a new API. In this PR we are updating a few modules to use the new API.  Corresponding tests for the changed functions have also been added. 

This also updates the usage of `ArrowRecordBatch` across other modules and `TypeLayout` usage across a few modules. Some modules were excluded as mentioned in the issues non-goals section to be completed in a follow up effort as the scope and required tasks remain at large.  These modules will still use the deprecated API for TypeLayouts, but documented in the code for updating to the new API in a follow up effort. 

### Closing Subtasks 

- [X] apache#40934
- [X] apache#40935
- [X] apache#40931

### Are these changes tested?

The changes are tested using existing tests and new tests

### Are there any user-facing changes?

Yes

**This PR includes breaking changes to public APIs.**
* GitHub Issue: apache#41730

Lead-authored-by: Vibhatha Lakmal Abeykoon <[email protected]>
Co-authored-by: Vibhatha Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants