GH-38015: [MATLAB] Add arrow.buffer.Buffer class to the MATLAB Interface #38020

Merged (18 commits) on Oct 10, 2023

@sgilmore10 sgilmore10 commented Oct 4, 2023

Rationale for this change

To unblock use cases that are not satisfied by the default Arrow -> MATLAB conversions (i.e. the toMATLAB() method on arrow.array.Array), we would like to expose the underlying Arrow data representation as a property on arrow.array.Array. One possible name for this property would be DataLayout, which would be an arrow.array.DataLayout object. Note that this class does not yet exist, so we would have to add it.

For example, the DataLayout property for temporal array types would return an object of the following class type:

classdef TemporalDataLayout < arrow.array.DataLayout
    properties
        Values % an arrow.array.Int32Array or an arrow.array.Int64Array
        Valid  % an arrow.buffer.Buffer
    end
end

However, the Valid property on this class would need to be an arrow.buffer.Buffer object, which does not yet exist in the MATLAB interface. Therefore, it would be helpful to first add the arrow.buffer.Buffer class before adding the DataLayout property/class hierarchy. It's worth mentioning that adding arrow.buffer.Buffer will open up additional advanced use cases in the future.
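As an illustration of what raw buffer access could enable, here is a minimal sketch of decoding the bytes behind a Valid buffer. It assumes the buffer holds an Arrow validity bitmap (one bit per element, least-significant bit first, as in the Arrow columnar format). The byte values and the numElements variable are made up for illustration; in practice the bytes would come from calling toMATLAB on the buffer, and the element count from the array itself.

```matlab
% Hypothetical bytes standing in for toMATLAB(validBuffer).
bytes = uint8([5; 1]);   % binary 00000101 and 00000001
numElements = 10;        % logical length of the array (assumed known)

% Expand each byte into its 8 bits, least-significant bit first.
bits = false(8 * numel(bytes), 1);
for k = 1:8
    % bitget uses 1-based bit positions, where position 1 is the LSB.
    bits(k:8:end) = bitget(bytes, k) == 1;
end

% Drop the padding bits beyond the logical length.
valid = bits(1:numElements);
```

With these inputs, elements 1, 3, and 9 are marked valid and the rest are null.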

What changes are included in this PR?

Added arrow.buffer.Buffer MATLAB class.

Properties of arrow.buffer.Buffer

  1. NumBytes - a scalar int64 value representing the size of the buffer in bytes.

Methods of arrow.buffer.Buffer

  1. toMATLAB - returns the data in the buffer as an N-by-1 uint8 vector, where N is the number of bytes.
  2. fromMATLAB(data) - Static method that creates an arrow.buffer.Buffer from a numeric array.
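The two methods above can be combined into a round trip. The sketch below assumes that fromMATLAB accepts an int32 array (the description only says "a numeric array") and that the bytes come back in native byte order. So that it also runs without the Arrow MATLAB interface installed, it falls back to typecast to emulate the raw byte view; that fallback is an illustration, not part of this PR.

```matlab
original = int32([10 -20 30]);

% exist(..., "class") returns 8 when the class is available on the path.
if exist("arrow.buffer.Buffer", "class") == 8
    buffer = arrow.buffer.Buffer.fromMATLAB(original);
    bytes = toMATLAB(buffer);   % N-by-1 uint8, per the description above
else
    % Fallback (assumption): emulate the raw byte view with typecast.
    bytes = typecast(original, "uint8").';
end

% Reinterpret the bytes using the class of the original data.
restored = typecast(bytes, class(original));
```

Because toMATLAB always returns uint8, the caller has to remember the original class in order to reinterpret the bytes correctly.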

Example:

>> dataIn = [1 2];
>> buffer = arrow.buffer.Buffer.fromMATLAB(dataIn)

buffer = 

  Buffer with properties:

    NumBytes: 16

>> dataOut = toMATLAB(buffer)

dataOut =

  16×1 uint8 column vector

     0
     0
     0
     0
     0
     0
   240
    63
     0
     0
     0
     0
     0
     0
     0
    64

% Reinterpret bit pattern as a double array 
>> toDouble = typecast(dataOut, "double")

toDouble =

     1
     2
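The byte pattern in this example is simply the IEEE 754 encoding of the doubles 1 and 2 in the machine's byte order. The following sketch reproduces it with built-in functions only (no Arrow classes involved) and shows that reinterpreting the same bytes with a different class gives different values. The variable names are illustrative.

```matlab
dataIn = [1 2];

% View the 16 bytes behind the two doubles (1-by-16 row vector).
rawBytes = typecast(dataIn, "uint8");

% The third output of computer is 'L' on little-endian machines.
[~, ~, endian] = computer;
if strcmp(endian, 'L')
    expected = uint8([0 0 0 0 0 0 240 63 0 0 0 0 0 0 0 64]);
    assert(isequal(rawBytes, expected));
end

% Same bytes, different interpretations.
asDouble = typecast(rawBytes, "double");   % recovers [1 2]
asSingle = typecast(rawBytes, "single");   % four singles; [0 1.875 0 2] on little-endian
```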

Are these changes tested?

Yes. Added a new test class called tBuffer.m

Are there any user-facing changes?

Yes. Users can now create arrow.buffer.Buffer objects via the fromMATLAB static method. However, there's not much users can do with this object as of now. We implemented this class to facilitate adding the DataLayout property to arrow.array.Array, as described in the Rationale for this change section.

@github-actions

github-actions bot commented Oct 4, 2023

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}


@sgilmore10 sgilmore10 marked this pull request as ready for review October 4, 2023 19:46
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Oct 4, 2023
@sgilmore10 sgilmore10 changed the title [GH-38015]: [MATLAB] Add arrow.buffer.Buffer class to the MATLAB Interface GH-38015: [MATLAB] Add arrow.buffer.Buffer class to the MATLAB Interface Oct 5, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 5, 2023
Member

@kou kou left a comment


+1

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Oct 5, 2023
@github-actions github-actions bot removed awaiting changes Awaiting changes awaiting merge Awaiting merge labels Oct 6, 2023
@github-actions github-actions bot added the awaiting changes Awaiting changes label Oct 6, 2023
@kevingurney
Member

Sorry, I had some pending comments last week - but I accidentally never published them.

@sgilmore10
Member Author

Sorry, I had some pending comments last week - but I accidentally never published them.

No worries! I was out myself.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 10, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Oct 10, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 10, 2023
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Oct 10, 2023
@kevingurney
Member

+1

@kevingurney kevingurney merged commit c37059a into apache:main Oct 10, 2023
10 checks passed
@kevingurney kevingurney deleted the GH-38015 branch October 10, 2023 14:29
@kevingurney kevingurney removed the awaiting merge Awaiting merge label Oct 10, 2023
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit c37059a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…B Interface (apache#38020)

* Closes: apache#38015

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…B Interface (apache#38020)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…B Interface (apache#38020)
