Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39680: [Java] enable half float support on Java module #39681

Merged
merged 11 commits into from
Jan 31, 2024

Conversation

davisusanibar
Copy link
Contributor

@davisusanibar davisusanibar commented Jan 18, 2024

Rationale for this change

  • To enable half float support on Java module.

What changes are included in this PR?

  • Add initial Float16 type support
  • Unit test
  • Integration test
  • Documentation

Are these changes tested?

Yes.

Are there any user-facing changes?

No

Copy link

⚠️ GitHub issue #39680 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 18, 2024
* where the value will be read from
* @return 4 byte float value
*/
public short getFloat16(long index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that setFloat16 takes a float, by symmetry this method should probably return a float too.
@danepitkin WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to float will involve a conversion to a 32 bit type. I think it would be nice to provide set/get operations on short as the core APIs and set/get operations on float as helper APIs. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, getShort and setShort already seems provided?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, good point.

I agree it makes sense to change this to float.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

// Positive infinity of type half-precision float.
private static final short POSITIVE_INFINITY = (short) 0x7c00;
// A Not-a-Number representation of a half-precision float.
private static final short NaN = (short) 0x7e00;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this and POSITIVE_INFINTY be public for app users to make use of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -39,7 +39,7 @@
@SuppressWarnings("unused")
public class UnionReader extends AbstractFieldReader {

private BaseReader[] readers = new BaseReader[45];
private BaseReader[] readers = new BaseReader[46];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good if this constant was given a name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@davisusanibar
Copy link
Contributor Author

@github-actions crossbow submit java-jars

Copy link

Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/7650222562

@davisusanibar
Copy link
Contributor Author

In order to test Float16 data types in Arrow Java Dataset module, apache/arrow-testing#99 is required.

@davisusanibar davisusanibar marked this pull request as ready for review January 25, 2024 05:38
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think I'd favor using short as the 'default' float16 type and having float be the 'special'. Else the 'default' case has ambiguity when going from float -> float16.

Comment on lines 270 to 293
/*
The purpose of this method is to refresh the data in alltypes-java.parquet as required:
https://github.com/apache/arrow-testing/blob/master/data/parquet/alltypes-java.parquet
*/
public static void main(String[] args) throws Exception {
TestAllTypes test = new TestAllTypes();
try (BufferAllocator allocator = new RootAllocator();
VectorSchemaRoot root = test.generateAllTypesVector(allocator)) {
byte[] featherData = test.serializeFile(root);
try (SeekableByteChannel channel = new ByteArrayReadableSeekableByteChannel(featherData);
ArrowStreamReader reader = new ArrowStreamReader(channel, allocator)) {
TMP.create();
final File writtenFolder = TMP.newFolder();
final String writtenParquet = writtenFolder.toURI().toString();
DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
writtenParquet);
Objects.requireNonNull(writtenFolder.listFiles());
Files.move(writtenFolder.listFiles()[0].toPath(),
Paths.get(writtenFolder.toPath().toString(), "alltypes-java.parquet"));
System.out.println("The file data/parquet/alltypes-java.parquet should be updated with this new data: " +
writtenFolder.listFiles()[0].toURI());
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refactor this so that it doesn't duplicate the test itself? Also, technically this isn't really necessary; you can run the test in an IDE (and set a breakpoint so you can copy the temporary file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will clean and delete this section.

<configuration>
<enableAssertions>false</enableAssertions>
<systemPropertyVariables>
<arrow.test.dataRoot>${project.basedir}/../../testing/data</arrow.test.dataRoot>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this actually used? I don't see any new tests referencing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redundant part has already been added before, let me remove it

* where the value will be written
* @param value value to write
*/
public void setFloat16(long index, float value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, these methods should work via shorts and not floats, since there is no conversion ambiguity with shorts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -332,6 +332,106 @@ public void testSizeOfValueBuffer() {
}
}

@Test /* Float2Vector */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted

) {
buf.setFloat16(0, +32.875f);
assertEquals((short) 0x501c, Float16.toFloat16(+32.875f));
assertEquals(Float16.toFloat((short) 0x501c), buf.getFloat16(0), 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should just use short in the first place.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jan 25, 2024
lidavidm pushed a commit to apache/arrow-testing that referenced this pull request Jan 25, 2024
…tAllTypes (#99)

To be able to test Float16 data types into Arrow Java Dataset module.

To fix dataset errors at: apache/arrow#39681
@lidavidm
Copy link
Member

I've merged the testing PR.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 26, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 29, 2024

/**
* Lifted from Apache Parquet MR project:
* https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/schema/Float16.java
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, use the URL that contains the commit hash, not 'main'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jan 30, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 30, 2024
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Jan 30, 2024
@davisusanibar
Copy link
Contributor Author

Just for the record, I will work on these items in another PR:

  • Testing the C data interface with Float16 data
  • Add support for VectorVisitor classes (compare/validate)
  • Add support for Float16 to the arrow-algorithm module

If there are any other items that need to be reviewed, please let me know.

cc @lidavidm @danepitkin @vibhatha

@lidavidm
Copy link
Member

Do we have IPC integration tests for float16? I don't obviously see any

@lidavidm lidavidm merged commit 2a87693 into apache:main Jan 31, 2024
17 checks passed
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Jan 31, 2024
@pitrou
Copy link
Member

pitrou commented Jan 31, 2024

Do we have IPC integration tests for float16? I don't obviously see any

If not, then perhaps it's worth opening a GH issue?

Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2a87693.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.

@lidavidm
Copy link
Member

lidavidm commented Feb 1, 2024

Do we have IPC integration tests for float16? I don't obviously see any

@davisusanibar can you follow up here?

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…he#39681)

### Rationale for this change

- To enable half float support on Java module.

### What changes are included in this PR?

- [x] Add initial Float16 type support
- [x] Unit test
- [x] Integration test
- [x] Documentation

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No
* Closes: apache#39680

Authored-by: david dali susanibar arce <[email protected]>
Signed-off-by: David Li <[email protected]>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
…he#39681)

### Rationale for this change

- To enable half float support on Java module.

### What changes are included in this PR?

- [x] Add initial Float16 type support
- [x] Unit test
- [x] Integration test
- [x] Documentation

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No
* Closes: apache#39680

Authored-by: david dali susanibar arce <[email protected]>
Signed-off-by: David Li <[email protected]>
thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024
…he#39681)

### Rationale for this change

- To enable half float support on Java module.

### What changes are included in this PR?

- [x] Add initial Float16 type support
- [x] Unit test
- [x] Integration test
- [x] Documentation

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No
* Closes: apache#39680

Authored-by: david dali susanibar arce <[email protected]>
Signed-off-by: David Li <[email protected]>
@vibhatha
Copy link
Collaborator

@github-actions crossbow submit -g java

Copy link

Revision: 19869ad

Submitted crossbow builds: ursacomputing/crossbow @ actions-f0517a91b4

Task Status
java-jars GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@vibhatha
Copy link
Collaborator

@github-actions crossbow submit java-jars

Copy link

Revision: 19869ad

Submitted crossbow builds: ursacomputing/crossbow @ actions-62925ab8ab

Task Status
java-jars GitHub Actions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java] Add Float16 type support
6 participants