-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-39680: [Java] enable half float support on Java module #39681
Conversation
|
java/memory/memory-core/src/main/java/org/apache/arrow/memory/util/Float16.java
Show resolved
Hide resolved
* where the value will be read from | ||
* @return 4 byte float value | ||
*/ | ||
public short getFloat16(long index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that setFloat16
takes a float, by symmetry this method should probably return a float too.
@danepitkin WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Casting to float will involve a conversion to a 32 bit type. I think it would be nice to provide set/get operations on short as the core APIs and set/get operations on float as helper APIs. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, getShort
and setShort
already seems provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh, good point.
I agree it makes sense to change this to float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
// Positive infinity of type half-precision float. | ||
private static final short POSITIVE_INFINITY = (short) 0x7c00; | ||
// A Not-a-Number representation of a half-precision float. | ||
private static final short NaN = (short) 0x7e00; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this and POSITIVE_INFINTY be public for app users to make use of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@@ -39,7 +39,7 @@ | |||
@SuppressWarnings("unused") | |||
public class UnionReader extends AbstractFieldReader { | |||
|
|||
private BaseReader[] readers = new BaseReader[45]; | |||
private BaseReader[] readers = new BaseReader[46]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good if this constant was given a name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@github-actions crossbow submit java-jars |
|
In order to test Float16 data types in Arrow Java Dataset module, apache/arrow-testing#99 is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think I'd favor using short as the 'default' float16 type and having float be the 'special'. Else the 'default' case has ambiguity when going from float -> float16.
/* | ||
The purpose of this method is to refresh the data in alltypes-java.parquet as required: | ||
https://github.com/apache/arrow-testing/blob/master/data/parquet/alltypes-java.parquet | ||
*/ | ||
public static void main(String[] args) throws Exception { | ||
TestAllTypes test = new TestAllTypes(); | ||
try (BufferAllocator allocator = new RootAllocator(); | ||
VectorSchemaRoot root = test.generateAllTypesVector(allocator)) { | ||
byte[] featherData = test.serializeFile(root); | ||
try (SeekableByteChannel channel = new ByteArrayReadableSeekableByteChannel(featherData); | ||
ArrowStreamReader reader = new ArrowStreamReader(channel, allocator)) { | ||
TMP.create(); | ||
final File writtenFolder = TMP.newFolder(); | ||
final String writtenParquet = writtenFolder.toURI().toString(); | ||
DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET, | ||
writtenParquet); | ||
Objects.requireNonNull(writtenFolder.listFiles()); | ||
Files.move(writtenFolder.listFiles()[0].toPath(), | ||
Paths.get(writtenFolder.toPath().toString(), "alltypes-java.parquet")); | ||
System.out.println("The file data/parquet/alltypes-java.parquet should be updated with this new data: " + | ||
writtenFolder.listFiles()[0].toURI()); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you refactor this so that it doesn't duplicate the test itself? Also, technically this isn't really necessary; you can run the test in an IDE (and set a breakpoint so you can copy the temporary file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will clean and delete this section.
java/dataset/pom.xml
Outdated
<configuration> | ||
<enableAssertions>false</enableAssertions> | ||
<systemPropertyVariables> | ||
<arrow.test.dataRoot>${project.basedir}/../../testing/data</arrow.test.dataRoot> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this actually used? I don't see any new tests referencing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The redundant part has already been added before, let me remove it
java/memory/memory-core/src/main/java/org/apache/arrow/memory/util/Float16.java
Show resolved
Hide resolved
* where the value will be written | ||
* @param value value to write | ||
*/ | ||
public void setFloat16(long index, float value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, these methods should work via shorts and not floats, since there is no conversion ambiguity with shorts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@@ -332,6 +332,106 @@ public void testSizeOfValueBuffer() { | |||
} | |||
} | |||
|
|||
@Test /* Float2Vector */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted
java/memory/memory-core/src/main/java/org/apache/arrow/memory/util/Float16.java
Show resolved
Hide resolved
) { | ||
buf.setFloat16(0, +32.875f); | ||
assertEquals((short) 0x501c, Float16.toFloat16(+32.875f)); | ||
assertEquals(Float16.toFloat((short) 0x501c), buf.getFloat16(0), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we should just use short in the first place.
…tAllTypes (#99) To be able to test Float16 data types into Arrow Java Dataset module. To fix dataset errors at: apache/arrow#39681
I've merged the testing PR. |
|
||
/** | ||
* Lifted from Apache Parquet MR project: | ||
* https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/schema/Float16.java |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, use the URL that contains the commit hash, not 'main'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
Just for the record, I will work on these items in another PR:
If there are any other items that need to be reviewed, please let me know. |
Do we have IPC integration tests for float16? I don't obviously see any |
If not, then perhaps it's worth opening a GH issue? |
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2a87693. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them. |
@davisusanibar can you follow up here? |
…he#39681) ### Rationale for this change - To enable half float support on Java module. ### What changes are included in this PR? - [x] Add initial Float16 type support - [x] Unit test - [x] Integration test - [x] Documentation ### Are these changes tested? Yes. ### Are there any user-facing changes? No * Closes: apache#39680 Authored-by: david dali susanibar arce <[email protected]> Signed-off-by: David Li <[email protected]>
…he#39681) ### Rationale for this change - To enable half float support on Java module. ### What changes are included in this PR? - [x] Add initial Float16 type support - [x] Unit test - [x] Integration test - [x] Documentation ### Are these changes tested? Yes. ### Are there any user-facing changes? No * Closes: apache#39680 Authored-by: david dali susanibar arce <[email protected]> Signed-off-by: David Li <[email protected]>
…he#39681) ### Rationale for this change - To enable half float support on Java module. ### What changes are included in this PR? - [x] Add initial Float16 type support - [x] Unit test - [x] Integration test - [x] Documentation ### Are these changes tested? Yes. ### Are there any user-facing changes? No * Closes: apache#39680 Authored-by: david dali susanibar arce <[email protected]> Signed-off-by: David Li <[email protected]>
@github-actions crossbow submit -g java |
Revision: 19869ad Submitted crossbow builds: ursacomputing/crossbow @ actions-f0517a91b4 |
@github-actions crossbow submit java-jars |
Revision: 19869ad Submitted crossbow builds: ursacomputing/crossbow @ actions-62925ab8ab
|
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?
No