ARROW-17297: [Java][Doc] Adding documentation to interact between C++ to Java via C Data Interface #13788

Merged
merged 8 commits into from
Aug 5, 2022
260 changes: 259 additions & 1 deletion docs/source/java/cdata.rst
@@ -33,7 +33,10 @@ Python communication using the C Data Interface.
Java to C++
-----------

Example: Share an Int64 array from C++ to Java:
Share an Int64 array from C++ to Java
=====================================
Member

This is the wrong level of header

Contributor Author

Updated


Example: ``Share an Int64 array from C++ to Java``:

**C++ Side**

@@ -220,4 +223,259 @@ Let's create a Java class to test our bridge:

C++-allocated array: [1, 2, 3, null, 5, 6, 7, 8, 9, 10]
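
The ``null`` at index 3 of the array above is not stored among the values. In the Arrow columnar format, a validity bitmap carries one bit per slot in least-significant-bit order, and a set bit means the slot is valid. The following is a minimal standalone C++ sketch of that lookup. The helper names ``IsValid`` and ``CountNulls`` are hypothetical and are not part of the Arrow API; real code would normally call ``arrow::Array::IsNull`` instead.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when slot i is valid (non-null).
// A missing bitmap (nullptr) means the array contains no nulls.
bool IsValid(const uint8_t* validity, int64_t i) {
  if (validity == nullptr) return true;
  // Bit i lives in byte i / 8, at position i % 8 counting from the least significant bit.
  return ((validity[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical helper: counts the null slots among the first `length` slots.
int64_t CountNulls(const uint8_t* validity, int64_t length) {
  int64_t nulls = 0;
  for (int64_t i = 0; i < length; ++i) {
    if (!IsValid(validity, i)) ++nulls;
  }
  return nulls;
}
```

For the ten-slot array shown above, the bitmap bytes would be ``0xF7, 0x03``: every bit is set except bit 3 of the first byte.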

Share an Int32 array from Java to C++
=====================================

Example: ``Share an Int32 array from Java to C++``:

**Java Side**

For this example, we will build a Java jar that bundles all required
dependencies, so that the C++ program can load it through JNI.

.. code-block:: xml

   <?xml version="1.0" encoding="UTF-8"?>
   <project xmlns="http://maven.apache.org/POM/4.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>org.example</groupId>
       <artifactId>cpptojava</artifactId>
       <version>1.0-SNAPSHOT</version>
       <properties>
           <maven.compiler.source>11</maven.compiler.source>
           <maven.compiler.target>11</maven.compiler.target>
           <arrow.version>8.0.0</arrow.version>
       </properties>
       <repositories>
           <repository>
               <id>arrow-nightly</id>
               <url>https://nightlies.apache.org/arrow/java</url>
           </repository>
       </repositories>
       <dependencies>
           <dependency>
               <groupId>org.apache.arrow</groupId>
               <artifactId>arrow-c-data</artifactId>
               <version>${arrow.version}</version>
           </dependency>
           <dependency>
               <groupId>org.apache.arrow</groupId>
               <artifactId>arrow-memory-netty</artifactId>
               <version>${arrow.version}</version>
           </dependency>
       </dependencies>
       <build>
           <plugins>
               <plugin>
                   <groupId>org.apache.maven.plugins</groupId>
                   <artifactId>maven-assembly-plugin</artifactId>
                   <executions>
                       <execution>
                           <phase>package</phase>
                           <goals>
                               <goal>single</goal>
                           </goals>
                           <configuration>
                               <archive>
                                   <manifest>
                                       <mainClass>ToBeCalledByCpp</mainClass>
                                   </manifest>
                               </archive>
                               <descriptorRefs>
                                   <descriptorRef>jar-with-dependencies</descriptorRef>
                               </descriptorRefs>
                           </configuration>
                       </execution>
                   </executions>
               </plugin>
           </plugins>
       </build>
   </project>

.. code-block:: java

   import org.apache.arrow.c.ArrowArray;

Member

nit, but everything is indented one space too many here

Member

that's actually true of all the code blocks

Contributor Author

Changed

   import org.apache.arrow.c.ArrowSchema;
   import org.apache.arrow.c.Data;
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.FieldVector;
   import org.apache.arrow.vector.IntVector;
   import org.apache.arrow.vector.VectorSchemaRoot;

   import java.util.Arrays;

   public class ToBeCalledByCpp {
       final static BufferAllocator allocator = new RootAllocator();

       public static void fillfieldvectorfromjavatocpp(long schema_id, long array_id){
Member

Can we follow Java naming conventions? schemaAddress

Contributor Author

Added

           try (ArrowArray arrow_array = ArrowArray.wrap(array_id);
                ArrowSchema arrow_schema = ArrowSchema.wrap(schema_id)) {
               Data.exportVector(allocator, populateFieldVectorToExport(), null, arrow_array, arrow_schema);
           }
       }

       public static FieldVector populateFieldVectorToExport(){
           IntVector intVector = new IntVector("int-to-export", allocator);
           intVector.allocateNew(3);
           intVector.setSafe(0, 1);
           intVector.setSafe(1, 2);
           intVector.setSafe(2, 3);
           intVector.setValueCount(3);
           System.out.println("[Java - side]: FieldVector: \n" + intVector);
           return intVector;
       }

       public static void fillvectorschemarootfromjavatocpp(long schema_id, long array_id){
Member

Can we name these appropriately?

Contributor Author

Changed

           try (ArrowArray arrow_array = ArrowArray.wrap(array_id);
                ArrowSchema arrow_schema = ArrowSchema.wrap(schema_id)) {
               Data.exportVectorSchemaRoot(allocator, populateVectorSchemaRootToExport(), null, arrow_array, arrow_schema);
           }
       }

       public static VectorSchemaRoot populateVectorSchemaRootToExport(){
           IntVector intVector = new IntVector("age-to-export", allocator);
           intVector.setSafe(0, 10);
           intVector.setSafe(1, 20);
           intVector.setSafe(2, 30);
           VectorSchemaRoot root = new VectorSchemaRoot(Arrays.asList(intVector));
           root.setRowCount(3);
           System.out.println("[Java - side] VectorSchemaRoot: \n" + root.contentToTSVString());
           return root;
       }

       public static void main(String[] args) {
Member

We don't need a main method, do we?

Contributor Author

Deleted

           populateFieldVectorToExport();
           populateVectorSchemaRootToExport();
       }
   }

Compile the Java code to generate a jar that bundles all required dependencies:

.. code-block:: shell

   $ mvn clean install
   $ cp target/cpptojava-1.0-SNAPSHOT-jar-with-dependencies.jar cpptojava.jar
   $ cp cpptojava.jar <c++_project_path>
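
On the Java side, ``ArrowArray.wrap`` and ``ArrowSchema.wrap`` receive only the memory addresses of two C structs that the caller allocated, and ``Data.exportVector`` then fills those structs in place. The following is a minimal C++ sketch of that hand-off, without a JVM or the Arrow libraries. It assumes the ``ArrowArray`` layout from the Arrow C Data Interface specification (real code should include ``arrow/c/abi.h`` rather than redefine it). ``ReleaseInt32Array``, ``ExportThreeInts``, and ``SumAndRelease`` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Struct layout as given in the Arrow C Data Interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical release callback: frees what the producer allocated and
// marks the struct as released by setting `release` to null.
static void ReleaseInt32Array(ArrowArray* array) {
  std::free(const_cast<void*>(array->buffers[1]));  // values buffer
  std::free(array->buffers);                        // buffer pointer table
  array->release = nullptr;
}

// Hypothetical producer: stands in for the Java method, which only sees
// the struct's address as a 64-bit integer.
void ExportThreeInts(int64_t array_address) {
  auto* out = reinterpret_cast<ArrowArray*>(static_cast<uintptr_t>(array_address));
  auto* values = static_cast<int32_t*>(std::malloc(3 * sizeof(int32_t)));
  values[0] = 1;
  values[1] = 2;
  values[2] = 3;
  const void** buffers = static_cast<const void**>(std::malloc(2 * sizeof(const void*)));
  buffers[0] = nullptr;  // no validity bitmap is needed when null_count is 0
  buffers[1] = values;
  out->length = 3;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = buffers;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = ReleaseInt32Array;
  out->private_data = nullptr;
}

// Consumer: allocates the struct, lets the producer fill it, reads the
// values, then hands the buffers back through the release callback.
int64_t SumAndRelease() {
  ArrowArray array{};
  ExportThreeInts(static_cast<int64_t>(reinterpret_cast<uintptr_t>(&array)));
  assert(array.release != nullptr);
  const auto* data = static_cast<const int32_t*>(array.buffers[1]);
  int64_t sum = 0;
  for (int64_t i = 0; i < array.length; ++i) sum += data[array.offset + i];
  array.release(&array);
  return sum;
}
```

The consumer owns the memory of the struct itself, while the producer owns the buffers and takes them back through ``release``. When Arrow C++ imports the struct, it takes over responsibility for invoking that callback once the imported data is no longer needed.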

**C++ Side**

The C++ program starts a JVM through JNI, looks up the static Java methods, and
calls them so that they export their data via the C Data Interface:

.. code-block:: cpp

   #include <cstdio>
   #include <iostream>
   #include <arrow/api.h>
   #include <arrow/c/bridge.h>
   #include <jni.h>

   JNIEnv* create_vm(JavaVM ** jvm) {
       JNIEnv *env = nullptr;  // stays null if the JVM fails to start, so main() can detect it
       JavaVMInitArgs vm_args;
       JavaVMOption options[2];
       // optionString is a non-const char*, so string literals need an explicit cast in C++
       options[0].optionString = const_cast<char*>("-Djava.class.path=cpptojava.jar"); // java jar name
       options[1].optionString = const_cast<char*>("-DXcheck:jni:pedantic");
       vm_args.version = JNI_VERSION_1_8;
       vm_args.nOptions = 2;
       vm_args.options = options;
       vm_args.ignoreUnrecognized = JNI_FALSE;  // fail on unrecognized options instead of ignoring them
       int status = JNI_CreateJavaVM(jvm, (void**) &env, &vm_args);
       if (status < 0) printf("\n<<<<< Unable to Launch JVM >>>>>\n");
Member

std::abort instead?

Contributor Author

Changed

       return env;
   }

   int main() {
       JNIEnv* env;
       JavaVM* jvm;
       env = create_vm(&jvm);
       if (env == NULL) return 1;
       jclass javaClassToBeCalledByCpp = NULL;
       javaClassToBeCalledByCpp = env->FindClass("ToBeCalledByCpp");
Member

Can you make sure to format the source code?

Contributor Author

Reformat code added

       if (javaClassToBeCalledByCpp != NULL) {
           jmethodID fillfieldvectorfromjavatocpp = NULL;
           fillfieldvectorfromjavatocpp = env->GetStaticMethodID(
               javaClassToBeCalledByCpp, "fillfieldvectorfromjavatocpp", "(JJ)V");
Member

Shorter names will make it easier to read once rendered in Sphinx

           if (fillfieldvectorfromjavatocpp != NULL) {
               struct ArrowSchema arrowSchema;
               struct ArrowArray arrowArray;
               printf("\n<<<<< C++ to Java for Arrays >>>>>\n");
Member

Use std::cout in C++, not printf

Contributor Author

Changed

               // The Java method takes two 64-bit longs, so pass the struct addresses as jlong
               env->CallStaticVoidMethod(
                   javaClassToBeCalledByCpp, fillfieldvectorfromjavatocpp,
                   static_cast<jlong>(reinterpret_cast<uintptr_t>(&arrowSchema)),
                   static_cast<jlong>(reinterpret_cast<uintptr_t>(&arrowArray)));
               auto resultImportArray = arrow::ImportArray(&arrowArray, &arrowSchema);
               std::shared_ptr<arrow::Array> array = resultImportArray.ValueOrDie();
               auto int32_array = std::static_pointer_cast<arrow::Int32Array>(array);
               const int32_t* data = int32_array->raw_values();
               for (int j = 0; j < int32_array->length(); j++){
                   printf("[C++ - side]: Data ImportArray - array[%d] = %d\n", j, data[j]);
               }
           } else {
               printf("Problem to read method fillfieldvectorfromjavatocpp\n");
Member

std::abort or return EXIT_FAILURE

Contributor Author

Added

           }
           jmethodID fillvectorschemarootfromjavatocpp = NULL;
           fillvectorschemarootfromjavatocpp = env->GetStaticMethodID(
               javaClassToBeCalledByCpp, "fillvectorschemarootfromjavatocpp", "(JJ)V");
           if (fillvectorschemarootfromjavatocpp != NULL) {
               struct ArrowSchema arrowSchema;
               struct ArrowArray arrowArray;
               printf("\n<<<<< C++ to Java for RecordBatch >>>>>\n");
               // jlong is always 64 bits wide, whereas long is only 32 bits on some platforms
               env->CallStaticVoidMethod(
                   javaClassToBeCalledByCpp, fillvectorschemarootfromjavatocpp,
                   static_cast<jlong>(reinterpret_cast<uintptr_t>(&arrowSchema)),
                   static_cast<jlong>(reinterpret_cast<uintptr_t>(&arrowArray)));
               auto resultImportVectorSchemaRoot = arrow::ImportRecordBatch(&arrowArray, &arrowSchema);
               std::shared_ptr<arrow::RecordBatch> recordBatch = resultImportVectorSchemaRoot.ValueOrDie();
               for (std::shared_ptr<arrow::Array> array : recordBatch->columns()) {
                   auto int32_array = std::static_pointer_cast<arrow::Int32Array>(array);
                   const int32_t* data = int32_array->raw_values();
                   for (int j = 0; j < int32_array->length(); j++){
                       printf("[C++ - side]: Data ImportArray - array[%d] = %d\n", j, data[j]);
                   }
               }
           } else {
               printf("Problem to read method fillvectorschemarootfromjavatocpp\n");
           }
       } else {
           printf("Problem to read class ToBeCalledByCpp\n");
       }
       jvm->DestroyJavaVM();
       return 0;
   }
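
The ``"(JJ)V"`` string passed to ``GetStaticMethodID`` above is a JNI type signature: the parameter types appear between the parentheses and the return type follows, with ``J`` standing for ``long`` and ``V`` for ``void``. The following is a small standalone C++ sketch that spells such a signature out. ``PrimitiveName`` and ``DescribeSignature`` are hypothetical helpers, not JNI functions, and they handle only primitive type codes, which is enough for the signatures used here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: maps a JNI primitive type code to its Java type name.
std::string PrimitiveName(char code) {
  switch (code) {
    case 'Z': return "boolean";
    case 'B': return "byte";
    case 'C': return "char";
    case 'S': return "short";
    case 'I': return "int";
    case 'J': return "long";
    case 'F': return "float";
    case 'D': return "double";
    case 'V': return "void";
    default: throw std::invalid_argument("unsupported type code");
  }
}

// Hypothetical helper: renders a primitive-only signature such as "(JJ)V"
// as a readable Java-style declaration.
std::string DescribeSignature(const std::string& signature) {
  const std::size_t close = signature.find(')');
  if (signature.empty() || signature[0] != '(' || close == std::string::npos ||
      close + 2 != signature.size()) {
    throw std::invalid_argument("expected a signature such as (JJ)V");
  }
  std::string out = PrimitiveName(signature[close + 1]) + " method(";
  for (std::size_t i = 1; i < close; ++i) {
    if (signature[i] == 'V') {
      throw std::invalid_argument("void is not a valid parameter type");
    }
    if (i > 1) out += ", ";
    out += PrimitiveName(signature[i]);
  }
  return out + ")";
}
```

``DescribeSignature("(JJ)V")`` yields ``void method(long, long)``, matching the two ``long`` addresses that the Java methods accept.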

CMakeLists.txt definition file:

.. code-block:: cmake

   cmake_minimum_required(VERSION 3.19)
   project(firstarrowcpp)
Member

name the project something related?

Contributor Author

Updated

   find_package(JNI REQUIRED)
   find_package(Arrow REQUIRED)
   message(${ARROW_VERSION})
   message(${ARROW_FULL_SO_VERSION})
Member

This needs to be updated like the line above

Contributor Author

Message not needed, deleted

   include_directories(${JNI_INCLUDE_DIRS}) # needed so that jni.h can be found
   set(CMAKE_CXX_STANDARD 14)
   add_executable(${PROJECT_NAME} main.cpp)
   target_link_libraries(firstarrowcpp PRIVATE arrow_shared)
   target_link_libraries(firstarrowcpp PRIVATE ${JNI_LIBRARIES})

**C++ Test**

This is the result of the test:

.. code-block:: shell

   <<<<< C++ to Java for Arrays >>>>>
   [Java - side]: FieldVector:
   [1, 2, 3]
   [C++ - side]: Data ImportArray - array[0] = 1
   [C++ - side]: Data ImportArray - array[1] = 2
   [C++ - side]: Data ImportArray - array[2] = 3

   <<<<< C++ to Java for RecordBatch >>>>>
   [Java - side] VectorSchemaRoot:
   age-to-export
   10
   20
   30

   [C++ - side]: Data ImportArray - array[0] = 10
   [C++ - side]: Data ImportArray - array[1] = 20
   [C++ - side]: Data ImportArray - array[2] = 30

.. _`JavaCPP`: https://github.com/bytedeco/javacpp