diff --git a/docs/source/developers/java/development.rst b/docs/source/developers/java/development.rst
index 1094d02f1c140..ce7e1704f641c 100644
--- a/docs/source/developers/java/development.rst
+++ b/docs/source/developers/java/development.rst
@@ -84,11 +84,13 @@ UI Benchmark:
 Integration Testing
 ===================
 
-Integration tests can be run via Archery:
+Integration tests can be run :ref:`via Archery `.
+For example, assuming you only built Arrow Java and want to run the IPC
+integration tests, you would do:
 
-.. code-block::
+.. code-block:: console
 
-   $ archery integration --with-java true --with-cpp false --with-js false --with-csharp false --with-go false --with-rust false
+   $ archery integration --run-ipc --with-java 1
 
 Code Style
 ==========
@@ -104,4 +106,4 @@ This checks the code style of all source code under the current directory or fro
 .. _benchmark: https://github.com/ursacomputing/benchmarks
 .. _archery: https://github.com/apache/arrow/blob/main/dev/conbench_envs/README.md#L188
 .. _conbench: https://github.com/conbench/conbench
-.. _checkstyle: https://github.com/apache/arrow/blob/main/java/dev/checkstyle/checkstyle.xml
\ No newline at end of file
+.. _checkstyle: https://github.com/apache/arrow/blob/main/java/dev/checkstyle/checkstyle.xml
diff --git a/docs/source/format/Integration.rst b/docs/source/format/Integration.rst
index 5f2341b9c469c..e1160b287e77c 100644
--- a/docs/source/format/Integration.rst
+++ b/docs/source/format/Integration.rst
@@ -20,32 +20,98 @@
 Integration Testing
 ===================
 
+To ensure Arrow implementations are interoperable between each other,
+the Arrow project includes cross-language integration tests which are
+regularly run as Continuous Integration tasks.
+
+The integration tests exercise compliance with several Arrow specifications:
+the :ref:`IPC format `, the :ref:`Flight RPC ` protocol,
+and the :ref:`C Data Interface `.
+
+Strategy
+--------
+
 Our strategy for integration testing between Arrow implementations is:
 
-* Test datasets are specified in a custom human-readable, JSON-based format
-  designed exclusively for Arrow's integration tests
-* Each implementation provides a testing executable capable of converting
-  between the JSON and the binary Arrow file representation
-* Each testing executable is used to generate binary Arrow file representations
-  from the JSON-based test datasets. These results are then used to call the
-  testing executable of each other implementation to validate the contents
-  against the corresponding JSON file.
-  - *ie.* the C++ testing executable generates binary arrow files from JSON
-    specified datasets. The resulting files are then used as input to the Java
-    testing executable for validation, confirming that the Java implementation
-    can correctly read what the C++ implementation wrote.
+* Test datasets are specified in a custom human-readable,
+  :ref:`JSON-based format ` designed exclusively
+  for Arrow's integration tests.
+
+* The JSON files are generated by the integration test harness. Different
+  files are used to represent different data types and features, such as
+  numerics, lists, dictionary encoding, etc. This makes it easier to pinpoint
+  incompatibilities than if all data types were represented in a single file.
+
+* Each implementation provides entry points capable of converting
+  between the JSON and the Arrow in-memory representation, and of exposing
+  Arrow in-memory data using the desired format.
+
+* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
+  all supported pairs of (producer, consumer) implementations. The producer
+  typically reads a JSON file, converts it to in-memory Arrow data, and exposes
+  this data using the format under test. The consumer reads the data in the
+  said format and converts it back to Arrow in-memory data; it also reads
+  the same JSON file as the producer, and validates that both datasets are
+  identical.
+
+Example: IPC format
+~~~~~~~~~~~~~~~~~~~
+
+Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
+of the Arrow IPC format. Testing a JSON file would go as follows:
+
+#. A C++ executable reads the JSON file, converts it into Arrow in-memory data
+   and writes an Arrow IPC file (the file paths are typically given on the command
+   line).
+
+#. A Java executable reads the JSON file, converts it into Arrow in-memory data;
+   it also reads the Arrow IPC file generated by C++. Finally, it validates that
+   both Arrow in-memory datasets are equal.
+
+Example: C Data Interface
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Now, let's say we are testing Arrow Go as a producer and Arrow C# as a consumer
+of the Arrow C Data Interface.
+
+#. The integration testing harness allocates a C
+   :ref:`ArrowArray ` structure on the heap.
+
+#. A Go in-process entrypoint (for example a C-compatible function call)
+   reads a JSON file and exports one of its :term:`record batches `
+   into the ``ArrowArray`` structure.
+
+#. A C# in-process entrypoint reads the same JSON file, converts the
+   same record batch into Arrow in-memory data; it also imports the
+   record batch exported by Arrow Go in the ``ArrowArray`` structure.
+   It validates that both record batches are equal, and then releases the
+   imported record batch.
+
+#. Depending on the implementation languages' abilities, the integration
+   testing harness may assert that memory consumption remained identical
+   (i.e., that the exported record batch didn't leak).
+
+#. At the end, the integration testing harness deallocates the ``ArrowArray``
+   structure.
+
+.. _running_integration_tests:
 
 Running integration tests
 -------------------------
 
 The integration test data generator and runner are implemented inside
-the :ref:`Archery ` utility.
+the :ref:`Archery ` utility. You need to install the ``integration``
+component of archery:
+
+.. code:: console
+
+   $ pip install -e "dev/archery[integration]"
 
 The integration tests are run using the ``archery integration`` command.
 
-.. code-block:: shell
+.. code-block:: console
 
-   archery integration --help
+   $ archery integration --help
 
 In order to run integration tests, you'll first need to build each component
 you want to include. See the respective developer docs for C++, Java, etc.
@@ -56,26 +122,26 @@ testing. For C++, for example, you need to add ``-DARROW_BUILD_INTEGRATION=ON``
 to your cmake command.
 
 Depending on which components you have built, you can enable and add them to
-the archery test run. For example, if you only have the C++ project built, run:
+the archery test run. For example, if you only have the C++ project built
+and want to run the Arrow IPC integration tests, run:
 
 .. code-block:: shell
 
-   archery integration --with-cpp=1
-
+   archery integration --run-ipc --with-cpp=1
 
 For Java, it may look like:
 
 .. code-block:: shell
 
-   VERSION=0.11.0-SNAPSHOT
+   VERSION=14.0.0-SNAPSHOT
    export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
-   archery integration --with-cpp=1 --with-java=1
+   archery integration --run-ipc --with-cpp=1 --with-java=1
 
-To run all tests, including Flight integration tests, do:
+To run all tests, including Flight and C Data Interface integration tests, do:
 
 .. code-block:: shell
 
-   archery integration --with-all --run-flight
+   archery integration --with-all --run-flight --run-ipc --run-c-data
 
 Note that we run these tests in continuous integration, and the CI job uses
 docker-compose. You may also run the docker-compose job locally, or at least
@@ -85,6 +151,8 @@ certain tests.
 See :ref:`docker-builds` for more information about the project's
 ``docker-compose`` configuration.
 
+.. _format_json_integration:
+
 JSON test data format
 ---------------------
 
@@ -415,7 +483,7 @@ will have count 28.
 For "null" type, ``BufferData`` does not contain any buffers.
 
 Archery Integration Test Cases
---------------------------------------
+------------------------------
 
 This list can make it easier to understand what manual testing may need to be
 done for any future Arrow Format changes by knowing what cases the automated