apacheGH-37934: [Doc][Integration] Document C Data Interface testing (apache#37935)

### Rationale for this change

apachegh-37537 added integration testing for the C Data Interface, but the documentation was not updated.

### What changes are included in this PR?

Add documentation for C Data Interface integration testing.

### Are these changes tested?

N/A, only doc changes.

### Are there any user-facing changes?

No.
* Closes: apache#37934

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
pitrou authored and dgreiss committed Feb 17, 2024
1 parent b87cbb2 commit daf2d13
Showing 2 changed files with 97 additions and 27 deletions.
10 changes: 6 additions & 4 deletions docs/source/developers/java/development.rst
@@ -84,11 +84,13 @@ UI Benchmark:
Integration Testing
===================

Integration tests can be run via Archery:
Integration tests can be run :ref:`via Archery <running_integration_tests>`.
For example, assuming you only built Arrow Java and want to run the IPC
integration tests, you would do:

.. code-block::
.. code-block:: console
$ archery integration --with-java true --with-cpp false --with-js false --with-csharp false --with-go false --with-rust false
$ archery integration --run-ipc --with-java 1
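
Archery locates the Java integration entry point through the
``ARROW_JAVA_INTEGRATION_JAR`` environment variable, which should point at the
``arrow-tools`` "jar with dependencies" produced by the Java build. A sketch,
assuming a ``14.0.0-SNAPSHOT`` build and that ``$JAVA_DIR`` points at the Java
source tree:

.. code-block:: console

   $ VERSION=14.0.0-SNAPSHOT
   $ export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
   $ archery integration --run-ipc --with-java 1
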
Code Style
==========
@@ -104,4 +106,4 @@ This checks the code style of all source code under the current directory or fro
.. _benchmark: https://github.com/ursacomputing/benchmarks
.. _archery: https://github.com/apache/arrow/blob/main/dev/conbench_envs/README.md#L188
.. _conbench: https://github.com/conbench/conbench
.. _checkstyle: https://github.com/apache/arrow/blob/main/java/dev/checkstyle/checkstyle.xml
114 changes: 91 additions & 23 deletions docs/source/format/Integration.rst
@@ -20,32 +20,98 @@
Integration Testing
===================

To ensure Arrow implementations are interoperable with each other,
the Arrow project includes cross-language integration tests which are
regularly run as Continuous Integration tasks.

The integration tests exercise compliance with several Arrow specifications:
the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
and the :ref:`C Data Interface <c-data-interface>`.

Strategy
--------

Our strategy for integration testing between Arrow implementations is:

* Test datasets are specified in a custom human-readable, JSON-based format
designed exclusively for Arrow's integration tests
* Each implementation provides a testing executable capable of converting
between the JSON and the binary Arrow file representation
* Each testing executable is used to generate binary Arrow file representations
from the JSON-based test datasets. These results are then used to call the
testing executable of each other implementation to validate the contents
against the corresponding JSON file.
- *i.e.*, the C++ testing executable generates binary Arrow files from JSON
specified datasets. The resulting files are then used as input to the Java
testing executable for validation, confirming that the Java implementation
can correctly read what the C++ implementation wrote.
* Test datasets are specified in a custom human-readable,
:ref:`JSON-based format <format_json_integration>` designed exclusively
for Arrow's integration tests.

* The JSON files are generated by the integration test harness. Different
files are used to represent different data types and features, such as
numerics, lists, dictionary encoding, etc. This makes it easier to pinpoint
incompatibilities than if all data types were represented in a single file.

* Each implementation provides entry points capable of converting
between the JSON and the Arrow in-memory representation, and of exposing
Arrow in-memory data using the desired format.

* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
all supported pairs of (producer, consumer) implementations. The producer
typically reads a JSON file, converts it to in-memory Arrow data, and exposes
this data using the format under test. The consumer reads the data in the
said format and converts it back to Arrow in-memory data; it also reads
the same JSON file as the producer, and validates that both datasets are
identical.

Example: IPC format
~~~~~~~~~~~~~~~~~~~

Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
of the Arrow IPC format. Testing a JSON file would go as follows:

#. A C++ executable reads the JSON file, converts it into Arrow in-memory data
and writes an Arrow IPC file (the file paths are typically given on the command
line).

#. A Java executable reads the JSON file, converts it into Arrow in-memory data;
it also reads the Arrow IPC file generated by C++. Finally, it validates that
both Arrow in-memory datasets are equal.
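
With Archery (see :ref:`running_integration_tests` below), this particular
producer/consumer pair is exercised by enabling both implementations together
with the IPC tests. Assuming both the C++ and Java builds are available, the
run might look like:

.. code-block:: console

   $ archery integration --run-ipc --with-cpp=1 --with-java=1

The same invocation also covers the reverse direction (Java producing, C++
consuming), since every enabled implementation is paired with every other.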

Example: C Data Interface
~~~~~~~~~~~~~~~~~~~~~~~~~

Now, let's say we are testing Arrow Go as a producer and Arrow C# as a consumer
of the Arrow C Data Interface.

#. The integration testing harness allocates a C
:ref:`ArrowArray <c-data-interface-struct-defs>` structure on the heap.

#. A Go in-process entrypoint (for example a C-compatible function call)
reads a JSON file and exports one of its :term:`record batches <record batch>`
into the ``ArrowArray`` structure.

#. A C# in-process entrypoint reads the same JSON file, converts the
same record batch into Arrow in-memory data; it also imports the
record batch exported by Arrow Go in the ``ArrowArray`` structure.
It validates that both record batches are equal, and then releases the
imported record batch.

#. Depending on the implementation languages' abilities, the integration
testing harness may assert that memory consumption remained identical
(i.e., that the exported record batch didn't leak).

#. At the end, the integration testing harness deallocates the ``ArrowArray``
structure.
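
With Archery, a run restricted to the C Data Interface tests for this pair
might look like the following, assuming the Go and C# builds are both
available (see :ref:`running_integration_tests` below):

.. code-block:: console

   $ archery integration --run-c-data --with-go=1 --with-csharp=1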

.. _running_integration_tests:

Running integration tests
-------------------------

The integration test data generator and runner are implemented inside
the :ref:`Archery <archery>` utility.
the :ref:`Archery <archery>` utility. You need to install the ``integration``
component of archery:

.. code:: console
$ pip install -e "dev/archery[integration]"
The integration tests are run using the ``archery integration`` command.

.. code-block:: shell
.. code-block:: console
archery integration --help
$ archery integration --help
In order to run integration tests, you'll first need to build each component
you want to include. See the respective developer docs for C++, Java, etc.
@@ -56,26 +122,26 @@ testing. For C++, for example, you need to add ``-DARROW_BUILD_INTEGRATION=ON``
to your cmake command.
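
For instance, a minimal C++ configure-and-build sketch (the paths and any
additional options you may need are illustrative, not prescriptive):

.. code-block:: console

   $ cmake -S cpp -B cpp/build -DARROW_BUILD_INTEGRATION=ON
   $ cmake --build cpp/build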

Depending on which components you have built, you can enable and add them to
the archery test run. For example, if you only have the C++ project built, run:
the archery test run. For example, if you only have the C++ project built
and want to run the Arrow IPC integration tests, run:

.. code-block:: shell
archery integration --with-cpp=1
archery integration --run-ipc --with-cpp=1
For Java, it may look like:

.. code-block:: shell
VERSION=0.11.0-SNAPSHOT
VERSION=14.0.0-SNAPSHOT
export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
archery integration --with-cpp=1 --with-java=1
archery integration --run-ipc --with-cpp=1 --with-java=1
To run all tests, including Flight integration tests, do:
To run all tests, including Flight and C Data Interface integration tests, do:

.. code-block:: shell
archery integration --with-all --run-flight
archery integration --with-all --run-flight --run-ipc --run-c-data
Note that we run these tests in continuous integration, and the CI job uses
docker-compose. You may also run the docker-compose job locally, or at least
@@ -85,6 +151,8 @@ certain tests.
See :ref:`docker-builds` for more information about the project's
``docker-compose`` configuration.
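
For example, the integration job can be launched locally through Archery's
Docker support; the service name below is an assumption, so check the
project's ``docker-compose.yml`` for the exact one:

.. code-block:: console

   $ archery docker run conda-integration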

.. _format_json_integration:

JSON test data format
---------------------

@@ -415,7 +483,7 @@ will have count 28.
For "null" type, ``BufferData`` does not contain any buffers.

Archery Integration Test Cases
--------------------------------------
------------------------------

This list can make it easier to understand what manual testing may need to
be done for any future Arrow Format changes by knowing what cases the automated
