GH-37934: [Doc][Integration] Document C Data Interface testing #37935

Merged
merged 2 commits into from
Sep 28, 2023
10 changes: 6 additions & 4 deletions docs/source/developers/java/development.rst
@@ -84,11 +84,13 @@ UI Benchmark:
Integration Testing
===================

Integration tests can be run via Archery:
Integration tests can be run :ref:`via Archery <running_integration_tests>`.
For example, assuming you only built Arrow Java and want to run the IPC
integration tests, you would do:

.. code-block::
.. code-block:: console

$ archery integration --with-java true --with-cpp false --with-js false --with-csharp false --with-go false --with-rust false
$ archery integration --run-ipc --with-java 1

Code Style
==========
@@ -104,4 +106,4 @@ This checks the code style of all source code under the current directory or fro
.. _benchmark: https://github.com/ursacomputing/benchmarks
.. _archery: https://github.com/apache/arrow/blob/main/dev/conbench_envs/README.md#L188
.. _conbench: https://github.com/conbench/conbench
.. _checkstyle: https://github.com/apache/arrow/blob/main/java/dev/checkstyle/checkstyle.xml
.. _checkstyle: https://github.com/apache/arrow/blob/main/java/dev/checkstyle/checkstyle.xml
114 changes: 91 additions & 23 deletions docs/source/format/Integration.rst
@@ -20,32 +20,98 @@
Integration Testing
===================

To ensure Arrow implementations are interoperable between each other,
the Arrow project includes cross-language integration tests which are
regularly run as Continuous Integration tasks.

The integration tests exercise compliance with several Arrow specifications:
the :ref:`IPC format <format-ipc>`, the :ref:`Flight RPC <flight-rpc>` protocol,
and the :ref:`C Data Interface <c-data-interface>`.

Strategy
--------

Our strategy for integration testing between Arrow implementations is:

* Test datasets are specified in a custom human-readable, JSON-based format
designed exclusively for Arrow's integration tests
* Each implementation provides a testing executable capable of converting
between the JSON and the binary Arrow file representation
* Each testing executable is used to generate binary Arrow file representations
from the JSON-based test datasets. These results are then used to call the
testing executable of each other implementation to validate the contents
against the corresponding JSON file.
- *ie.* the C++ testing executable generates binary arrow files from JSON
specified datasets. The resulting files are then used as input to the Java
testing executable for validation, confirming that the Java implementation
can correctly read what the C++ implementation wrote.
* Test datasets are specified in a custom human-readable,
:ref:`JSON-based format <format_json_integration>` designed exclusively
for Arrow's integration tests.

* The JSON files are generated by the integration test harness. Different
files are used to represent different data types and features, such as
numerics, lists, dictionary encoding, etc. This makes it easier to pinpoint
incompatibilities than if all data types were represented in a single file.

* Each implementation provides entry points capable of converting
between the JSON and the Arrow in-memory representation, and of exposing
Arrow in-memory data using the desired format.

* Each format (whether Arrow IPC, Flight or the C Data Interface) is tested for
all supported pairs of (producer, consumer) implementations. The producer
typically reads a JSON file, converts it to in-memory Arrow data, and exposes
this data using the format under test. The consumer reads the data in the
said format and converts it back to Arrow in-memory data; it also reads
the same JSON file as the producer, and validates that both datasets are
identical.
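
The pairing described above can be sketched in a few lines of Python. This is only an illustrative model, not the Archery implementation: the implementation names, the ``SUPPORTS`` capability table, and the ``toy_produce`` / ``toy_consume`` functions (which use JSON bytes as a stand-in for a real Arrow format) are all assumptions made for the example.

```python
import json
from itertools import product

# Hypothetical implementation names and capability table, for illustration only.
IMPLEMENTATIONS = ["cpp", "java", "go"]
SUPPORTS = {
    "ipc": {"producer": {"cpp", "java", "go"}, "consumer": {"cpp", "java", "go"}},
    "c-data": {"producer": {"cpp", "go"}, "consumer": {"cpp", "go"}},
}


def supported_pairs(fmt):
    """Return every (producer, consumer) pair that supports the given format."""
    caps = SUPPORTS[fmt]
    return [
        (producer, consumer)
        for producer, consumer in product(IMPLEMENTATIONS, repeat=2)
        if producer in caps["producer"] and consumer in caps["consumer"]
    ]


def toy_produce(dataset):
    """Stand-in producer: expose the in-memory data in the 'format under test'."""
    return json.dumps(dataset).encode("utf-8")


def toy_consume(payload):
    """Stand-in consumer: read the exposed data back into memory."""
    return json.loads(payload.decode("utf-8"))


def run_case(dataset, produce, consume):
    """Validate that what the consumer reads equals its own view of the dataset."""
    return consume(produce(dataset)) == dataset


for producer, consumer in supported_pairs("c-data"):
    ok = run_case({"ints": [1, 2, 3]}, toy_produce, toy_consume)
    print(f"{producer} -> {consumer}: {'OK' if ok else 'MISMATCH'}")
```

In this sketch an implementation is also paired with itself; whether a given pair is actually exercised depends on what each implementation supports.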

Example: IPC format
~~~~~~~~~~~~~~~~~~~

Let's say we are testing Arrow C++ as a producer and Arrow Java as a consumer
of the Arrow IPC format. Testing a JSON file would go as follows:

#. A C++ executable reads the JSON file, converts it into Arrow in-memory data
and writes an Arrow IPC file (the file paths are typically given on the command
line).

#. A Java executable reads the JSON file, converts it into Arrow in-memory data;
it also reads the Arrow IPC file generated by C++. Finally, it validates that
both Arrow in-memory datasets are equal.

Example: C Data Interface
~~~~~~~~~~~~~~~~~~~~~~~~~

Now, let's say we are testing Arrow Go as a producer and Arrow C# as a consumer
of the Arrow C Data Interface.

#. The integration testing harness allocates a C
:ref:`ArrowArray <c-data-interface-struct-defs>` structure on the heap.

#. A Go in-process entrypoint (for example a C-compatible function call)
reads a JSON file and exports one of its :term:`record batches <record batch>`
into the ``ArrowArray`` structure.

#. A C# in-process entrypoint reads the same JSON file, converts the
same record batch into Arrow in-memory data; it also imports the
record batch exported by Arrow Go in the ``ArrowArray`` structure.
It validates that both record batches are equal, and then releases the
imported record batch.

#. Depending on the implementation languages' abilities, the integration
testing harness may assert that memory consumption remained identical
(i.e., that the exported record batch didn't leak).

#. At the end, the integration testing harness deallocates the ``ArrowArray``
structure.
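
The ownership and release protocol in these steps can be modelled in pure Python. This is a toy sketch under stated assumptions, not the harness code: ``ToyAllocator``, ``ToyArrowArray``, ``export_batch`` and ``import_and_compare`` are hypothetical names, and a Python list stands in for a record batch. The one detail taken from the C Data Interface itself is that a released structure has its ``release`` callback cleared.

```python
class ToyAllocator:
    """Counts live bytes so the harness can detect leaks (hypothetical helper)."""

    def __init__(self):
        self.live_bytes = 0

    def allocate(self, nbytes):
        self.live_bytes += nbytes
        return nbytes

    def free(self, nbytes):
        self.live_bytes -= nbytes


class ToyArrowArray:
    """Stand-in for the C ArrowArray struct: a payload plus a release callback."""

    def __init__(self):
        self.payload = None
        self.release = None  # None mirrors a NULL release pointer


def export_batch(allocator, batch, out):
    """Producer side: fill the harness-owned structure with one record batch."""
    size = allocator.allocate(len(batch))

    def release():
        allocator.free(size)
        out.payload = None
        out.release = None  # mark the structure as released

    out.payload = list(batch)
    out.release = release


def import_and_compare(expected, exported):
    """Consumer side: import the batch, compare it, then release it."""
    equal = exported.payload == list(expected)
    exported.release()
    return equal


def run_c_data_case(batch):
    allocator = ToyAllocator()
    baseline = allocator.live_bytes
    array = ToyArrowArray()                    # harness allocates the structure
    export_batch(allocator, batch, array)      # producer exports into it
    equal = import_and_compare(batch, array)   # consumer imports, validates, releases
    leaked = allocator.live_bytes != baseline  # optional memory check
    del array                                  # harness deallocates the structure
    return equal, leaked
```

Calling ``run_c_data_case([1, 2, 3])`` returns a pair of booleans: whether the two batches compared equal, and whether any memory tracked by the toy allocator was still live after the release.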

.. _running_integration_tests:

Running integration tests
-------------------------

The integration test data generator and runner are implemented inside
the :ref:`Archery <archery>` utility.
the :ref:`Archery <archery>` utility. You need to install the ``integration``
component of archery:

.. code:: console

$ pip install -e "dev/archery[integration]"

The integration tests are run using the ``archery integration`` command.

.. code-block:: shell
.. code-block:: console

archery integration --help
$ archery integration --help

In order to run integration tests, you'll first need to build each component
you want to include. See the respective developer docs for C++, Java, etc.
@@ -56,26 +122,26 @@ testing. For C++, for example, you need to add ``-DARROW_BUILD_INTEGRATION=ON``
to your cmake command.

Depending on which components you have built, you can enable and add them to
the archery test run. For example, if you only have the C++ project built, run:
the archery test run. For example, if you only have the C++ project built
and want to run the Arrow IPC integration tests, run:

.. code-block:: shell

archery integration --with-cpp=1

archery integration --run-ipc --with-cpp=1

For Java, it may look like:

.. code-block:: shell

VERSION=0.11.0-SNAPSHOT
VERSION=14.0.0-SNAPSHOT
export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
archery integration --with-cpp=1 --with-java=1
archery integration --run-ipc --with-cpp=1 --with-java=1

To run all tests, including Flight integration tests, do:
To run all tests, including Flight and C Data Interface integration tests, do:

.. code-block:: shell

archery integration --with-all --run-flight
archery integration --with-all --run-flight --run-ipc --run-c-data

Note that we run these tests in continuous integration, and the CI job uses
docker-compose. You may also run the docker-compose job locally, or at least
@@ -85,6 +151,8 @@ certain tests.
See :ref:`docker-builds` for more information about the project's
``docker-compose`` configuration.

.. _format_json_integration:

JSON test data format
---------------------

@@ -415,7 +483,7 @@ will have count 28.
For "null" type, ``BufferData`` does not contain any buffers.

Archery Integration Test Cases
--------------------------------------
------------------------------

This list can make it easier to understand what manual testing may need to
be done for any future Arrow Format changes by knowing what cases the automated