[docs] Restructure and expand presto_cpp docs #22717

Merged on May 16, 2024 (1 commit)
2 changes: 1 addition & 1 deletion presto-docs/src/main/sphinx/index.rst
@@ -22,7 +22,7 @@ Presto Documentation
ecosystem
router
develop
-prestissimo
+presto-cpp
release

.. Note: If "release" is not the last item, the CSS must be updated.
25 changes: 0 additions & 25 deletions presto-docs/src/main/sphinx/prestissimo.rst

This file was deleted.


51 changes: 51 additions & 0 deletions presto-docs/src/main/sphinx/presto-cpp.rst
@@ -0,0 +1,51 @@
**********
Presto C++
**********

Note: Presto C++ is in active development. See :doc:`Limitations </presto_cpp/limitations>`.

.. toctree::
    :maxdepth: 1

    presto_cpp/features
    presto_cpp/limitations

Overview
========

Presto C++, sometimes referred to by the development name Prestissimo, is a
drop-in replacement for Presto workers written in C++ and based on the
`Velox <https://velox-lib.io/>`_ library.
It implements the same RESTful endpoints as Java workers using the Proxygen C++
HTTP framework.
Because communication with the Java coordinator and across workers is only
done using the REST endpoints, Presto C++ does not use JNI and does not
require a JVM on worker nodes.

Presto C++'s codebase is located at `presto-native-execution
<https://github.com/prestodb/presto/tree/master/presto-native-execution>`_.

Motivation and Vision
=====================

Presto aims to be the top-performing system for data lakes.
To achieve this goal, the Presto community is moving the Presto
evaluation engine from the native Java-based implementation to a new
implementation written in C++ using `Velox <https://velox-lib.io/>`_.

By moving the evaluation engine to a library, the intent is to enable the
Presto community to focus on more features and better integration with table
formats and other data warehousing systems.

Supported Use Cases
===================

Only specific connectors are supported in the Presto C++ evaluation engine.

* The Hive connector is supported for reads and writes, including CTAS.

* Iceberg tables are supported only for reads.

* Iceberg connector supports both V1 and V2 tables, including tables with delete files.

* TPCH connector, with ``tpch.naming=standard`` catalog property.
@@ -1,6 +1,6 @@
-====================
-Prestissimo Features
-====================
+===================
+Presto C++ Features
+===================

.. contents::
    :local:
@@ -27,17 +27,17 @@ Other HTTP endpoints include:
* GET: v1/info
* GET: v1/status

-The request/response flow of Prestissimo is identical to Java workers. The
+The request/response flow of Presto C++ is identical to Java workers. The
tasks or new splits are registered via `TaskUpdateRequest`. Resource
utilization and query progress are sent to the coordinator via task endpoints.
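
As a rough illustration of the endpoints listed above, the following Python sketch builds the URLs for ``v1/info`` and ``v1/status`` and fetches one of them. The host name, port, and helper names are assumptions made for this example, and the response body is assumed to be JSON. The sketch is not part of Presto.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Endpoint paths taken from the list of HTTP endpoints above.
INFO_PATH = "v1/info"
STATUS_PATH = "v1/status"


def worker_url(base_url: str, path: str) -> str:
    """Join a worker base URL and an endpoint path."""
    # Normalizing to exactly one trailing slash keeps urljoin from
    # replacing the last path segment of the base URL.
    return urljoin(base_url.rstrip("/") + "/", path)


def fetch_json(url: str, timeout: float = 5.0):
    """Issue a GET request and decode the body as JSON (assumed format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling ``fetch_json(worker_url("http://worker.example.com:7777", STATUS_PATH))`` requires a reachable worker. The address shown here is hypothetical.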


Remote Function Execution
-------------------------

-Prestissimo supports remote execution of scalar functions. This feature is
+Presto C++ supports remote execution of scalar functions. This feature is
useful for cases when the function code is not written in C++, or if for
-security or flexibility reasons the function code cannot be linked to the same
+security or flexibility reasons, the function code cannot be linked to the same
executable as the main engine.

Remote function signatures need to be provided using a JSON file, following
@@ -114,7 +114,7 @@ function server. If specified, takes precedence over
JWT authentication support
--------------------------

-Prestissimo supports JWT authentication for internal communication.
+C++ based Presto supports JWT authentication for internal communication.
For details on the generally supported parameters visit `JWT <../security/internal-communication.html#jwt>`_.

There is also an additional parameter:
@@ -169,9 +169,9 @@ Size of the SSD cache when async data cache is enabled.
* **Default value:** ``true``
* **Presto on Spark default value:** ``false``

-Enable periodic clean up of old tasks. This is ``true`` for Prestissimo,
-however for Presto on Spark this defaults to ``false`` as zombie/stuck tasks
-are handled by spark via speculative execution.
+Enable periodic clean up of old tasks. The default value is ``true`` for Presto C++.
+For Presto on Spark this property defaults to ``false``, as zombie or stuck tasks
+are handled by Spark by speculative execution.
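
The periodic cleanup described above can be pictured with a small Python sketch. It is an illustration only, not the Presto C++ implementation: the ``Task`` record, the ``cleanup_enabled`` flag, and the threshold value are hypothetical, and a task is assumed to be "old" when no heartbeat arrived within the threshold.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    last_heartbeat_ms: int  # time of the last heartbeat, in milliseconds


def clean_old_tasks(tasks, now_ms, threshold_ms, cleanup_enabled=True):
    """Return the tasks that survive one cleanup pass (hypothetical helper)."""
    if not cleanup_enabled:
        # Models the Presto on Spark default, where cleanup is disabled.
        return list(tasks)
    # Keep only tasks whose last heartbeat is more recent than the threshold.
    return [t for t in tasks if now_ms - t.last_heartbeat_ms < threshold_ms]
```

For example, with ``now_ms=1000`` and ``threshold_ms=500``, a task whose last heartbeat was at ``0`` is dropped, while one with a heartbeat at ``900`` is kept.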

``old-task-cleanup-ms``
^^^^^^^^^^^^^^^^^^^^^^^
@@ -189,7 +189,7 @@ Old task is defined as a PrestoTask which has not received heartbeat for at leas
Session Properties
------------------

-The following are the native session properties for Prestissimo.
+The following are the native session properties for C++ based Presto.

``driver_cpu_time_slice_limit_ms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
44 changes: 44 additions & 0 deletions presto-docs/src/main/sphinx/presto_cpp/limitations.rst
@@ -0,0 +1,44 @@
======================
Presto C++ Limitations
======================

.. contents::
    :local:
    :backlinks: none
    :depth: 1

General Limitations
===================

The C++ evaluation engine has a number of limitations:

* Not all built-in functions are implemented in C++. Attempting to use unimplemented functions results in a query failure. For supported functions, see `Function Coverage <https://facebookincubator.github.io/velox/functions/presto/coverage.html>`_.

* Not all built-in types are implemented in C++. Attempting to use unimplemented types results in a query failure.

Review comment (Contributor):

@steveburnett: Apologies for the delay. I feel we should add more detail here about the supported and unsupported types.

The specifics are:
All basic and structured types (from https://prestodb.io/docs/0.286/language/types.html) except CHAR, TIME, and TIME WITH TIMEZONE are supported. These are subsumed by VARCHAR, TIMESTAMP, and TIMESTAMP WITH TIMEZONE.

Caveat: Prestissimo only supports unlimited-length VARCHAR. It does not honor the lengths in varchar[n].

The types IPADDRESS, IPPREFIX, UUID, HYPERLOGLOG, P4HYPERLOGLOG, QDIGEST, and TDIGEST are not supported.

Reply (Contributor Author):

Opened #22772 to add this content to the docs.

* Certain parts of the plugin SPI are not used by the C++ evaluation engine. In particular, C++ workers will not load any plugin in the plugins directory, and certain plugin types are either partially or completely unsupported.

  * ``PageSourceProvider``, ``RecordSetProvider``, and ``PageSinkProvider`` do not work in the C++ evaluation engine.

  * User-supplied functions, types, parametric types and block encodings are not supported.

  * The event listener plugin does not work at the split level.

  * User-defined functions do not work in the same way; see `Remote Function Execution <features.html#remote-function-execution>`_.

* Memory management works differently in the C++ evaluation engine. In particular:

  * The OOM killer is not supported.
  * The reserved pool is not supported.
  * In general, queries may use more memory than they are allowed to through memory arbitration. See `Memory Management <https://facebookincubator.github.io/velox/develop/memory.html>`_.

Functions
=========

reduce_agg
----------

In C++ based Presto, ``reduce_agg`` is not permitted to return ``null`` in either the
``inputFunction`` or the ``combineFunction``. In Presto (Java), returning ``null`` is
permitted, but the behavior is undefined. For more information about ``reduce_agg`` in Presto,
see `reduce_agg <../functions/aggregate.html#reduce_agg>`_.
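
To make the restriction concrete, here is a minimal Python emulation of the two-phase shape of ``reduce_agg``. It is a sketch under stated assumptions, not how either engine is implemented: the ``partitions`` parameter and the choice to raise ``ValueError`` are hypothetical, and Python's ``None`` stands in for SQL ``null``.

```python
from functools import reduce


def reduce_agg(values, initial, input_fn, combine_fn, partitions=2):
    """Emulate reduce_agg over `values`, rejecting None results."""

    def checked(fn):
        # Wrap a function so that a None result is reported as an error,
        # standing in for the C++ engine's restriction.
        def wrapper(a, b):
            result = fn(a, b)
            if result is None:
                raise ValueError("reduce_agg functions must not return null")
            return result
        return wrapper

    # Fold each partition with the input function, starting from the initial state.
    chunks = [values[i::partitions] for i in range(partitions)]
    states = [reduce(checked(input_fn), chunk, initial) for chunk in chunks]
    # Merge the partial states with the combine function.
    return reduce(checked(combine_fn), states)
```

With ``input_fn=lambda s, x: s + x`` and ``combine_fn=lambda a, b: a + b``, the values ``[1, 2, 3, 4]`` with an initial state of ``0`` reduce to ``10``. An ``input_fn`` that returns ``None`` raises an error in this emulation.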