Background
Unfenced functions are functions which execute within the same process as the underlying evaluation engine. Typically, this term is used in the context of user-defined functions, and it is meant to contrast with fenced functions, which execute in a separate process. Unfenced functions are more efficient than fenced functions, but they are also more dangerous, as they can crash the process in which they are running. Presto has supported unfenced functions for a very long time, and they are a convenient way to efficiently add new functions in Presto.
Currently, the Presto analyzer validates only functions that are registered through the Presto functions SPI and built-in functions that are implemented in Java. C++ functions are not validated at query planning time, which can lead to runtime errors if the function is not implemented correctly or if the function exists only in C++.
Expected Behavior or Use Case
To support RFC-0003, and to enable an equivalent SPI that allows for the registration of built-in functions written in C++, we need to enhance Presto's SPI to allow for the registration of functions which are not present in-memory in the Java runtime. This issue is to create a new SPI which will allow out-of-process, yet built-in, functions to be registered to the Presto analyzer. These functions can then be planned in the same way as existing Java functions, and incorrect usage of them can be caught quickly.
Presto Component, Service, or Connector
Presto SPI, Presto Sidecar, Native execution module
Possible Implementation
#22829 added a new sidecar process type to Presto. This sidecar is a separate process which shares the same code as the Presto C++ worker.
A new endpoint will be added to the Presto sidecar which will return the function mapping for all built-in and externally registered functions which are implemented in C++. This will allow the Presto analyzer to validate the function signatures of these functions at query planning time.
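To make the planning-time check concrete, here is a minimal Java sketch. It assumes the sidecar's function mapping has already been fetched and decoded into a map from function name to a list of signatures. The names `FunctionMappingSketch`, `Signature`, `sampleMapping`, and `validate`, as well as the sample entries, are hypothetical and are not part of Presto's SPI; the endpoint's actual payload format is not specified in this issue.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are part of the Presto SPI.
class FunctionMappingSketch {
    // One overload of a function: argument type names plus a return type name.
    static final class Signature {
        final String returnType;
        final List<String> argumentTypes;

        Signature(String returnType, String... argumentTypes) {
            this.returnType = returnType;
            this.argumentTypes = Arrays.asList(argumentTypes);
        }
    }

    // Stand-in for the decoded sidecar response: function name -> overloads.
    static Map<String, List<Signature>> sampleMapping() {
        Map<String, List<Signature>> mapping = new HashMap<>();
        mapping.put("abs", Arrays.asList(
                new Signature("bigint", "bigint"),
                new Signature("double", "double")));
        mapping.put("strpos", Arrays.asList(
                new Signature("bigint", "varchar", "varchar")));
        return mapping;
    }

    // Returns the return type of the matching overload, or throws if the
    // function is unknown or no overload accepts the given argument types.
    static String validate(Map<String, List<Signature>> mapping, String name, List<String> argTypes) {
        List<Signature> overloads = mapping.get(name);
        if (overloads == null) {
            throw new IllegalArgumentException("Function not registered: " + name);
        }
        for (Signature signature : overloads) {
            if (signature.argumentTypes.equals(argTypes)) {
                return signature.returnType;
            }
        }
        throw new IllegalArgumentException(
                "Unexpected parameters " + argTypes + " for function " + name);
    }

    public static void main(String[] args) {
        Map<String, List<Signature>> mapping = sampleMapping();
        System.out.println(validate(mapping, "abs", Arrays.asList("double")));
        try {
            validate(mapping, "abs", Arrays.asList("varchar"));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected at planning time: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that, once the mapping is available to the coordinator, a bad call can be rejected before any work is sent to a C++ worker. Exact-match comparison of type names is a simplification; real signature binding would also handle coercions.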
(Currently, there is no way to add new functions to the C++ engine without forking Prestissimo. However, a separate issue will be created which will enable registering such functions as an externally loaded shared library. Once this feature is enabled, then we can consider this to be a new SPI which allows for the registration of functions which are not present in-memory in the Java runtime or built-in to the Presto C++ engine.)
An enhancement to the SPI will be added in the Java codebase which will allow for the registration of functions which are not present in-memory in the Java runtime. This SPI will be used to register the functions which are returned by the sidecar process. This SPI is the same as the existing FunctionNamespaceManager SPI, with some important additions:
Currently, there is no support in this SPI for registering functions which take in parametric types. This support will be added.
Currently, there is no support in this SPI for registering functions which take in a variable number of arguments. This support will be added.
Currently, built-in functions are hardcoded to refer to an in-memory list of Java implemented functions. Built-in functions are just like other functions, except they don't require a namespace to refer to them (e.g., instead of typing presto.default.sum(x), where presto.default is the namespace of the function sum, built-in functions can simply be referred to as sum(x)). To address this, the FunctionNamespaceManager will be enhanced to allow for itself to be marked as the default namespace. Only a single namespace may be marked as providing the built-in namespace. When it is marked as a default namespace, then built-in functions can be redirected to a different namespace (e.g., instead of presto.default, presto.native.sum(x) may be referenced as sum(x)).
A new module will be developed which has the sole purpose of retrieving information from the Presto sidecar process. A new FunctionNamespaceManager will be added there which will retrieve the function mapping from the sidecar process and cache it in-memory. This module will be responsible for registering the functions which are returned by the sidecar process.
Because this functionality will be enabled through an SPI, use of it will be voluntary. However, it is expected that this will eventually be used by all Presto installations which have C++ functions, as it will allow for the validation of these functions at query planning time.
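The default-namespace redirection and in-memory caching described above, together with matching for a variable number of arguments, can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: `DefaultNamespaceSketch`, `Signature`, `CachingNamespaceManager`, and `Registry` are hypothetical names, the `Supplier` stands in for the call to the sidecar, and variable arity is assumed to mean that the last declared argument type may repeat. Parametric types are left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: these types are illustrative and are not the Presto SPI.
class DefaultNamespaceSketch {
    static final class Signature {
        final boolean variableArity; // assumption: the last declared type may repeat
        final List<String> declaredArgs;

        Signature(boolean variableArity, String... declaredArgs) {
            this.variableArity = variableArity;
            this.declaredArgs = Arrays.asList(declaredArgs);
        }

        boolean accepts(List<String> actual) {
            if (!variableArity) {
                return declaredArgs.equals(actual);
            }
            int n = declaredArgs.size();
            if (n == 0 || actual.size() < n || !declaredArgs.equals(actual.subList(0, n))) {
                return false;
            }
            String repeated = declaredArgs.get(n - 1);
            for (String type : actual.subList(n, actual.size())) {
                if (!repeated.equals(type)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Fetches the function mapping once, then serves it from memory.
    static final class CachingNamespaceManager {
        final String namespace;
        final Supplier<Map<String, List<Signature>>> fetcher; // stands in for the sidecar call
        private Map<String, List<Signature>> cache;
        int fetchCount;

        CachingNamespaceManager(String namespace, Supplier<Map<String, List<Signature>>> fetcher) {
            this.namespace = namespace;
            this.fetcher = fetcher;
        }

        synchronized Map<String, List<Signature>> functions() {
            if (cache == null) {
                cache = fetcher.get();
                fetchCount++;
            }
            return cache;
        }
    }

    static final class Registry {
        private final Map<String, CachingNamespaceManager> managers = new HashMap<>();
        private String defaultNamespace;

        void register(CachingNamespaceManager manager, boolean isDefault) {
            if (isDefault && defaultNamespace != null) {
                throw new IllegalStateException("Only one namespace may provide built-in functions");
            }
            managers.put(manager.namespace, manager);
            if (isDefault) {
                defaultNamespace = manager.namespace;
            }
        }

        // Unqualified names resolve against the default namespace;
        // qualified names resolve against the namespace before the last dot.
        boolean isValidCall(String name, List<String> argTypes) {
            int dot = name.lastIndexOf('.');
            String namespace = dot < 0 ? defaultNamespace : name.substring(0, dot);
            String function = dot < 0 ? name : name.substring(dot + 1);
            CachingNamespaceManager manager = managers.get(namespace);
            List<Signature> overloads = manager == null ? null : manager.functions().get(function);
            if (overloads == null) {
                return false;
            }
            for (Signature signature : overloads) {
                if (signature.accepts(argTypes)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Sample manager whose "sidecar" returns two functions.
    static CachingNamespaceManager sampleManager() {
        return new CachingNamespaceManager("presto.native", () -> {
            Map<String, List<Signature>> functions = new HashMap<>();
            functions.put("sum", Arrays.asList(new Signature(false, "bigint")));
            functions.put("concat", Arrays.asList(new Signature(true, "varchar")));
            return functions;
        });
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(sampleManager(), true);
        System.out.println(registry.isValidCall("sum", Arrays.asList("bigint")));
        System.out.println(registry.isValidCall("presto.native.sum", Arrays.asList("bigint")));
        System.out.println(registry.isValidCall("concat", Arrays.asList("varchar", "varchar")));
    }
}
```

In this sketch, `sum(x)` and `presto.native.sum(x)` resolve to the same entry once `presto.native` is registered as the default, and a second attempt to register a default namespace fails, mirroring the single-default rule stated above.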
Context
RFC-0003