Skip to content

SQL Functions

vnnv01 edited this page Mar 6, 2018 · 2 revisions

Submit a feature request if there are additional SQL functions you would like.


PSTL is built with Apache Spark, which comes with a comprehensive set of SQL functions. However, PSTL provides additional SQL functions in an attempt to improve user productivity when defining non-trivial jobs. Listed below are additional SQL functions provided by PSTL for use within your job definitions.

from_avro

Converts avro encoded bytes to a struct, based on the provided avro schema. Users can provide their avro schema using a variety of options:

TODO: cross link supported mappings / data types

option description
schema.class Specifies a fully qualified class name of an avro generated class to read the avro schema from. Users must ensure the avro class is available at runtime by pre-distributing jars with their avro classes, or adding the jars with their avro classes at runtime.
schema.file Avro schema should be read from the file specified at schema.file. When using schema.file, from_avro will only attempt to load the file by searching for the file on the driver. If schema.file is relative, the driver's working directory will be searched first. If schema.file is not found in the driver's working directory, Spark's file server will be searched.
schema.literal Avro schema should be read as a JSON schema from this argument.
schema.uri Avro schema should be read from the file specified at schema.uri. When using schema.uri, from_avro will attempt to load the file using HDFS' FileContext (or FileSystem if FileContext is not available). This means you can specify URIs such as hdfs:///, s3:///, etc. assuming the underlying Hadoop configurations are properly set for the schema.

schema.class:

SELECT from_avro(bytesCol, map('schema.class', 'com.acme.avro.Product'));

schema.file:

SELECT from_avro(bytesCol, map('schema.file', 'product.avsc'));

schema.literal:

SELECT from_avro(bytesCol, map('schema.literal', '{"type":"record","name":"Product","namespace":"com.acme.avro","fields":[...]}'));

schema.uri:

SELECT from_avro(bytesCol, map('schema.uri', 'hdfs:///path/to/schemas/product.avsc'));