-
Notifications
You must be signed in to change notification settings - Fork 6
SQL Functions
vnnv01 edited this page Mar 6, 2018
·
2 revisions
Submit a feature request if there are additional SQL functions you would like.
PSTL is built with Apache Spark, which comes with a comprehensive set of SQL functions. However, PSTL provides additional SQL functions in an attempt to improve user productivity when defining non-trivial jobs. Listed below are additional SQL functions provided by PSTL for use within your job definitions.
Converts avro encoded bytes to a struct, based on the provided avro schema. Users can provide their avro schema using a variety of options:
TODO: cross link supported mappings / data types
option | description |
---|---|
schema.class |
Specifies a fully qualified class name of an avro generated class to read the avro schema from. Users must ensure the avro class is available at runtime by pre-distributing jars with their avro classes, or adding the jars with their avro classes at runtime. |
schema.file |
Avro schema should be read from the file specified at schema.file . When using schema.file , from_avro will only attempt to load the file by searching for the file on the driver. If schema.file is relative, the driver's working directory will be searched first. If schema.file is not found in the driver's working directory, Spark's file server will be searched. |
schema.literal |
Avro schema should be read as a JSON schema from this argument. |
schema.uri |
Avro schema should be read from the file specified at schema.uri . When using schema.uri , from_avro will attempt to load the file using HDFS' FileContext (or FileSystem if FileContext is not available). This means you can specify URIs such as hdfs:/// , s3:/// , etc. assuming the underlying Hadoop configurations are properly set for the schema. |
schema.class
:
SELECT from_avro(bytesCol, map('schema.class', 'com.acme.avro.Product'));
schema.file
:
SELECT from_avro(bytesCol, map('schema.file', 'product.avsc'));
schema.literal
:
SELECT from_avro(bytesCol, map('schema.literal', '{"type":"record","name":"Product","namespace":"com.acme.avro","fields":[...]}'));
schema.uri
:
SELECT from_avro(bytesCol, map('schema.uri', 'hdfs:///path/to/schemas/product.avsc'));