ORC Sink
WARNING: Spark's ORC integration has generally lagged behind the robustness of its Parquet integration. As a result, the ORC Sink currently does not allow configuring a custom OutputFormat, etc., as you may be used to with the Parquet Sink.
The ORC Sink is a specific type of File Sink, so File Sink options may also be configured on an ORC Sink. As always, the path option of the File Sink must be configured. Please review the File Sink if you are not already familiar with it.
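As a sketch of the required path option, assuming File Sink options are passed in the same quoted key/value form as the other examples on this page (the directory shown is hypothetical):

```sql
-- '/data/orc/foo' is a placeholder output directory, not a real default
SAVE STREAM foo
TO ORC
OPTIONS(
  'path'='/data/orc/foo'
);
```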
The compression option sets the compression codec to use when generating ORC files. If it is not specified directly on the ORC Sink, the orc.compress setting (e.g., spark.hadoop.orc.compress) will be used. Valid values include:
- none
- uncompressed
- snappy
- zlib
- lzo
Defaults to snappy.
SAVE STREAM foo
TO ORC
OPTIONS(
'compression'='zlib'
);
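When the compression option is left off the sink, the codec is taken from orc.compress instead. A minimal sketch of that fallback, assuming the setting is supplied through spark.properties with the spark.hadoop. prefix (the zlib value is only illustrative):

```sql
-- spark.properties: spark.hadoop.orc.compress=zlib
-- No 'compression' option is given here, so the orc.compress value above applies
SAVE STREAM foo
TO ORC
OPTIONS();
```

If both are set, the compression option on the ORC Sink is the one to rely on, since it is only when it is absent that orc.compress is consulted.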
Allows specifying internal settings for the underlying ORC file writers. Typically, users will only modify these settings for use cases that require fine-grained tuning. ORC also exposes its own compression setting (orc.compress), but users should prefer the compression option exposed directly by the ORC Sink.
If you are unfamiliar with these settings, you can use OrcConf as a reference.
-- spark.properties: spark.hadoop.orc.memory.pool=0.1
SAVE STREAM foo
TO ORC
OPTIONS();