Description/Context
Airbyte defines a common set of data types: every source connector maps its own types into them, and every destination connector maps from them into its own types (reference). Nested objects in the source data are represented as an Object.
The S3 connector relies on the upstream Apache Hive JsonSerDe library to handle serialization of JSON-formatted data. By default, items being deserialized are expected to be wrapped in Hadoop Writable objects, and objects being serialized are expected to be Java primitive objects.
The issue is that when a nested document is serialized as an object, the data is escaped incorrectly and the resulting record in the Hive table is always {. We need a configurable option that lets the S3 and S3-Glue destination connectors serialize any root-level object as a string (including _airbyte_data).
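To make the requested behavior concrete, here is a minimal Python sketch of the two output shapes for one JSON Lines record. It only illustrates the data transformation; it is not the connector's Java code. The record contents and every field other than _airbyte_data are made-up placeholders.

```python
import json

# Hypothetical record: only `_airbyte_data` is named in this issue,
# the other field and all values are placeholders for illustration.
record = {
    "example_id": "abc-123",
    "_airbyte_data": {"id": 1, "address": {"city": "Paris"}},
}

# Default behavior: nested values are written as JSON objects.
default_line = json.dumps(record)

# Stringified behavior: each root-level object value is replaced by a
# string holding its JSON encoding; scalar values are left untouched.
stringified_record = {
    key: json.dumps(value) if isinstance(value, dict) else value
    for key, value in record.items()
}
stringified_line = json.dumps(stringified_record)

print(default_line)
print(stringified_line)
```

In the second line, _airbyte_data is a plain JSON string, so a reader that cannot handle nested objects sees an ordinary string column, and the original structure can still be recovered by parsing that string again.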
Plan/Design
Add a way to specify whether to enable the "stringify" feature for S3 and S3-Glue destinations with JSON Lines output
Preserve existing functionality and avoid breaking current implementations by making the default behavior treat object types as object (options should be Default or Stringify)
Extend or refactor the existing configs to thread through the stringify argument
Update the JsonLSerializedBuffer logic to convert root-level objects to strings when the given input parameter requests it
Add tests as needed
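The plan above could be sketched roughly as follows. This is an assumption-laden Python outline, not the actual Java implementation. The names ObjectHandling, parse_object_handling, JsonlBuffer, and the "object_handling" config key are all hypothetical; JsonlBuffer merely stands in for the role JsonLSerializedBuffer plays in the connector.

```python
import io
import json
from enum import Enum


class ObjectHandling(Enum):
    """Hypothetical option with the two values proposed in the plan."""
    DEFAULT = "Default"
    STRINGIFY = "Stringify"


def parse_object_handling(config: dict) -> ObjectHandling:
    # "object_handling" is an assumed key name. A missing key falls back
    # to Default so that existing configurations keep their behavior.
    return ObjectHandling(config.get("object_handling", "Default"))


class JsonlBuffer:
    """Illustrative stand-in for a JSON Lines buffer; not the real class."""

    def __init__(self, object_handling: ObjectHandling = ObjectHandling.DEFAULT):
        self._object_handling = object_handling
        self._out = io.StringIO()

    def write_record(self, record: dict) -> None:
        if self._object_handling is ObjectHandling.STRINGIFY:
            # Convert only root-level objects; scalars pass through unchanged.
            record = {
                key: json.dumps(value) if isinstance(value, dict) else value
                for key, value in record.items()
            }
        self._out.write(json.dumps(record) + "\n")

    def getvalue(self) -> str:
        return self._out.getvalue()


# Usage: the option is read once from the config and threaded into the buffer.
buffer = JsonlBuffer(parse_object_handling({"object_handling": "Stringify"}))
buffer.write_record({"_airbyte_data": {"id": 1}})
```

Defaulting the constructor argument to DEFAULT keeps every existing call site working without changes, which matches the goal of not breaking current implementations.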