Add option to convert json objects to strings for nested data for the S3 and S3-Glue Airbyte connector #724

Open
5 tasks
quazi-h opened this issue Jun 8, 2023 · 1 comment

Comments


quazi-h commented Jun 8, 2023

Description/Context

Airbyte defines a common set of data types: every source connector maps its own types into them, and every destination connector maps from them into its own types (reference). Nested objects in the source data are represented as the Object type.

The S3 connector relies on the upstream Apache Hive JsonSerDe library to handle serialization of JSON-formatted data. By default, items being deserialized are expected to be wrapped in Hadoop Writable objects, and objects being serialized are expected to be Java primitive objects.

The issue is that when a nested document is serialized as an object, the data is escaped incorrectly and the resulting value in the Hive table is always a single "{" character. We need a configurable option that lets the S3 and S3-Glue destination connectors serialize any root-level object as a string (including _airbyte_data).

Plan/Design

  • Add a way to enable the "stringify" feature for S3 and S3-Glue destinations with JSON Lines output
  • Preserve existing functionality and avoid breaking current implementations by keeping the default behavior of treating object types as objects (the options should be Default or Stringify)
  • Extend or refactor the existing configs to thread the stringify argument through
  • Update the JsonLSerializedBuffer logic to convert root-level objects to strings when the stringify option is set
  • Add tests as needed
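
To make the intended behavior concrete, here is a minimal sketch of the plan in Python. It is not the connector's code: the real JsonLSerializedBuffer is part of the Java connector, and the names `ObjectHandling`, `stringify_root_objects`, and `JsonLinesBuffer` are hypothetical, chosen only for illustration. The sketch assumes that only root-level objects are converted, and that arrays and scalars are left unchanged.

```python
import io
import json
from enum import Enum


class ObjectHandling(Enum):
    """Hypothetical option values, mirroring the proposed Default / Stringify."""
    DEFAULT = "Default"
    STRINGIFY = "Stringify"


def stringify_root_objects(record: dict) -> dict:
    """Return a copy of the record in which every root-level object
    (dict) value is replaced by its JSON text. Other values are kept."""
    return {
        key: json.dumps(value, separators=(",", ":")) if isinstance(value, dict) else value
        for key, value in record.items()
    }


class JsonLinesBuffer:
    """Toy stand-in for a JSON Lines buffer (an assumption, not the real class)."""

    def __init__(self, handling: ObjectHandling = ObjectHandling.DEFAULT):
        # DEFAULT keeps today's behavior, so existing setups are unaffected.
        self.handling = handling
        self._out = io.StringIO()

    def accept(self, record: dict) -> None:
        if self.handling is ObjectHandling.STRINGIFY:
            record = stringify_root_objects(record)
        # One compact JSON document per line.
        self._out.write(json.dumps(record, separators=(",", ":")) + "\n")

    def getvalue(self) -> str:
        return self._out.getvalue()


if __name__ == "__main__":
    buf = JsonLinesBuffer(ObjectHandling.STRINGIFY)
    buf.accept({"id": 1, "_airbyte_data": {"user": {"name": "a"}}})
    print(buf.getvalue(), end="")
```

With Stringify enabled, the nested value is written as an escaped JSON string rather than as a nested object, so a reader that only expects primitive column values sees a plain string column. Parsing that string again recovers the original nested document.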

quazi-h commented Jun 8, 2023

AirbyteHQ (upstream) Issue: airbytehq/airbyte#27171
