Destination S3, S3-Glue: Add option to convert objects to strings #27171

Open
5 tasks
quazi-h opened this issue Jun 8, 2023 · 0 comments


quazi-h commented Jun 8, 2023

What area does the feature impact?

Connectors

Relevant Information

Description/Context

Airbyte defines a common set of data types: every source connector maps its own types into them, and every destination connector maps from them into its own types (reference). Nested objects in the source data are represented as an Object.

The S3 and S3-Glue connectors rely on the upstream Apache Hive JsonSerDe library to handle serialization of JSON-formatted data. By default, items being deserialized are expected to be wrapped in Hadoop Writable objects, and objects being serialized are expected to be Java primitive objects.

The issue is that when a nested document is serialized as an object, the data is escaped incorrectly and the resulting record in the Hive table is always "{". We need a configurable option that lets the S3 and S3-Glue destination connectors serialize any root-level object as a string (including _airbyte_data).
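To make the requested behavior concrete, here is a minimal sketch in Python (not the connector's actual Java code). The function name `stringify_root_objects` and the sample record are hypothetical, and it assumes that only root-level object values are converted while scalars and arrays are left alone.

```python
import json


def stringify_root_objects(record: dict) -> dict:
    """Return a copy of `record` in which every root-level object (dict)
    value is replaced by its JSON text. Other values are left as-is.
    Hypothetical helper, for illustration only."""
    return {
        key: json.dumps(value, separators=(",", ":")) if isinstance(value, dict) else value
        for key, value in record.items()
    }


# Hypothetical record shaped like an Airbyte raw record.
record = {
    "_airbyte_ab_id": "abc-123",
    "_airbyte_emitted_at": 1686182400000,
    "_airbyte_data": {"id": 1, "address": {"city": "Oslo"}},
}

# One JSON Lines row: _airbyte_data is now a string column holding JSON text.
line = json.dumps(stringify_root_objects(record))
print(line)
```

With this shape, the Hive table would see `_airbyte_data` as a plain string column, and the nested structure could still be recovered by parsing that string as JSON downstream.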

Plan/Design

  • Add a way to specify whether to enable the "stringify" feature for S3 and S3-Glue destinations with JSON Lines output
  • Preserve existing functionality and avoid breaking current implementations by keeping the current behavior (object types stay objects) as the default; the options should be Default or Stringify
  • Extend or refactor the existing configs to thread through the stringify argument
  • Update the JsonLSerializedBuffer logic to convert root-level objects to strings when the stringify option is enabled
  • Add tests as needed
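
The plan above could look roughly like the following sketch, written in Python for brevity rather than in the connector's Java. Everything named here is an assumption for illustration: the `ObjectHandling` enum, the `object_handling` config key, and the `JsonlBuffer` class (a stand-in for JsonLSerializedBuffer) are not existing Airbyte APIs.

```python
import io
import json
from enum import Enum


class ObjectHandling(Enum):
    """Hypothetical option values, matching the proposed Default / Stringify."""
    DEFAULT = "Default"
    STRINGIFY = "Stringify"


class JsonlBuffer:
    """Simplified stand-in for a JSON Lines buffer (not the real class)."""

    def __init__(self, object_handling: ObjectHandling = ObjectHandling.DEFAULT):
        self.object_handling = object_handling
        self._out = io.StringIO()

    def accept(self, record: dict) -> None:
        # Only root-level objects are converted; Default leaves the record untouched.
        if self.object_handling is ObjectHandling.STRINGIFY:
            record = {
                key: json.dumps(value) if isinstance(value, dict) else value
                for key, value in record.items()
            }
        self._out.write(json.dumps(record) + "\n")

    def getvalue(self) -> str:
        return self._out.getvalue()


def buffer_from_config(config: dict) -> JsonlBuffer:
    """Thread the option from a (hypothetical) format config into the buffer.
    A missing key falls back to Default, so existing configs keep working."""
    raw = config.get("format", {}).get("object_handling", ObjectHandling.DEFAULT.value)
    return JsonlBuffer(ObjectHandling(raw))


default_buffer = buffer_from_config({"format": {"format_type": "JSONL"}})
default_buffer.accept({"id": 1, "meta": {"a": 1}})

stringify_buffer = buffer_from_config(
    {"format": {"format_type": "JSONL", "object_handling": "Stringify"}}
)
stringify_buffer.accept({"id": 1, "meta": {"a": 1}})
```

Falling back to Default when the key is absent is what keeps the change backward compatible: existing connections produce the same output until someone opts in to Stringify.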
quazi-h added the needs-triage and type/enhancement (New feature or request) labels Jun 8, 2023
igrankova changed the title from "Add option to convert objects to strings for S3 and S3-Glue destination connectors" to "Destination S3, S3-Glue: Add option to convert objects to strings" Jul 17, 2023
bleonard added the frozen (Not being actively worked on) label Mar 22, 2024
4 participants