
NullPointerException during sync: source-s3 CSV -> destination-s3 PARQUET #6871

Closed
amorskoy opened this issue Oct 7, 2021 · 7 comments

Comments


amorskoy commented Oct 7, 2021

Environment

  • Airbyte version: fresh master, commit 11645689431a69c689a15b620e4a2b6bc7b045c3
  • OS Version / Instance: Ubuntu 18.04
  • Deployment: Docker
  • Source Connector and version: source-s3:0.1.5
  • Destination Connector and version: destination-s3:0.1.12
  • Severity: Critical
  • Step where error happened: Sync job

Current Behavior

I have a small CSV file on S3: 3.7 MB, 4k rows x 150 columns, generated with the Python Faker library.
The file is attached: sample_synth_4K_150.csv

I want to save it to S3 as a Parquet file using destination-s3.
Instead, the sync fails with a java.lang.NullPointerException at io.airbyte.integrations.destination.s3.avro.JsonToAvroSchemaConverter.getAvroSchema(JsonToAvroSchemaConverter.java:139)

Expected Behavior

The sync should infer the schema correctly and finish successfully, producing a Parquet file as output.

Logs

Please see the attached file: logs-2-0.txt

@amorskoy added the type/bug label Oct 7, 2021
@sherifnada added the area/connectors label Oct 7, 2021

tuliren commented Oct 15, 2021

Line 139 in JsonToAvroSchemaConverter is actually this line:
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/avro/JsonToAvroSchemaConverter.java#L119

The reported line number is off because of the license header update.

This means the JSON schema passed into the S3 destination is missing a properties field.
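To illustrate the failure mode, here is a minimal standalone sketch (not the connector's code; the class name is made up for the example): when a field is declared as type object without a properties key, the lookup of that key returns null and any dereference throws the NPE.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class MissingPropertiesNpeDemo {
  public static void main(String[] args) throws Exception {
    // An object field that only declares additionalProperties and has no "properties" key.
    String jsonSchema = "{\"type\": \"object\", \"additionalProperties\": true}";
    JsonNode schema = new ObjectMapper().readTree(jsonSchema);

    JsonNode properties = schema.get("properties"); // null, because the key is absent
    properties.fields();                            // throws NullPointerException
  }
}
```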

Still investigating.


tuliren commented Oct 16, 2021

@Phlair, is it possible that the JSON schema generated by the s3 source is missing the properties field for some objects?

It looks like this line could be the root cause:

https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py#L184


Phlair commented Oct 18, 2021

@tuliren the self.ab_additional_col field is where any additional columns/values that appear over time are placed, to keep the schema consistent. In that sense it has no defined properties, but additionalProperties defaults to true so it can hold anything. Does that cause problems for the Parquet/Avro destination because of the typing?


tuliren commented Oct 18, 2021

I see. An Avro schema requires a definite type for each field, and our JSON to Avro schema converter does not yet support additionalProperties. So that should be the root cause of this NPE.
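For reference, a hedged sketch of one way such a field could be given a definite Avro type (only an illustration of the typing problem, not the converter's actual behavior or the eventual fix): treat the open-ended object as a nullable map of strings.

```java
import java.util.Arrays;

import org.apache.avro.Schema;

public class AdditionalPropertiesAvroSketch {
  public static void main(String[] args) {
    // A JSON object that only declares additionalProperties has no fixed field list,
    // so one option is to type it in Avro as a nullable map<string, string>.
    Schema mapOfStrings = Schema.createMap(Schema.create(Schema.Type.STRING));
    Schema nullableMap = Schema.createUnion(
        Arrays.asList(Schema.create(Schema.Type.NULL), mapOfStrings));

    System.out.println(nullableMap.toString(true)); // pretty-printed Avro schema JSON
  }
}
```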

@VitaliiMaltsev self-assigned this Dec 1, 2021
VitaliiMaltsev commented

I cannot reproduce this at the moment.
I believe this issue was fixed as part of #7288.
@tuliren, please verify.

VitaliiMaltsev commented

@tuliren can we close this issue?


tuliren commented Dec 7, 2021

Yes, we can close. Sorry that I missed your comment.
