Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync failure between Shopify and BigQuery (Denormalized) #8420

Closed
jgrinaveckis opened this issue Dec 2, 2021 · 7 comments · Fixed by #8788
Closed

Sync failure between Shopify and BigQuery (Denormalized) #8420

jgrinaveckis opened this issue Dec 2, 2021 · 7 comments · Fixed by #8788

Comments

@jgrinaveckis
Copy link

jgrinaveckis commented Dec 2, 2021

Enviroment

  • Airbyte version: example is 0.32.8-alpha
  • OS Version / Instance: macOS
  • Deployment: Docker
  • Source Connector and version: Shopify 0.1.24
  • Destination Connector and version: BigQuery (denormalized typed struct) 0.1.10
  • Severity: Very Low / Low / Medium / High / Critical
  • Step where error happened: Deploy / Sync job / Setup new connection / Update connector / Upgrade Airbyte

Current Behavior

Trying to sync data from Shopify to BigQuery (denormalized tables) and it fails right as it should start writing data to destination (actually failure occurs while reading data).
Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.

Expected Behavior

Data should sync.

Logs

LOG

2021-12-02 07:13:12 INFO () DefaultReplicationWorker(run):139 - Waiting for destination thread to join.
�[35mdestination�[0m - 2021-12-02 07:13:12 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-12-02 07:13:12 �[32mINFO�[m i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):60 - {} - Airbyte message consumer: succeeded.
�[35mdestination�[0m - 2021-12-02 07:13:12 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-12-02 07:13:12 �[32mINFO�[m i.a.i.d.b.BigQueryRecordConsumer(close):163 - {} - Started closing all connections
�[35mdestination�[0m - 2021-12-02 07:13:17 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-12-02 07:13:17 �[32mINFO�[m i.a.i.d.b.BigQueryRecordConsumer(closeNormalBigqueryStreams):278 - {} - Waiting for jobs to be finished/closed
�[35mdestination�[0m - 2021-12-02 07:13:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-12-02 07:13:41 �[1;31mERROR�[m i.a.i.d.b.BigQueryRecordConsumer(lambda$closeNormalBigqueryStreams$6):284 - {} - Failed to process a message for job: Job{job=JobId{project=kilo-dw, job=bd57a5f9-325b-4691-bba4-87363db51b9c, location=US}, status=JobStatus{state=RUNNING, error=null, executionErrors=null}, statistics=LoadStatistics{creationTime=1638429194290, endTime=null, startTime=1638429194505, numChildJobs=null, parentJobId=null, scriptStatistics=null, reservationUsage=null, inputBytes=null, inputFiles=null, outputBytes=null, outputRows=null, badRecords=null}, [email protected], etag=uK1/MvPmi+wO13K1k2qaTA==, generatedId=kilo-dw:US.bd57a5f9-325b-4691-bba4-87363db51b9c, selfLink=https://www.googleapis.com/bigquery/v2/projects/kilo-dw/jobs/bd57a5f9-325b-4691-bba4-87363db51b9c?location=US, configuration=LoadJobConfiguration{type=LOAD, destinationTable=GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=keto_shopify_denorm, projectId=kilo-dw, tableId=_airbyte_tmp_exw_orders}}, decimalTargetTypes=null, destinationEncryptionConfiguration=null, createDisposition=CREATE_IF_NEEDED, writeDisposition=null, formatOptions=FormatOptions{format=NEWLINE_DELIMITED_JSON}, nullMarker=null, maxBadRecords=null, schema=Schema{fields=[Field{name=id, type=INTEGER, mode=NULLABLE, description=null, policyTags=null}, Field{name=name, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=note, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=tags, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=test, type=BOOLEAN, mode=NULLABLE, description=null, policyTags=null}, Field{name=email, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=phone, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=token, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=app_id, type=INTEGER, mode=NULLABLE, description=null, policyTags=null}, Field{name=number, type=INTEGER, mode=NULLABLE, description=null, policyTags=null}, Field{name=gateway, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=refunds, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=user_id, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=currency, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=customer, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=closed_at, type=DATETIME, mode=NULLABLE, description=null, policyTags=null}, Field{name=confirmed, type=BOOLEAN, mode=NULLABLE, description=null, policyTags=null}, Field{name=device_id, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=reference, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=tax_lines, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=total_tax, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=browser_ip, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=cart_token, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=created_at, type=DATETIME, mode=NULLABLE, description=null, policyTags=null}, Field{name=line_items, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=source_url, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=updated_at, type=DATETIME, mode=NULLABLE, description=null, policyTags=null}, Field{name=checkout_id, type=INTEGER, mode=NULLABLE, description=null, policyTags=null}, Field{name=location_id, type=INTEGER, mode=NULLABLE, description=null, policyTags=null}, Field{name=source_name, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_price, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=cancelled_at, type=DATETIME, mode=NULLABLE, description=null, policyTags=null}, Field{name=fulfillments, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=landing_site, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=order_number, type=INTEGER, mode=NULLABLE, description=null, policyTags=null}, Field{name=processed_at, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_weight, type=INTEGER, mode=NULLABLE, description=null, policyTags=null}, Field{name=cancel_reason, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=contact_email, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_tax_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=checkout_token, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=client_details, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=discount_codes, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=referring_site, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=shipping_lines, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=subtotal_price, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=taxes_included, type=BOOLEAN, mode=NULLABLE, description=null, policyTags=null}, Field{name=billing_address, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=customer_locale, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=note_attributes, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=payment_details, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_discounts, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_price_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_price_usd, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=financial_status, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=landing_site_ref, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=order_status_url, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=shipping_address, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=current_total_tax, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=processing_method, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=source_identifier, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_outstanding, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=fulfillment_status, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=subtotal_price_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_tip_received, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=current_total_price, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_discounts_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=admin_graphql_api_id, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=discount_allocations, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=presentment_currency, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=current_total_tax_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=payment_gateway_names, type=RECORD, mode=null, description=null, policyTags=null}, Field{name=current_subtotal_price, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_line_items_price, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=buyer_accepts_marketing, type=BOOLEAN, mode=NULLABLE, description=null, policyTags=null}, Field{name=current_total_discounts, type=FLOAT, mode=NULLABLE, description=null, policyTags=null}, Field{name=current_total_price_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=current_total_duties_set, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_shipping_price_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=original_total_duties_set, type=STRING, mode=NULLABLE, description=null, policyTags=null}, Field{name=current_subtotal_price_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=total_line_items_price_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=current_total_discounts_set, type=RECORD, mode=NULLABLE, description=null, policyTags=null}, Field{name=_airbyte_ab_id, type=STRING, mode=null, description=null, policyTags=null}, Field{name=_airbyte_emitted_at, type=TIMESTAMP, mode=null, description=null, policyTags=null}]}, ignoreUnknownValue=null, sourceUris=null, schemaUpdateOptions=null, autodetect=null, timePartitioning=null, clustering=null, useAvroLogicalTypes=null, labels=null, jobTimeoutMs=null, rangePartitioning=null, hivePartitioningOptions=null}}, 
�[35mdestination�[0m - 2021-12-02 07:13:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - Streams numbers: 19, 
�[35mdestination�[0m - 2021-12-02 07:13:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - SyncMode: WRITE_TRUNCATE, 
�[35mdestination�[0m - 2021-12-02 07:13:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - TableName: GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=keto_shopify_denorm, tableId=orders}}, 
�[35mdestination�[0m - 2021-12-02 07:13:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - TmpTableName: GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=keto_shopify_denorm, tableId=_airbyte_tmp_exw_orders}}
�[35mdestination�[0m - 2021-12-02 07:13:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-12-02 07:13:41 �[32mINFO�[m i.a.i.d.b.BigQueryRecordConsumer(printHeapMemoryConsumption):431 - {} - Initial Memory (xms) mb = 62
�[35mdestination�[0m - 2021-12-02 07:13:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-12-02 07:13:41 �[32mINFO�[m i.a.i.d.b.BigQueryRecordConsumer(printHeapMemoryConsumption):432 - {} - Max Memory (xmx) : mb + 2952
�[35mdestination�[0m - 2021-12-02 07:13:41 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-12-02 07:13:41 �[32mINFO�[m i.a.i.d.b.BigQueryRecordConsumer(closeNormalBigqueryStreams):312 - {} - Removing tmp tables...
�[35mdestination�[0m - 2021-12-02 07:13:46 INFO () DefaultAirbyteStreamFactory(lambda$create$0):61 - 2021-12-02 07:13:46 �[32mINFO�[m i.a.i.d.b.BigQueryRecordConsumer(closeNormalBigqueryStreams):314 - {} - Finishing destination process...completed
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - Exception in thread "main" java.lang.RuntimeException: com.google.cloud.bigquery.BigQueryException: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$closeNormalBigqueryStreams$6(BigQueryRecordConsumer.java:289)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.commons.lang.Exceptions.castCheckedToRuntime(Exceptions.java:52)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.commons.lang.Exceptions.toRuntime(Exceptions.java:39)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$closeNormalBigqueryStreams$7(BigQueryRecordConsumer.java:279)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at java.base/java.util.HashMap$Values.forEach(HashMap.java:1068)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.closeNormalBigqueryStreams(BigQueryRecordConsumer.java:279)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.close(BigQueryRecordConsumer.java:169)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.destination.bigquery.BigQueryDenormalizedRecordConsumer.close(BigQueryDenormalizedRecordConsumer.java:81)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.close(FailureTrackingAirbyteMessageConsumer.java:62)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:152)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:128)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.destination.bigquery.BigQueryDenormalizedDestination.main(BigQueryDenormalizedDestination.java:209)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - Caused by: com.google.cloud.bigquery.BigQueryException: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at com.google.cloud.bigquery.Job.reload(Job.java:411)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at com.google.cloud.bigquery.Job.waitFor(Job.java:248)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	at io.airbyte.integrations.destination.bigquery.BigQueryRecordConsumer.lambda$closeNormalBigqueryStreams$6(BigQueryRecordConsumer.java:282)
�[35mdestination�[0m - 2021-12-02 07:13:46 ERROR () LineGobbler(voidCall):82 - 	... 11 more
2021-12-02 07:13:46 INFO () DefaultReplicationWorker(run):141 - Destination thread complete.
2021-12-02 07:13:46 ERROR () DefaultReplicationWorker(run):145 - Sync worker failed.
io.airbyte.workers.WorkerException: Destination process exit with code 1. This warning is normal if the job was cancelled.
	at io.airbyte.workers.protocols.airbyte.DefaultAirbyteDestination.close(DefaultAirbyteDestination.java:114) ~[io.airbyte-airbyte-workers-0.32.8-alpha.jar:?]
	at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:143) ~[io.airbyte-airbyte-workers-0.32.8-alpha.jar:?]
	at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:49) ~[io.airbyte-airbyte-workers-0.32.8-alpha.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:167) ~[io.airbyte-airbyte-workers-0.32.8-alpha.jar:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
2021-12-02 07:13:46 INFO () DefaultReplicationWorker(run):169 - sync summary: io.airbyte.config.ReplicationAttemptSummary@777005af[status=failed,recordsSynced=497,bytesSynced=786934,startTime=1638429031787,endTime=1638429226496]
2021-12-02 07:13:46 INFO () DefaultReplicationWorker(run):178 - Source did not output any state messages
2021-12-02 07:13:46 WARN () DefaultReplicationWorker(run):186 - State capture: No new state, falling back on input state: io.airbyte.config.State@3e21c7a0[state={}]
2021-12-02 07:13:46 INFO () TemporalAttemptExecution(get):137 - Stopping cancellation check scheduling...

Steps to Reproduce

  1. Create Shopify connector
  2. Create BigQuery connector (for denormalized tables)
  3. Start syncing data

Are you willing to submit a PR?

Probably will try to add PR

@jgrinaveckis
Copy link
Author

jgrinaveckis commented Dec 2, 2021

Check logging in bigquery, additional error messages are:
Error while reading data, error message: JSON parsing error in row starting at position 0: Array specified for non-repeated field: line_items.value.properties.
Error while reading data, error message: JSON parsing error in row starting at position 647: Could not convert value to string. Field: value; Value: 0

It seems that properties.value holds a value of 0 which is not parsed to string.

@jgrinaveckis
Copy link
Author

I think schema for orders (not only I guess) needs to be updated - probably will submit a PR

@sherifnada sherifnada added area/connectors Connector related issues and removed needs-triage labels Dec 3, 2021
@sherifnada
Copy link
Contributor

to fully understand the issue we should sync all streams to bq denormalized and debug the problems

@jgrinaveckis
Copy link
Author

I did this before creating this issue and errors are these:
_airbyte_tmp_dex_orders table:
"Error while reading data, error message: JSON parsing error in row starting at position 0: JSON object specified for non-record field: discount_codes.value”

_airbyte_tmp_bla_abandoned_checkouts table:
"Error while reading data, error message: JSON parsing error in row starting at position 0: Array specified for non-repeated field: line_items.value.properties."

_airbyte_tmp_nxy_metafields table
Error while reading data, error message: JSON parsing error in row starting at position 647: Could not convert value to string. Field: value; Value: 0

@sherifnada
Copy link
Contributor

@jgrinaveckis sorry this was a comment for the eng working on it 😅 didn't mean to say you should be doing this. But thanks for sharing the logs! this is helpful

@bazarnov
Copy link
Collaborator

bazarnov commented Dec 10, 2021

There were several issues inside of draft_orders, orders, products, abandoned_checkouts streams, those are fixed in this PR.

@bazarnov
Copy link
Collaborator

In order to fix the rest of the issues related to the destination-bigquery-denormalized we need this to be merged as well:
#8665

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Archived in project
6 participants