Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination BigQuery Desnormalized: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details #6638

Closed
Tracked by #6996
marcosmarxm opened this issue Oct 1, 2021 · 17 comments · Fixed by #7413 or #7736

Comments

@marcosmarxm
Copy link
Member

Enviroment

  • Airbyte version: 0.29.30-alpha
  • OS Version / Instance: macOS
  • Deployment: Docker
  • Source Connector and version: Shopify 0.1.18
  • Destination Connector and version: BigQuery Desnormalizaed 0.1.6
  • Severity: Very Low / Low / Medium / High / Critical
  • Step where error happened: Deploy / Sync job / Setup new connection / Update connector / Upgrade Airbyte

Current Behavior

Tell us what happens.

Expected Behavior

Tell us what should happen.

Logs

logs-11-0.txt

Steps to Reproduce

  1. create shopify using integration account
  2. create bigquery desnormalized with integration account
  3. run sync 💣

Are you willing to submit a PR?

Remove this with your answer.

@marcosmarxm marcosmarxm added type/bug Something isn't working area/connectors Connector related issues labels Oct 1, 2021
@Nmaxime
Copy link
Contributor

Nmaxime commented Oct 4, 2021

I got a similar issue using a facebook pages source.

The error: ERROR () LineGobbler(voidCall):65 - Exception in thread "main" java.lang.RuntimeException: com.google.cloud.bigquery.BigQueryException: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.

  • Airbyte version: v0.30.9-alpha
  • OS Version / Instance: Ubuntu-20.04
  • Deployment: Docker
  • Source Connector and version: Facebook pages 0.1.1
  • Destination Connector and version: BigQuery Desnormalized 0.1.6

Sync logs: logs-139-0.txt

@matthieuUrsa
Copy link

Airbyte version: v0.30.9-alpha
OS Version / Instance: Stable channel -- | -- 1.19.13-gke.701
Deployment: Kubernetes cluster
Source Connector and version: Shopify 0.1.18
Destination Connector and version: BigQuery Desnormalized 0.1.6

Hi guys, as I mentioned in the slack troubleshooting channel,

https://airbytehq.slack.com/archives/C01MFR03D5W/p1633085634063100

I got this problem to on my GCP Compute engine instance with 0.30.2 version of Airbyte.

I mount a Kubernetes version of Airbytes on GKP for my business purpose and I'm now on 0.30.9 and I got the same problem but different this time.

So now on the Airbyte interface my sync is successful... but...

image

Sync logs: logs-2-0.txt

But nothing in my dataset, not a single table. When I read the bigquery log, the same problem.

image

The problem is quite simple, when the shema is created, all DATES have STRING key.

image

But in order to work, in the latest version of BigQuery (denormalized typed struct), version 0.1.6, all DATES needs to have FORMAT key

image

@cptjacky
Copy link
Contributor

cptjacky commented Oct 6, 2021

It looks like BigQuery only understand one specific format for the DATETIME type (YYYY-MM-DD HH:MM:SS[.SSSSSS], see https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#details_of_loading_json_data), which doesn't fully match the date-time definition in the json spec (https://json-schema.org/understanding-json-schema/reference/string.html#dates-and-times).

BigQuery's TIMESTAMP type seems more flexible, should we map the date-time format to bigquery's TIMESTAMP field instead of DATETIME?

The UI displays string but when you look at the log, we generated a datetime type in bigquery, meaning the used source defines format

@matthieuUrsa
Copy link

Yes sure ! We used another platform to transfer our data before and they save all the Shopify dates in timestamp in bigquery. so it make a lot of sense !

@Boopathy-Raja
Copy link

Hi,everyone. Encountered the same problem.

It would be great, if the airbyte platform can provide feature to choose the destination datatype by our own.

@etsybaev
Copy link
Contributor

May be also somehow related to #5959

@sherifnada
Copy link
Contributor

Hi everyone, could you upgrade to the latest version of the connector and let us know if you're still seeing issues?

@matthieuUrsa
Copy link

matthieuUrsa commented Nov 3, 2021 via email

@sherifnada
Copy link
Contributor

nope just the connector

@matthieuUrsa
Copy link

matthieuUrsa commented Nov 4, 2021 via email

@sherifnada
Copy link
Contributor

@matthieuUrsa can you share the logs from the sync?

@marcosmarxm marcosmarxm reopened this Nov 5, 2021
@marcosmarxm
Copy link
Member Author

I continue facing the issue Facebook Marketing 0.2.23 to BigQuery Denormalized 0.1.8
logs-28-0 (1).txt

I'm using integration account creds.

@alexandertsukanov
Copy link
Contributor

May be it due to different type of source-connector (and possible different DATETIME format). Let me check.

@matthieuUrsa
Copy link

matthieuUrsa commented Nov 5, 2021 via email

@alexandertsukanov
Copy link
Contributor

alexandertsukanov commented Nov 5, 2021

I did the investigation, I have used:

  • airbyte/source-facebook-marketing 0.2.21(Integration account as the source)
  • airbyte/destination-bigquery-denormalized 0.1.8

During investigation was found all DateTime was correctly parsed to BIGQUERY DATETIME format (this was root cause of previous issue):

Screenshot 2021-11-05 at 13 57 41

The problem occurs during the processing of the ObjectNode with key "targeting" in my case (I didn't select all tables):

Screenshot 2021-11-05 at 14 01 41

which has a null value in Object Type at UI column in my case.

Screenshot 2021-11-05 at 14 04 00

Moreover, My assumption we will have such errors when we gonna process such ObjectNode types for Bigquery-denormalized. So, I think the second part of this bug will be the fixing of correct ObjectNode mapping for BigQuery DB.
Let me know if we have such related bugs.

CC: @sherifnada , @marcosmarxm

@sherifnada
Copy link
Contributor

The targeting object seems to declare its type properly. However there is a related issue which might be relevant: #7725

@alexandertsukanov I think however, the destination needs to be more resilient to situations like these. If the input schema is not known, then we should drop this column entirely rather than fail the entire sync.

I exported the catalog for the facebook source using python main.py discover --config secrets/config.json with integration test credentials. Let's zoom in on one of the problematic streams ad_sets.

Zooming in on the problematic field targeting, we see the following schema (removed some fields to make it clearer)

{
  "name": "ad_sets",
  "json_schema": {
    "type": [
      "null",
      "object"
    ],
    "properties": {
      "targeting": {
        "$ref": "#/definitions/targeting_"
      },
    },
    "definitions": {
      "targeting_id_name_pairs": {
        "items": {
          "type": [
            "null",
            "object"
          ],
          "properties": {
            "name": {
              "type": [
                "null",
                "string"
              ]
            },
            "id": {
              "type": [
                "null",
                "string"
              ]
            }
          }
        },
        "type": [
          "null",
          "array"
        ]
      },
      "targeting_targeting_fields": {
        "type": [
          "null",
          "object"
        ],
        "properties": {
          "household_composition": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "home_type": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "friends_of_connections": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "family_statuses": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "work_positions": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "education_majors": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "life_events": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "moms": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "custom_audiences": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "interests": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "behaviors": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "connections": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "excluded_connections": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "user_adclusters": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "income": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "excluded_custom_audiences": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "work_employers": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "industries": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "net_worth": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "relationship_statuses": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "integer"
              ]
            }
          },
          "generation": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "home_ownership": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "politics": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          }
        }
      },
      "geo_locations_": {
        "type": [
          "null",
          "object"
        ],
        "properties": {
          "regions": {
            "items": {
              "type": [
                "null",
                "object"
              ],
              "properties": {
                "name": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "country": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "key": {
                  "type": [
                    "null",
                    "string"
                  ]
                }
              }
            },
            "type": [
              "null",
              "array"
            ]
          },
          "countries": {
            "items": {
              "type": [
                "null",
                "string"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "cities": {
            "items": {
              "type": [
                "null",
                "object"
              ],
              "properties": {
                "key": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "distance_unit": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "region": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "name": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "country": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "region_id": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "radius": {
                  "type": [
                    "null",
                    "integer"
                  ]
                }
              }
            },
            "type": [
              "null",
              "array"
            ]
          },
          "location_types": {
            "items": {
              "type": [
                "null",
                "string"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "zips": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "object"
              ],
              "properties": {
                "country": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "key": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "name": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "primary_city_id": {
                  "type": [
                    "null",
                    "integer"
                  ]
                },
                "region_id": {
                  "type": [
                    "null",
                    "integer"
                  ]
                }
              }
            }
          },
          "custom_locations": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "object"
              ],
              "properties": {
                "address_string": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "country": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "distance_unit": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "latitude": {
                  "type": [
                    "null",
                    "number"
                  ]
                },
                "longitude": {
                  "type": [
                    "null",
                    "number"
                  ]
                },
                "name": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "primary_city_id": {
                  "type": [
                    "null",
                    "integer"
                  ]
                },
                "radius": {
                  "type": [
                    "null",
                    "integer"
                  ]
                },
                "region_id": {
                  "type": [
                    "null",
                    "integer"
                  ]
                }
              }
            }
          },
          "country_groups": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "string"
              ]
            }
          },
          "geo-markets": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "object"
              ],
              "properties": {
                "key": {
                  "type": [
                    "null",
                    "string"
                  ]
                },
                "name": {
                  "type": [
                    "null",
                    "string"
                  ]
                }
              }
            }
          }
        }
      },
      "targeting_": {
        "type": [
          "null",
          "object"
        ],
        "properties": {
          "messenger_positions": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "string"
              ]
            }
          },
          "locales": {
            "items": {
              "type": [
                "null",
                "integer"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "connections": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "instagram_positions": {
            "items": {
              "type": [
                "null",
                "string"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "audience_network_positions": {
            "items": {
              "type": [
                "null",
                "string"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "exclusions": {
            "$ref": "#/definitions/targeting_targeting_fields"
          },
          "interests": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "geo_locations": {
            "$ref": "#/definitions/geo_locations_"
          },
          "excluded_geo_locations": {
            "$ref": "#/definitions/geo_locations_"
          },
          "work_positions": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "education_statuses": {
            "items": {
              "type": [
                "null",
                "integer"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "publisher_platforms": {
            "items": {
              "type": [
                "null",
                "string"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "age_max": {
            "type": [
              "null",
              "integer"
            ]
          },
          "custom_audiences": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "device_platforms": {
            "items": {
              "type": [
                "null",
                "string"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "age_min": {
            "type": [
              "null",
              "integer"
            ]
          },
          "facebook_positions": {
            "items": {
              "type": [
                "null",
                "string"
              ]
            },
            "type": [
              "null",
              "array"
            ]
          },
          "excluded_connections": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "flexible_spec": {
            "items": {
              "$ref": "#/definitions/targeting_targeting_fields"
            },
            "type": [
              "null",
              "array"
            ]
          },
          "friends_of_connections": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "user_os": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "string"
              ]
            }
          },
          "user_device": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "string"
              ]
            }
          },
          "excluded_custom_audiences": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "excluded_publisher_categories": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "string"
              ]
            }
          },
          "genders": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "integer"
              ]
            }
          },
          "targeting_optimization": {
            "type": [
              "null",
              "string"
            ]
          },
          "user_adclusters": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "generation": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "behaviors": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "moms": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "home_ownership": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "office_type": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "home_type": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "family_statuses": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "relationship_statuses": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "integer"
              ]
            }
          },
          "industries": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "income": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "app_install_state": {
            "type": [
              "null",
              "string"
            ]
          },
          "net_worth": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "politics": {
            "$ref": "#/definitions/targeting_id_name_pairs"
          },
          "interested_in": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "integer"
              ]
            }
          },
          "excluded_user_device": {
            "type": [
              "null",
              "array"
            ],
            "items": {
              "type": [
                "null",
                "string"
              ]
            }
          }
        }
      }
    }
  }
}

This kind of ref is not suppoted in Airbyte and therefore any connector which encounters it should either drop the column entirely or write it as a string (writing as a string should generally be an acceptable escape hatch).

Could we implement the following fix in the BQ destination? the ticket linked above tracks the fix on the FB connector side

@alexandertsukanov
Copy link
Contributor

alexandertsukanov commented Nov 8, 2021

@sherifnada I think the second option is more preferable. I am going to implement the fix using this approach. However, it looks like this issue the blocker only for bigquery-denormalized not for regular BQ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment