
Mysterious switch of data type for <_airbyte_emitted_at> field crashes incremental jobs #5959

Closed
Tracked by #6996
jd-sanders opened this issue Sep 9, 2021 · 5 comments · Fixed by #7240


@jd-sanders

Environment

  • Airbyte version: 0.28.2 and 0.29.16
  • OS Version / Instance: AWS EC2
  • Deployment: Docker
  • Source Connector and version: Postgres 0.3.11, but potentially all connectors
  • Destination Connector and version: BigQuery 0.4.0
  • Severity: Medium
  • Step where error happened: Sync job (after attempt to Upgrade Airbyte)

Current Behavior

This problem occurred after we attempted to upgrade Airbyte from 0.28.2 to 0.29.16, found that it wouldn't start, and rolled it back to 0.28.2.

Prior to the failed upgrade, the field _airbyte_emitted_at was copied to BigQuery (into the _airbyte_raw tables) as a TIMESTAMP. After the failed upgrade, it started coming in as a STRING, which is not backwards compatible with the existing tables that were supporting incremental updates.
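The failure mode can be illustrated with a small sketch (hypothetical code, not Airbyte or BigQuery internals): BigQuery rejects an append/copy when a column's type differs from the destination table's existing schema, which is exactly the "Provided Schema does not match Table" error in the log below. The schema dictionaries and the helper function here are illustrative assumptions.

```python
# Hypothetical sketch of the schema-compatibility check that fails.
# Column names match the Airbyte raw-table layout; the type values
# reflect the behavior described in this issue.

EXISTING_RAW_SCHEMA = {
    "_airbyte_ab_id": "STRING",
    "_airbyte_data": "STRING",
    "_airbyte_emitted_at": "TIMESTAMP",  # written before the change
}

NEW_TMP_SCHEMA = {
    "_airbyte_ab_id": "STRING",
    "_airbyte_data": "STRING",
    "_airbyte_emitted_at": "STRING",  # written after the change (#5614)
}

def check_append_compatible(dest: dict, src: dict) -> list:
    """Return human-readable conflicts, mimicking the wording of the
    'Field ... has changed type from ... to ...' BigQuery error."""
    conflicts = []
    for column, src_type in src.items():
        dest_type = dest.get(column)
        if dest_type is not None and dest_type != src_type:
            conflicts.append(
                f"Field {column} has changed type from {dest_type} to {src_type}"
            )
    return conflicts

print(check_append_compatible(EXISTING_RAW_SCHEMA, NEW_TMP_SCHEMA))
```

Because the tmp table is appended to the raw table with WRITE_APPEND, any such conflict aborts the copy job and the incremental sync fails.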

Expected Behavior

The change from type TIMESTAMP to STRING for the field _airbyte_emitted_at shouldn't have occurred, or if it did, shouldn't have caused incremental jobs to fail. _airbyte_emitted_at is more useful as a TIMESTAMP in general.
Logs

logs-786-0.txt

LOG

2021-09-08 21:58:32 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-08 21:58:32 �[1;31mERROR�[m i.a.i.d.b.BigQueryUtils(waitForQuery):77 - {} - Failed to wait for a query job:Job{job=JobId{project=tfc-warehouse, job=322cd030-b362-4b7c-8fa0-ceb211faf7c7, location=US}, status=JobStatus{state=DONE, error=BigQueryError{reason=invalid, location=null, message=Provided Schema does not match Table tfc-warehouse:tabella_ads_raw._airbyte_raw_daily_fb_advertisers_with_date. Field _airbyte_emitted_at has changed type from TIMESTAMP to STRING}, executionErrors=[BigQueryError{reason=invalid, location=null, message=Provided Schema does not match Table tfc-warehouse:tabella_ads_raw._airbyte_raw_daily_fb_advertisers_with_date. Field _airbyte_emitted_at has changed type from TIMESTAMP to STRING}]}, statistics=CopyStatistics{creationTime=1631138312224, endTime=1631138312263, startTime=1631138312263, numChildJobs=null, parentJobId=null, scriptStatistics=null, reservationUsage=null}, [email protected], etag=3vRiAbImCRKezu0/B1qWbQ==, generatedId=tfc-warehouse:US.322cd030-b362-4b7c-8fa0-ceb211faf7c7, selfLink=https://www.googleapis.com/bigquery/v2/projects/tfc-warehouse/jobs/322cd030-b362-4b7c-8fa0-ceb211faf7c7?location=US, configuration=CopyJobConfiguration{type=COPY, sourceTables=[GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=tabella_ads_raw, projectId=tfc-warehouse, tableId=_airbyte_tmp_xjx_daily_fb_advertisers_with_date}}], destinationTable=GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=tabella_ads_raw, projectId=tfc-warehouse, tableId=_airbyte_raw_daily_fb_advertisers_with_date}}, destinationEncryptionConfiguration=null, createDisposition=CREATE_IF_NEEDED, writeDisposition=WRITE_APPEND, labels=null, jobTimeoutMs=null}}

Steps to Reproduce

  1. Set up any incremental sync with new daily data, with an Airbyte version that writes _airbyte_emitted_at as a TIMESTAMP
  2. Change to an Airbyte version that writes _airbyte_emitted_at as a STRING
  3. Job fails
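To confirm which type a given deployment actually wrote, you can inspect the column in BigQuery's INFORMATION_SCHEMA. The helper below is a hypothetical convenience (not part of Airbyte) that builds such a query; the dataset and table names are taken from the log above.

```python
# Hypothetical helper: build a query to run in the BigQuery console
# to check the current type of a column in a raw table.
def column_type_query(dataset: str, table: str, column: str) -> str:
    return (
        f"SELECT column_name, data_type "
        f"FROM `{dataset}.INFORMATION_SCHEMA.COLUMNS` "
        f"WHERE table_name = '{table}' AND column_name = '{column}'"
    )

print(column_type_query(
    "tabella_ads_raw",
    "_airbyte_raw_daily_fb_advertisers_with_date",
    "_airbyte_emitted_at",
))
```

If the query returns STRING while older raw tables have TIMESTAMP for the same column, any WRITE_APPEND copy into those tables will fail as described.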

Are you willing to submit a PR?

@jd-sanders jd-sanders added the type/bug Something isn't working label Sep 9, 2021
@marcosmarxm
Member

Thanks for reporting this @jd-sanders. It looks like #5614 changed the _airbyte_emitted_at data type from timestamp to string. Does rolling back to a previous version correct the problem?

@sherifnada
Contributor

@etsybaev How would this affect existing users of the bigquery destination? Would they need to reset all their data?

@etsybaev
Contributor

etsybaev commented Sep 10, 2021

@sherifnada yes, I guess a new sync from scratch would be required.
If that is critical, I will try to see what we can do here.

@etsybaev etsybaev self-assigned this Sep 10, 2021
@etsybaev
Contributor

etsybaev commented Sep 10, 2021

Created a PR that could fix this: #5981.
But I noticed that the integration tests pass if I run them independently, or even class by class, yet some fail when running the whole set.
Since the failures are not obvious (the output doesn't say what exactly went wrong) and investigating would take quite some time, I have to put this on hold and come back to it when I have time, unless someone steps in and finds the exact reason for the conflicts between the tests:
./gradlew clean :airbyte-integrations:connectors:destination-bigquery:integrationTest

https://github.com/airbytehq/airbyte/runs/3567861286?check_suite_focus=true

Added explicit tmp file removal. It now passes on CI, but still fails locally (e.g. testIncrementalDedupeSync).

@etsybaev etsybaev removed their assignment Sep 10, 2021
andresbravog added a commit to andresbravog/airbyte that referenced this issue Oct 18, 2021
ChristopheDuong added a commit that referenced this issue Oct 25, 2021
* [ #5959 ][ #2579 ] Add support of partitioned tables by _airbyte_emitted_at field (#7141)

Co-authored-by: Andrés Bravo <[email protected]>
schlattk pushed a commit to schlattk/airbyte that referenced this issue Jan 4, 2022
@sherifnada sherifnada moved this to Done in GL Roadmap Jan 12, 2022
6 participants