Instagram Connector: Airbyte Reset/Upgrade and Connector Failing #5435
Comments
More logs here from the failed run. Weirdly (it may be unrelated), I got some webhook notifications which seemed to come from the old Airbyte installation, which I've since upgraded. I have since fully reset the configs, upgraded everything and set it back up with a much more recent start date, and it seems to be running OK, however I will update...
This is still failing after approx. 6 hrs, logs here: I think the issue is with the DBT normalization, check out these log entries:
As some additional fun, this somehow managed to delete ALL my historic data as part of the process! Fortunately I can restore from backups, but it's taken all day. Not sure who triages these @keu @yevhenii-ldv, but it would be great for that to not happen again!
Attempted to connect the same source to the BigQuery (denormalized typed struct) destination. This time it didn't fail, but it is still running 14 hrs later and has no logs for the past 9 hrs:
Hi @jimbeepbeep, sorry to hear that. Our team is looking into the issue and will let you know the result shortly.
Thanks @keu, I actually think the previous connection (with basic normalization) was only failing on the DBT job because of this error message:
This data type (

I have resorted to creating a new connection without normalization and then writing the SQL in BigQuery to UNION with the backup dataset, which will hopefully get me to a consistent dataset without any gaps. It has been running for an hour and has read about 1/3 of the records, which is pretty normal; fingers crossed this solution works as an interim!

I have had a little exposure to DBT but would like to understand more, so I'm happy to apply some of my resources to help solve this issue, although we would need some guidance on where to start and the best approach. @cgardens I know you're the resident expert, so any pointers would be great!
OK @keu good news (from my perspective anyway!) - the job without normalization (to the normal BigQuery destination 0.3.12) succeeds! Luckily I'm pretty used to working with JSON as well as nested/repeating fields in BigQuery, so I can work with the raw data and decode it into the same schema as when the sync was functioning properly, then UNION into the backup data and we are back up and running. However, it's obviously not good that neither the normalization nor the denormalized destinations seem to work with this connector... as I said, I am pretty handy with BigQuery transformations (especially ARRAY/STRUCT and UNNEST syntax), so I'd like to help fix this if you can point me in the right direction! I am a DBT beginner but keen to level up.
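For anyone following along later, this is roughly the shape of that interim workaround: decode the raw JSON rows in BigQuery and UNION them onto the restored backup. A sketch only - the project, dataset, table and field names below are hypothetical, and it assumes the raw table exposes Airbyte's standard _airbyte_data JSON column:

```sql
-- Hypothetical names throughout; adjust to your own project/dataset/stream.
-- Decode the raw Airbyte rows into typed columns...
SELECT
  JSON_EXTRACT_SCALAR(_airbyte_data, '$.id')                         AS id,
  JSON_EXTRACT_SCALAR(_airbyte_data, '$.timestamp')                  AS ts,
  CAST(JSON_EXTRACT_SCALAR(_airbyte_data, '$.impressions') AS INT64) AS impressions
FROM `my-project.instagram_raw._airbyte_raw_media_insights`

UNION ALL

-- ...then stitch on the historic rows restored from backup.
SELECT id, ts, impressions
FROM `my-project.instagram_backup.media_insights`
```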
@ChristopheDuong is actually the expert on this one. 😄 Is the issue that the type for the field is wrong in the catalog for this source? Or is the issue that this destination is not handling array types properly? No problem if you're not sure, but that's probably the next diagnostic question for us to figure out.
It should be using this JSON extract function (through dbt macros), but I am not sure what's actually happening; the generated SQL may have ended up using the non-array function (or the casts were not done properly) in your use case:

Line 127 in 954d7cc
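To make the array vs. non-array point concrete (a toy illustration, not the actual dbt-generated SQL): in BigQuery the scalar extract function returns NULL when the JSON path points at an array, so an array field has to go through the array variant and then be unnested. The story_insights field name here is made up.

```sql
-- Toy row with a made-up array field; not the real Instagram schema.
WITH raw AS (
  SELECT '{"story_insights": [{"name": "impressions", "value": 10}]}' AS _airbyte_data
)
SELECT
  JSON_EXTRACT_SCALAR(_airbyte_data, '$.story_insights') AS scalar_extract,  -- NULL: the path is an array, not a scalar
  JSON_EXTRACT(_airbyte_data, '$.story_insights')        AS whole_array,     -- the array as one JSON string
  item                                                   AS array_element    -- one JSON string per element
FROM raw
CROSS JOIN UNNEST(JSON_EXTRACT_ARRAY(_airbyte_data, '$.story_insights')) AS item
```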
I would need to dig deeper to gain a better view of what is happening here. In the meantime, it could help if you could share the catalog.json file that ended up failing... You can follow this guide to access it:
@ChristopheDuong the field is an object, so it is calling the correct
Thanks @ChristopheDuong, this is very cool - awesome! In the failing workspace I have a destination_catalog.json and a source_catalog.json; which one do you need, and is there a good way of copying them out? I can view them using a cat command, but they're too long to copy/paste and they have dodgy line breaks etc. I had a dig around and found the
Aside from the weird capitalization (which doesn't affect execution, but might signify a lack of coffee), I think the issue stems from the fact that the
OK I figured out the log export, here are the log files from the failed run @ChristopheDuong: I had to change the

I did look in the logs etc. for the failed BigQuery (denormalized typed struct) destination, as I was interested in the DBT transformation and compiled SQL, however the

FYI the raw sync which I am decoding in BigQuery using SQL seems to have stabilised today (125k records, 4-6 hrs) after a few overly long runs yesterday (218-245k records, 8-9 hrs).
@jimbeepbeep can you update to the latest version? The normalization should work now.
OK thanks @marcosmarxm, I have upgraded Airbyte to 0.29.12-alpha but the Instagram connector version (which was previously 0.1.8) was not incremented (see screenshot, I was expecting 0.1.9):

My raw export job (which I had to implement as an emergency with manually-coded post-load transforms to align to the historic schema) now fails, whereas it was working great only this morning, oof. Logs follow; I would have thought that this was the most robust, so this is strange... there's an error in there I haven't seen before: logs-27-0.log

However, the normalized sync appears to have worked this time, with the denormalized one still running. I can work with this, but will have to reconstruct the historic dataset from the different sources, which is a pain but workable. Let me know if there's anything I should do/expect - every time I upgrade Airbyte something seems to break on this connector!
Hi @marcosmarxm / @ChristopheDuong, do you know if this connector ever got upgraded? I haven't seen an updated version past 0.1.8 (which I am currently running on 0.29.12-alpha) and the syncs have started to fail again (or take a really long time, meaning that the scheduling gets knocked off and then the monitoring we have in place is impacted). Please also let me know if I need to be running a more recent version of Airbyte to pick up the changes - I can action that, thanks.
@johnlafleur this is the issue related to the Instagram connector which we discussed today. It was apparently fixed, but the connector upgrade never appeared. Starting yesterday it has been failing on every configuration I try (No Normalization, Basic Normalization, and Denormalized). I have upgraded everything from the Airbyte version, to all connectors, and even my VM's hard drive, but it's still failing after about 6 hrs (which is normally the time it takes for a full sync). This is the (potential Airbyte Cloud) client who's now testing Fivetran because of reliability issues with this connector, so anything I can do to fix this is critically important!

Environment

Airbyte version: 0.29.22-alpha

Logs

Basic Normalization (FAIL)

Denormalized (FAIL)

No Normalization (PENDING)

I have the connection without any normalization running now (I was trying to offset the times so they don't run at the same time), and will post the logs once Attempt 1 is complete. Please help!
I see multiple exceptions in your logs... One of them is always showing on the source connector side:
And another is tied to the BigQuery destination, which may be related to this:
Thanks @ChristopheDuong - on the destination side that error is really strange, as the
Yes, you can safely delete all

Once the sync is done, those tables should normally be deleted, but in your case there might have been exceptions and the cleaning did not manage to clear them out.
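In case it saves someone else some clicking, the leftover staging tables can be listed from the dataset's INFORMATION_SCHEMA before dropping them. A sketch only - it assumes the temporary tables share an _airbyte_tmp naming pattern, so check what the leftovers are actually called in your dataset first:

```sql
-- Generate DROP statements for leftover temporary tables (review before running).
-- `my-project.my_dataset` and the _airbyte_tmp pattern are assumptions.
SELECT
  CONCAT('DROP TABLE `my-project.my_dataset.', table_name, '`;') AS drop_statement
FROM `my-project.my_dataset`.INFORMATION_SCHEMA.TABLES
WHERE table_name LIKE '%_airbyte_tmp%'
```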
Thanks @ChristopheDuong, I have cleared all the
Thanks team (@marcosmarxm, @ChristopheDuong), I saw the Instagram Connector 0.1.9 became available and upgraded - whatever you did seems to have fixed my issues, and I now have two parallel syncs (one raw and decoded in BigQuery, one base normalized) running like a dream. I never could get the denormalised one to work, but that's not a problem. Closing this issue, thanks.
Great news!!
Environment
Current Behavior
So some really strange behaviour happened today (around 2021-08-16 06:52:43 UTC, I think): I connected to my VM running Airbyte via SSH and everything looked fine. Then I refreshed a short while later and it went back to the screen as if it was a fresh Airbyte installation. Luckily I only have one critical job running on it and it wasn't running at the time, so I set it back up (pointing to the same destination dataset), upgrading Airbyte and the Instagram connector in the process, and left it to sync.
However, it is now failing (logs below) and going into the second job after 10 hrs.
Expected Behavior
It normally syncs the full dataset in 5-6hrs.
Logs
Scheduler Logs
scheduler-logs.log
Server Logs
server-logs.log
Latest Fail Logs
logs-425-0.log
Steps to Reproduce
Are you willing to submit a PR?
I wish I had your jedi skills