Currently, when a temporary table is created, it is set to expire one day in the future. I've run into a problem using this target in this scenario:
I'm using this target with settings that call for a temporary table (upsert + dedupe_before_upsert)
I need to run some large extraction tasks that will take well over 24 hours.
Observations:
I was querying the temp table periodically during the run as a way to monitor progress. Shortly after the extraction crossed the 24-hour mark, I noticed that the number of rows in the temporary table had dropped from a couple million records to a couple thousand, and the oldest _sdc_extracted_at in the temporary table was now just after the 24-hour mark. Based on the code linked above, I'm assuming the table hit its expiry, was deleted, and was then automatically recreated.
The data extracted in the first 24 hours had gone missing, but no errors were thrown and the extraction continued.
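For context, a progress-monitoring query of the kind described above might look roughly like this. The table name is a made-up placeholder (not this target's actual naming scheme); `_sdc_extracted_at` is one of the Singer metadata columns the target writes.

```python
# Hypothetical progress check: row count and oldest extraction timestamp
# in the temp table. The table name below is a placeholder.
MONITOR_SQL = """
SELECT
  COUNT(*) AS row_count,
  MIN(_sdc_extracted_at) AS oldest_extracted_at
FROM `my-project.my_dataset.my_stream__tmp`
"""

# With the google-cloud-bigquery client this could be run as, e.g.:
#   rows = bigquery.Client().query(MONITOR_SQL).result()
```

A sudden drop in `row_count`, or `oldest_extracted_at` jumping forward past the 24-hour mark, is the symptom described here.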
There are a couple things here that could be opportunities for enhancements:
Can we make the max lifespan of the temporary table (effectively the time limit on the task) configurable? Or better, would it be possible to extend the expiry on the fly if it's getting close to expiry but the job is still in progress?
Can we make it throw an error if a temp table that's still in use gets deleted?
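As a rough illustration of the second half of the first idea (extending the expiry on the fly), here is a minimal sketch using the google-cloud-bigquery client. `Table.expires` and `Client.update_table` are real client APIs, but the threshold, amounts, and wiring below are hypothetical and untested against this target's internals.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: extend when less than 2 hours of lifetime remain.
EXTENSION_THRESHOLD = timedelta(hours=2)
EXTENSION_AMOUNT = timedelta(days=1)

def should_extend(expires, now, threshold=EXTENSION_THRESHOLD):
    """Return True if the table's expiry is close enough to warrant extending."""
    return expires is not None and (expires - now) < threshold

def extend_if_needed(client, table_ref):
    """Sketch: push a temp table's expiry out while the job is still running.

    `client` is a google.cloud.bigquery.Client. This would need to be
    called periodically (e.g. from the sink's record-processing loop).
    """
    table = client.get_table(table_ref)
    now = datetime.now(timezone.utc)
    if should_extend(table.expires, now):
        table.expires = now + EXTENSION_AMOUNT
        # Patch only the expiry field, leaving the rest of the table untouched.
        client.update_table(table, ["expires"])
```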
I'm open to trying to contribute towards these changes, but would appreciate getting alignment from a maintainer on the approach first. 🙏
TrishGillett changed the title from "Overwrite table seem to be destroyed and recreated in the middle of a run when table expiry is reached, resulting in lost data" to "Temporary table seems to be destroyed and recreated in the middle of a run when table expiry is reached, resulting in lost data" on Sep 21, 2024.
Mmm, I think the best option here would be to make the expiration configurable, so you can set it to a very high value (one you're sure won't be reached), and maybe also add a param to ensure the temp table is deleted after completion (when all the sinks are drained)? Would this work?
Hey @AlejandroUPC! I think that could be part of the answer, although personally I would also love to see runs fail loudly in the case where the table disappears mid-run. That would be reassuring for me since I could set the limit to something that I think should be long enough (as opposed to something absurdly long) and trust that I'll be notified if it turns out to be too short. It would also be useful to other users since they'd be informed if they're encountering this issue and need to use the (as yet hypothetical :P) custom time limit setting.
I'm picturing something like, could we make it so the temp table is created before extraction begins, and anytime we intend to write to it we could do an existence check first and fail the run if it doesn't exist? (Apologies if my mental model is off here, I am new to the internals of this target and making some guesses.)
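To make the idea concrete, a fail-loudly existence check might look something like the sketch below. Every name here is hypothetical (none of this is existing target API); the check is written against a plain `table_exists` callable so it can be wired to any client, with the google-cloud-bigquery wiring shown in a comment.

```python
class TempTableDisappearedError(RuntimeError):
    """Raised when a temporary table vanishes mid-run (e.g. its expiry was hit)."""

def ensure_temp_table(table_exists, table_id):
    """Fail the run loudly if the temp table is gone, instead of letting it
    be silently recreated. `table_exists` is a callable(table_id) -> bool.
    """
    if not table_exists(table_id):
        raise TempTableDisappearedError(
            f"Temporary table {table_id} no longer exists; it may have "
            "reached its expiration mid-run. Aborting rather than writing "
            "to a silently recreated table."
        )

# With google-cloud-bigquery, table_exists could be implemented roughly as:
#
#   from google.cloud import bigquery
#   from google.cloud.exceptions import NotFound
#
#   def bq_table_exists(table_id, client=bigquery.Client()):
#       try:
#           client.get_table(table_id)
#           return True
#       except NotFound:
#           return False
```

Calling `ensure_temp_table` before each batch write would turn the silent data loss described above into an immediate, visible failure.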