This repository has been archived by the owner on Aug 4, 2023. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add configuration options to skip ingestion errors #650
Add configuration options to skip ingestion errors #650
Changes from 3 commits
8f709bd
20df903
da52cc2
452e67b
8f86fbf
50a85d5
f13eff0
4edaa73
fa0df8b
f8de955
638b1d4
dda3d78
0b3ffc6
116016a
85ecedf
7719b97
0809ffb
410e8ab
5cb47ef
73a7583
ec08221
1914578
84367f0
b44f2b2
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the changes in #645, I think we'll need to be careful about bare
Exception
catches. It looks like ifskip_ingestion_errors
isTrue
, then execution will continue. This breaks the case where the task is stopped by Airflow. I think we can catch that specific case by handlingAirflowException
differently and immediately re-raising that though!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extremely good point. I think we'll also want to make sure we raise any ingestion errors that have previously been skipped before we encounter this.
What do you think about making
skip_ingestion_errors
a list of Exception types to skip, rather than a simple boolean?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it! Only trouble would be matching on the exceptions since we have to go from a string to an exception type 😅 I guess we could just do
if exception.__type__.__name__ in skip_ingestion_errors
? I think as long as we're handling theAirflowException
case (and that's the right case for Airflow stuff in general) then we could add that down the line 🙂There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking on this again, should this maybe be a new exception type, like
AggregateIngestionError
?AirflowException
doesn't seem to be the right one to use 🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this exception will also get sent to Slack, maybe it would be ideal to log the more verbose errors and then raise a single exception at the end that says something along the lines of
{len(self.ingestion_errors)} ingestion errors occurred while running
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like
AggregateIngestionError
(I had considered making a custom error before and couldn't come up with a name 😅 ). Totally agreed on the error logging -- I was under the impression the Slack messages would truncate but it doesn't look like that's the case, thanks for testing!Updated to only log the verbose errors, while throwing an
AggregateIngestionError
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was attempting to avoid renaming the current
get_next_query_params
method, but I'm not happy with this naming. Any suggestions?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
determine_query_params
?