What happened
I'm running the upgrade check on Airflow 1.10.15, and one of the items is to change import locations from contrib to the corresponding provider package.
While replacing:
airflow.contrib.operators.gcs_delete_operator.GoogleCloudStorageDeleteOperator
with:
airflow.providers.google.cloud.operators.gcs.GCSDeleteObjectsOperator
the following error appeared in the UI:
Broken DAG: [...] Either object or prefix should be set. Both are None
Upon further investigation, I found that the GoogleCloudStorageDeleteOperator from the contrib module had this parameter check (as can be seen here):
assert objects is not None or prefix is not None
The new GCSDeleteObjectsOperator from the Google provider module has the following check instead (as can be seen here):
if not objects and not prefix:
    raise ValueError("Either object or prefix should be set. Both are None")
As it turns out, these conditions are not equivalent. A prefix set to an empty string passes the first check, because "" is not None, but fails the second one, because not "" evaluates to True.
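The difference can be checked in isolation, without Airflow installed. The sketch below copies the two conditions into standalone functions; the names contrib_check, provider_check and is_accepted are made up for illustration and are not part of Airflow.

```python
def contrib_check(objects=None, prefix=None):
    """Condition used by the contrib operator: only None is rejected."""
    assert objects is not None or prefix is not None


def provider_check(objects=None, prefix=None):
    """Condition used by the provider operator: any falsy value is rejected."""
    if not objects and not prefix:
        raise ValueError("Either object or prefix should be set. Both are None")


def is_accepted(check, **kwargs):
    """Return True if the check passes, False if it raises."""
    try:
        check(**kwargs)
    except (AssertionError, ValueError):
        return False
    return True


if __name__ == "__main__":
    # Compare both checks on a few representative argument combinations.
    for kwargs in ({}, {"prefix": ""}, {"prefix": "logs/"}, {"objects": []}):
        print(
            kwargs,
            "contrib:", is_accepted(contrib_check, **kwargs),
            "provider:", is_accepted(provider_check, **kwargs),
        )
```

The two checks agree when both arguments are omitted and when a non-empty prefix is given. They disagree for prefix="" (and also for an empty objects list), which the contrib check accepts and the provider check rejects.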
What you think should happen instead
This behavior does not match the documentation, since an empty-string prefix is perfectly valid when the user wants to delete all objects in the bucket.
Furthermore, there were no philosophical changes within the API in that timeframe. This code change happened in this commit, where the developer's intent was clearly to remove assertions, not to change the logic behind the validation. In fact, it even relates to a PR for this Airflow JIRA ticket.
How to reproduce
Add a GCSDeleteObjectsOperator with the parameter prefix="" to a DAG.
Anything else
In my opinion, the error message is also not very helpful, since it breaks the DAG without indicating which task caused the issue. It took me 20 minutes to pinpoint the exact task, because the DAG has a lot of tasks.
Adding the task_id to the error message would improve the developer experience in this case.
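As a rough sketch of what both suggestions could look like together, the validation below rejects only the case where both arguments are None and names the task in the message. The function validate_delete_args and the message wording are hypothetical; this is not the provider's actual code, only an illustration of the proposed behavior.

```python
def validate_delete_args(task_id, objects=None, prefix=None):
    """Hypothetical validation: allow prefix="" and report the failing task."""
    # Compare against None explicitly so an empty-string prefix stays valid.
    if objects is None and prefix is None:
        raise ValueError(
            f"Task '{task_id}': either objects or prefix should be set. "
            "Both are None."
        )


if __name__ == "__main__":
    # An empty prefix (delete everything in the bucket) is accepted.
    validate_delete_args("delete_all_objects", prefix="")

    # Omitting both arguments fails, and the message points at the task.
    try:
        validate_delete_args("delete_nothing")
    except ValueError as err:
        print(err)
```

With a message like this, a broken DAG with many tasks would point directly at the offending task_id.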
Apache Airflow Provider(s)
google
Versions of Apache Airflow Providers
All versions.
Apache Airflow version
2.3.2 (latest released)
Operating System
macOS 12.3.1
Deployment
Composer
Deployment details
No response