Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCSDeleteObjectsOperator raises unexpected ValueError for prefix set as empty string #24352

Closed
2 tasks done
Lavedonio opened this issue Jun 9, 2022 · 1 comment · Fixed by #24353
Closed
2 tasks done

Comments

@Lavedonio
Copy link
Contributor

Lavedonio commented Jun 9, 2022

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

All versions.

apache-airflow-providers-google>=1.0.0b1
apache-airflow-backport-providers-google>=2020.5.20rc1

Apache Airflow version

2.3.2 (latest released)

Operating System

macOS 12.3.1

Deployment

Composer

Deployment details

No response

What happened

I'm currently doing the upgrade check in Airflow 1.10.15 and one of the topics is to change the import locations from contrib to the specific provider.

While replacing:
airflow.contrib.operators.gcs_delete_operator.GoogleCloudStorageDeleteOperator
By:
airflow.providers.google.cloud.operators.gcs.GCSDeleteObjectsOperator

An error appeared in the UI: Broken DAG: [...] Either object or prefix should be set. Both are None


Upon further investigation, I found out that while the GoogleCloudStorageDeleteOperator from contrib module had this parameter check (as can be seen here):

assert objects is not None or prefix is not None

The new GCSDeleteObjectsOperator from Google provider module have the following (as can be seen here):

if not objects and not prefix:
    raise ValueError("Either object or prefix should be set. Both are None")

As it turns out, these conditions are not equivalent, because a variable prefix containing the value of an empty string won't raise an error on the first case, but will raise it in the second one.

What you think should happen instead

This behavior does not match with the documentation description, since using a prefix as an empty string is perfectly valid in case the user wants to delete all objects within the bucket.

Furthermore, there were no philosophical changes within the API in that timeframe. This code change happened in this commit, where the developer's intent was clearly to remove assertions, not to change the logic behind the validation. In fact, it even relates to a PR for this Airflow JIRA ticket.

How to reproduce

Add a GCSDeleteObjectsOperator with a parameter prefix="" to a DAG.

Example:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.operators.gcs import GCSDeleteObjectsOperator

with DAG('test_dag', schedule_interval=timedelta(days=1), start_date=datetime(2022, 1, 1)) as dag:
    task = GCSDeleteObjectsOperator(
        task_id='task_that_generates_ValueError',
        bucket_name='some_bucket',
        prefix=''
    )

Anything else

In my opinion, the error message wasn't very accurate as well, since it just breaks the DAG without pointing out which task is causing the issue. It took me 20 minutes to pinpoint the exact task in my case, since I was dealing with a DAG with a lot of tasks.

Adding the task_id to the error message could improve the developer experience in that case.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Lavedonio Lavedonio added area:providers kind:bug This is a clearly a bug labels Jun 9, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 9, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants