InvalidTimezone exception if DAG's start_date timezone is "+00:00" #16613

Closed
ecerulm opened this issue Jun 23, 2021 · 2 comments
Labels: affected_version:2.0, area:core, kind:bug

@ecerulm
Contributor

ecerulm commented Jun 23, 2021

Apache Airflow version: 2.0.2

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

from airflow.models import DAG
from airflow.serialization.serialized_objects import SerializedDAG
import pendulum
dag_start_date = pendulum.parse("2019-08-01T00:00:00.000+00:00")
dag = DAG(dag_id='simple_dag', start_date=dag_start_date)
serialized_dag = SerializedDAG.to_dict(dag)
serialized_dag['dag']['timezone'] # '+00:00'
dag = SerializedDAG.from_dict(serialized_dag) # raises InvalidTimezone exception

Traceback (most recent call last):
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/reader.py", line 50, in read_for
    file_path = pytzdata.tz_path(timezone)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pytzdata/__init__.py", line 74, in tz_path
    raise TimezoneNotFound('Timezone {} not found at {}'.format(name, filepath))
pytzdata.exceptions.TimezoneNotFound: Timezone +00:00 not found at /Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pytzdata/zoneinfo/+00:00
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-115-6c89c12cbb11>", line 1, in <module>
    dag = SerializedDAG.from_dict(serialized_dag)
  File "/Users/rubelagu/git/airflow/airflow/serialization/serialized_objects.py", line 795, in from_dict
    return cls.deserialize_dag(serialized_obj['dag'])
  File "/Users/rubelagu/git/airflow/airflow/serialization/serialized_objects.py", line 722, in deserialize_dag
    v = cls._deserialize_timezone(v)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/__init__.py", line 37, in timezone
    tz = _Timezone(name, extended=extended)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/timezone.py", line 40, in __init__
    tz = read(name, extend=extended)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/__init__.py", line 9, in read
    return Reader(extend=extend).read_for(name)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/reader.py", line 52, in read_for
    raise InvalidTimezone(timezone)
pendulum.tz.zoneinfo.exceptions.InvalidTimezone: Invalid timezone "+00:00"

What you expected to happen:

The DAG holds a reference to its start_date.tzinfo, which serializes as +00:00 (this can be checked with serialized_dag['dag']['timezone']). Deserialization then calls pendulum.timezone('+00:00'), which raises an InvalidTimezone exception.

In principle I would expect to be able to provide any timezone-aware datetime as start_date, and a +00:00 offset is common. Serialization and deserialization are part of normal Airflow operation, so any DAG with that kind of start_date will raise this exception.

Probably the dag.timezone should not be serialized at all and it should be reconstructed at deserialization time from start_date.

Refactor this section of airflow/models/dag.py::DAG.__init__() into a method that can be called from both DAG.__init__ and SerializedDAG.from_dict. That would avoid serializing and deserializing a pendulum.timezone altogether.
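To illustrate the idea, here is a minimal sketch that uses only the standard library. It is not Airflow's code: serialize_dag_fields, deserialize_dag_fields and timezone_from_start_date are hypothetical names, and the UTC fallback for a naive start_date is an assumption. The point is only that the timezone never needs its own serialized key if it can be derived from start_date on both code paths.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def timezone_from_start_date(start_date: Optional[datetime]) -> tzinfo:
    """Hypothetical shared helper, callable from the constructor and the deserializer."""
    if start_date is not None and start_date.tzinfo is not None:
        return start_date.tzinfo
    # Assumption: fall back to UTC when start_date is missing or naive.
    return timezone.utc


def serialize_dag_fields(dag_id: str, start_date: datetime) -> dict:
    """Hypothetical serializer: stores start_date only, with no separate timezone key."""
    return {"dag_id": dag_id, "start_date": start_date.isoformat()}


def deserialize_dag_fields(data: dict) -> dict:
    """Hypothetical deserializer: rebuilds the timezone from the parsed start_date."""
    start_date = datetime.fromisoformat(data["start_date"])
    return {
        "dag_id": data["dag_id"],
        "start_date": start_date,
        "timezone": timezone_from_start_date(start_date),
    }


start = datetime(2019, 8, 1, tzinfo=timezone.utc)
restored = deserialize_dag_fields(serialize_dag_fields("simple_dag", start))
```

With this shape the string "+00:00" is only ever parsed as part of an ISO 8601 datetime, never looked up as a timezone name.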

How to reproduce it:

Anything else we need to know:

Related issue: #16551
Related PR: #16599

@potiuk
Member

potiuk commented Dec 5, 2021

@uranusjr - I believe, looking at the code, that this has been fixed in #17414? Am I right?

@uranusjr
Member

Yes, I believe so. +00:00 is now always force-converted into Timezone("UTC").
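
For readers who want to see what that kind of force-conversion looks like, here is a rough standard-library sketch. It is not the code from #17414; normalize_fixed_offset is a hypothetical name, and it works on datetime.timezone objects rather than pendulum ones. It only rewrites fixed-offset zones, so a named zone that happens to sit at a zero offset for part of the year is left alone.

```python
from datetime import timedelta, timezone, tzinfo


def normalize_fixed_offset(tz: tzinfo) -> tzinfo:
    """Hypothetical helper: map a fixed zero offset such as +00:00 to canonical UTC."""
    # Only plain fixed-offset zones are touched; any other tzinfo is returned unchanged.
    if isinstance(tz, timezone) and tz.utcoffset(None) == timedelta(0):
        return timezone.utc
    return tz


plus_zero = timezone(timedelta(0), "+00:00")   # zero offset with a non-canonical name
plus_two = timezone(timedelta(hours=2))        # a non-zero fixed offset
```

After normalization the zone carries a name that can be serialized and looked up again, which is what the "+00:00" string could not offer.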
