Kubernetes version (if you are using kubernetes) (use kubectl version):
Environment:
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:
What happened:
from airflow.models import DAG
from airflow.serialization.serialized_objects import SerializedDAG
import pendulum
dag_start_date = pendulum.parse("2019-08-01T00:00:00.000+00:00")
dag = DAG(dag_id='simple_dag', start_date=dag_start_date)
serialized_dag = SerializedDAG.to_dict(dag)
serialized_dag['dag']['timezone'] # '+00:00'
dag = SerializedDAG.from_dict(serialized_dag) # raises InvalidTimezone exception
Traceback (most recent call last):
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/reader.py", line 50, in read_for
file_path = pytzdata.tz_path(timezone)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pytzdata/__init__.py", line 74, in tz_path
raise TimezoneNotFound('Timezone {} not found at {}'.format(name, filepath))
pytzdata.exceptions.TimezoneNotFound: Timezone +00:00 not found at /Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pytzdata/zoneinfo/+00:00
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-115-6c89c12cbb11>", line 1, in <module>
dag = SerializedDAG.from_dict(serialized_dag)
File "/Users/rubelagu/git/airflow/airflow/serialization/serialized_objects.py", line 795, in from_dict
return cls.deserialize_dag(serialized_obj['dag'])
File "/Users/rubelagu/git/airflow/airflow/serialization/serialized_objects.py", line 722, in deserialize_dag
v = cls._deserialize_timezone(v)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/__init__.py", line 37, in timezone
tz = _Timezone(name, extended=extended)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/timezone.py", line 40, in __init__
tz = read(name, extend=extended)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/__init__.py", line 9, in read
return Reader(extend=extend).read_for(name)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/reader.py", line 52, in read_for
raise InvalidTimezone(timezone)
pendulum.tz.zoneinfo.exceptions.InvalidTimezone: Invalid timezone "+00:00"
What you expected to happen:
The DAG holds a reference to its start_date.tzinfo, which serializes as +00:00 (this can be checked with serialized_dag['dag']['timezone']). At deserialization time, Airflow calls pendulum.timezone('+00:00'), which raises an InvalidTimezone exception.
In principle I would expect to be able to provide any datetime as start_date, and +00:00 is common. Serialization and deserialization are part of normal Airflow operation, so any DAG with that kind of start_date will raise exceptions.
Probably dag.timezone should not be serialized at all and should instead be reconstructed from start_date at deserialization time.
Refactor this section of airflow/models/dag.py::DAG.__init__() into a method that can be called from both DAG.__init__ and SerializedDAG.from_dict. That way the problem of serializing and deserializing a pendulum.timezone would be avoided.
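To illustrate why the lookup fails: '+00:00' is a UTC offset, not a name in the timezone database, so a lookup by name cannot find it. The standard-library sketch below shows one way such a string could be recognized and turned into a fixed-offset timezone. The helper name parse_fixed_offset is hypothetical and is not part of Airflow or pendulum.

```python
import re
from datetime import timedelta, timezone
from typing import Optional

# Matches offset-style names such as "+00:00" or "-05:30".
_OFFSET_RE = re.compile(r"^([+-])(\d{2}):(\d{2})$")


def parse_fixed_offset(name: str) -> Optional[timezone]:
    """Hypothetical helper: return a fixed-offset tzinfo for names like
    '+00:00', or None if the name is not offset-style (for example an
    IANA name such as 'Europe/Amsterdam')."""
    match = _OFFSET_RE.match(name)
    if match is None:
        return None
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes))
    # datetime.timezone only accepts offsets strictly within +/- 24 hours
    # and raises ValueError otherwise.
    return timezone(-delta if sign == "-" else delta)


utc_like = parse_fixed_offset("+00:00")
named = parse_fixed_offset("Europe/Amsterdam")  # None: needs a name lookup
```

A deserializer could try a check like this first and fall back to a lookup by name only when it returns None.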
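A minimal sketch of that idea, using only the standard library: the timezone is derived from start_date and never stored as a separate field. The function name timezone_from_start_date is hypothetical, UTC stands in for whatever default Airflow is configured with, and the sketch assumes the deserialized start_date still carries its tzinfo.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def timezone_from_start_date(
    start_date: Optional[datetime],
    default: tzinfo = timezone.utc,  # assumption: stand-in for the configured default
) -> tzinfo:
    """Hypothetical helper: derive the DAG timezone from start_date,
    falling back to a default when start_date is missing or naive."""
    if start_date is None or start_date.tzinfo is None:
        return default
    return start_date.tzinfo


# A start_date with a +00:00 offset yields a usable tzinfo without any
# lookup by timezone name.
start = datetime.fromisoformat("2019-08-01T00:00:00+00:00")
tz = timezone_from_start_date(start)
```

If both DAG.__init__ and the deserializer called one shared method like this, the timezone name would not need to round-trip through the serialized dictionary.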
Apache Airflow version: 2.0.2
How to reproduce it:
Anything else we need to know:
Related issue: #16551
Related PR: #16599