When calling the Exasol hook's get_pandas_df function (https://github.com/apache/airflow/blob/main/airflow/providers/exasol/hooks/exasol.py), I noticed that it does not return a pandas DataFrame; it returns None. In fact, the function's return type hint explicitly states that None is returned. The name of the function suggests otherwise: get_pandas_df implies that it returns a DataFrame, not None.
I think it would make more sense if get_pandas_df did return a DataFrame, as its name suggests. So the code should look like this:
```python
def get_pandas_df(self, sql: Union[str, list], parameters: Optional[dict] = None, **kwargs) -> pd.DataFrame:
    # ... some code ...
    with closing(self.get_conn()) as conn:
        df = conn.export_to_pandas(sql, query_params=parameters, **kwargs)
        return df
```
Instead of:
```python
def get_pandas_df(self, sql: Union[str, list], parameters: Optional[dict] = None, **kwargs) -> None:
    # ... some code ...
    with closing(self.get_conn()) as conn:
        conn.export_to_pandas(sql, query_params=parameters, **kwargs)
```
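To make the difference concrete, here is a minimal sketch that runs both method bodies against a mocked connection. `CurrentHook` and `ProposedHook` are hypothetical stand-ins that copy only the `with closing(self.get_conn())` part of the code quoted above (the elided "some code" is left out), and a plain string stands in for the DataFrame, so neither Airflow, pyexasol nor pandas is needed:

```python
from contextlib import closing
from typing import Optional, Union
from unittest import mock


class CurrentHook:
    """Hypothetical stand-in that copies the current method body (returns None)."""

    def get_conn(self):
        raise NotImplementedError("patched with a mock connection below")

    def get_pandas_df(self, sql: Union[str, list], parameters: Optional[dict] = None, **kwargs) -> None:
        with closing(self.get_conn()) as conn:
            # The exported frame is computed here but never returned.
            conn.export_to_pandas(sql, query_params=parameters, **kwargs)


class ProposedHook(CurrentHook):
    """Hypothetical stand-in that copies the proposed method body."""

    def get_pandas_df(self, sql: Union[str, list], parameters: Optional[dict] = None, **kwargs):
        with closing(self.get_conn()) as conn:
            df = conn.export_to_pandas(sql, query_params=parameters, **kwargs)
            return df


def run_against_mock(hook_cls):
    """Call get_pandas_df with get_conn patched to return a MagicMock connection."""
    fake_conn = mock.MagicMock()
    # A plain string stands in for the DataFrame pyexasol would build.
    fake_conn.export_to_pandas.return_value = "frame-placeholder"
    hook = hook_cls()
    with mock.patch.object(hook, "get_conn", return_value=fake_conn):
        result = hook.get_pandas_df("select 42;")
    return result, fake_conn


current_result, current_conn = run_against_mock(CurrentHook)
proposed_result, proposed_conn = run_against_mock(ProposedHook)
print(current_result)   # None
print(proposed_result)  # frame-placeholder
```

Both versions call `export_to_pandas` once and close the connection via `closing`; only the proposed one hands the result back to the caller.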
Apache Airflow version: 2.1.0
Kubernetes version (if you are using kubernetes) (use kubectl version): Not using Kubernetes
Environment: Official Airflow-Docker Image
Cloud provider or hardware configuration: no cloud, Docker host (Dell server with 48 cores, 512 GB RAM and many TB of storage)
OS (e.g. from /etc/os-release): Official Airflow-Docker Image on CentOS 7 host
Kernel (e.g. uname -a): Linux cad18b35be00 3.10.0-1160.21.1.el7.x86_64 #1 SMP Tue Mar 16 18:28:22 UTC 2021 x86_64 GNU/Linux
What happened:
You can replicate the findings with the following DAG file:
```python
import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.exasol.hooks.exasol import ExasolHook
import pandas as pd

default_args = {"owner": "airflow"}


def call_exasol_hook(**kwargs):
    # Make connection to Exasol
    hook = ExasolHook(exasol_conn_id='Exasol QA')
    sql = 'select 42;'
    df = hook.get_pandas_df(sql=sql)
    return df


with DAG(
    dag_id="exasol_hook_problem",
    start_date=datetime.datetime(2021, 5, 5),
    schedule_interval="@once",
    default_args=default_args,
    catchup=False,
) as dag:
    # Task reconstructed from the task_id used in the CLI call below
    PythonOperator(
        task_id="call_exasol_hook",
        python_callable=call_exasol_hook,
    )
```
Sorry for the strange code formatting; I do not know how to fix this in the GitHub UI form.
Sorry also in case I missed something.
When testing or executing the task via the CLI:
```
airflow tasks test exasol_hook_problem call_exasol_hook 2021-07-20
```
the logs show:
```
[2021-07-21 12:53:19,775] {python.py:151} INFO - Done. Returned value was: None
```
None was returned even though get_pandas_df was called. A pandas DataFrame should have been returned instead.