Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExasolHook get_pandas_df does not return pandas dataframe but None #17135

Closed
wavewater opened this issue Jul 21, 2021 · 5 comments · Fixed by #17850
Closed

ExasolHook get_pandas_df does not return pandas dataframe but None #17135

wavewater opened this issue Jul 21, 2021 · 5 comments · Fixed by #17850
Assignees

Comments

@wavewater
Copy link

When calling the exasol hooks get_pandas_df function (https://github.com/apache/airflow/blob/main/airflow/providers/exasol/hooks/exasol.py) I noticed that it does not return a pandas dataframe. It returns None. In fact the function definition type hint explicitly states that None is returned. But the name of the function suggests otherwise. The name get_pandas_df implies that it should return a dataframe and not None.

I think that it would make more sense if get_pandas_df would indeed return a dataframe as the name is alluring to. So the code should be like this:

def get_pandas_df(self, sql: Union[str, list], parameters: Optional[dict] = None, **kwargs) -> pd.DataFrame: ... some code ... with closing(self.get_conn()) as conn: df=conn.export_to_pandas(sql, query_params=parameters, **kwargs) return df

INSTEAD OF:

def get_pandas_df(self, sql: Union[str, list], parameters: Optional[dict] = None, **kwargs) -> None: ... some code ... with closing(self.get_conn()) as conn: conn.export_to_pandas(sql, query_params=parameters, **kwargs)

Apache Airflow version: 2.1.0

Kubernetes version (if you are using kubernetes) (use kubectl version): Not using Kubernetes

Environment:Official Airflow-Docker Image

  • Cloud provider or hardware configuration: no cloud - docker host (DELL Server with 48 Cores, 512GB RAM and many TB storage)
  • OS (e.g. from /etc/os-release):Official Airflow-Docker Image on CentOS 7 Host
  • Kernel (e.g. uname -a): Linux cad18b35be00 3.10.0-1160.21.1.el7.x86_64 Improving the search functionality in the graph view #1 SMP Tue Mar 16 18:28:22 UTC 2021 x86_64 GNU/Linux
  • Install tools: only docker
  • Others:

What happened:
You can replicate the findings with following dag file:

import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.providers.exasol.operators.exasol import ExasolHook
import pandas as pd

default_args = {"owner": "airflow"}

def call_exasol_hook(**kwargs):
#Make connection to Exasol
hook = ExasolHook(exasol_conn_id='Exasol QA')
sql = 'select 42;'
df = hook.get_pandas_df(sql = sql)
return df

with DAG(
dag_id="exasol_hook_problem",
start_date=datetime.datetime(2021, 5, 5),
schedule_interval="@once",
default_args=default_args,
catchup=False,
) as dag:

set_variable = PythonOperator(
    task_id='call_exasol_hook',
    python_callable=call_exasol_hook
)

Sorry for the strange code formatting. I do not know how to fix this in the github UI form.
Sorry also in case I missed something.

When testing or executing the task via CLI:
airflow tasks test exasol_hook_problem call_exasol_hook 2021-07-20

the logs show:
[2021-07-21 12:53:19,775] {python.py:151} INFO - Done. Returned value was: None

None was returned - although get_pandas_df was called. A pandas df should have been returned instead.

@wavewater wavewater added the kind:bug This is a clearly a bug label Jul 21, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 21, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@uranusjr
Copy link
Member

Hi, would you be interested in putting together a pull request for this?

@wavewater
Copy link
Author

sure - it will be my first pull request. Can you guide me?

@uranusjr
Copy link
Member

Sure, let's see what we can do.

@Goodkat
Copy link
Contributor

Goodkat commented Aug 21, 2021

What is the current status here? May I take it and commit the changes then?

kaxil pushed a commit that referenced this issue Aug 28, 2021
…17850)

closes #17135
ExasolHook get_pandas_df does not return pandas dataframe but None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants