I had a similar issue recently (with a Flask app I created myself).
-
No, I didn't set any headers because my own API is really simple. I'll try it, thanks!
-
Hi Jorricks, I tried it, but it did not work; same issue. When I send the request with Postman without setting any headers, it gets the response correctly, even when it takes longer than 30 minutes. I guess it has to be a problem with Airflow. To answer your second question: my API returns the same JSON response data.
-
@yuanke7, please follow the code of conduct. This is a professional environment, so I'd appreciate it if you avoided personal notes like that (I edited your comment). They might make people uncomfortable, and we are very keen on creating a welcoming environment.
-
Can you please share the Python code of your DAG?
-
Sure, thanks for your help! Here is my DAG:
    import pendulum
    default_args = {
    yest = datetime.date(datetime.today() + timedelta(-7))
    start = PythonOperator(
    sensor_09 = TimeSensor(
    save_hub_report = SimpleHttpOperator(
    matrix_collection = SimpleHttpOperator(
    psd_report = SimpleHttpOperator(
    def flow_a(order_type: str = None, execute_time: str = None):
    flow_a_ww2dc_10_30 = flow_a(order_type='ww2dc', execute_time='10_30')
    flow_b = SimpleHttpOperator(
    flow_d = SimpleHttpOperator(
    po_change = SimpleHttpOperator(
    def flow_a_monitor_func(task_id):
    flow_a_monitor = flow_a_monitor_func('flow_a_monitor')
    start >> matrix_collection
-
Hi Jorricks, here is the simple Flask API I use for testing; it takes at least 35 minutes to respond.
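A comparable slow endpoint can be sketched with Python's standard library http.server instead of Flask. This is a hypothetical stand-in, not the original app: the function names and the JSON body are assumptions chosen to mirror the psd_report response quoted in this thread. The handler sleeps for a configurable delay and then returns {"success": true}.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_handler(delay_seconds: float):
    """Build a request handler that sleeps before answering, like a long-running report job."""

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_seconds)  # simulate the long-running work on the server side
            body = json.dumps({"success": True}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence the default per-request logging to stderr

    return SlowHandler


def start_slow_server(delay_seconds: float, port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background daemon thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_slow_handler(delay_seconds))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For a test closer to the one described here, you would call start_slow_server(35 * 60, port=5000) and point the Airflow HTTP connection at that host and port; port 5000 is an arbitrary choice.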
-
Do you have any ideas yet?
-
I finally solved it myself. If you face the same problem when using Airflow, or when sending a request with a long response time and never getting the response, use the Linux 'curl' command to send the HTTP request instead of Python's 'requests' package. It must be a problem with Python requests (SimpleHttpOperator is based on it): it works fine on my Mac but failed on my Linux server, which confused me for a long time. Please leave an answer if you know why.
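A minimal sketch of that curl workaround, assuming curl is installed on the machine where the task runs. The function names and example URLs are hypothetical. build_curl_command assembles the argument list with an optional --max-time limit, fetch_json_with_curl runs it and parses the JSON body, and as_bash_command renders a shell-quoted string.

```python
import json
import shlex
import subprocess
from typing import List, Optional


def build_curl_command(url: str, max_time_seconds: Optional[int] = None) -> List[str]:
    """Build a curl argv list: quiet output, errors still shown, non-zero exit on HTTP >= 400."""
    cmd = ["curl", "--silent", "--show-error", "--fail"]
    if max_time_seconds is not None:
        cmd += ["--max-time", str(max_time_seconds)]  # overall limit for the whole transfer
    cmd.append(url)
    return cmd


def fetch_json_with_curl(url: str, max_time_seconds: Optional[int] = None) -> dict:
    """Run curl in a subprocess and parse stdout as JSON; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_curl_command(url, max_time_seconds),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def as_bash_command(url: str, max_time_seconds: Optional[int] = None) -> str:
    """Render the same command as a shell-safe string."""
    return shlex.join(build_curl_command(url, max_time_seconds))
```

In a DAG, the string from as_bash_command could be passed to a BashOperator's bash_command. Note that this replaces SimpleHttpOperator rather than fixing requests, so any response_check logic would have to be reimplemented.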
-
@yuanke7 Could you share a more detailed example of how you fixed this, for example your DAG or task? I'm running into a scenario where a DAG calls an API endpoint that takes longer than 5 minutes. requests.get works in all my tests (Swagger, browser, Python), but when used in Airflow it fails after 5 minutes, ignoring the retries/timeout arguments.
-
I have no idea where that default comes from, but maybe you passed it in the operator? The HttpOperator has an "extra_options" dictionary through which you can pass "requests" parameters (including "timeout", mentioned in the documentation). Maybe you are unknowingly passing a value there. You can also pass None to force "no timeout": https://docs.python-requests.org/en/latest/user/advanced/#timeouts. Unfortunately, "requests" does not say explicitly what default it uses, so maybe it is somehow server- or system-dependent, perhaps set by a variable of some sort. Or maybe something in your infrastructure kills the request after 5 minutes of inactivity? Airflow requests come from the machine where the tasks are running, and that machine might be inside infrastructure you have no access to. So even if the request works from your local machine, it might not work when run via Airflow, because it runs elsewhere. This usually happens in deployments behind firewalls or in corporate environments. In that case, I am afraid you need to talk to your infrastructure people.
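A sketch of passing an explicit timeout through extra_options, assuming (as described above) that the HTTP provider forwards extra_options["timeout"] to requests. None means no client-side timeout; a (connect, read) tuple sets the two limits separately, which is how requests interprets a tuple. The helper names are hypothetical, and Airflow is imported only inside make_http_task so that the keyword-building part can run without Airflow installed.

```python
from datetime import timedelta
from typing import Any, Dict, Tuple, Union

# None = wait indefinitely; float = one limit; (connect, read) tuple = separate limits
Timeout = Union[None, float, Tuple[float, float]]


def http_task_kwargs(task_id: str, http_conn_id: str, endpoint: str,
                     timeout: Timeout = None) -> Dict[str, Any]:
    """Keyword arguments for SimpleHttpOperator with an explicit requests timeout."""
    return {
        "task_id": task_id,
        "method": "GET",
        "http_conn_id": http_conn_id,
        "endpoint": endpoint,
        "extra_options": {"timeout": timeout},  # forwarded to requests by the HTTP hook
        # Airflow's own task limit still applies, independently of the HTTP timeout
        "execution_timeout": timedelta(hours=1),
    }


def make_http_task(dag, **kwargs):
    # Imported lazily so http_task_kwargs can be exercised without Airflow installed.
    from airflow.providers.http.operators.http import SimpleHttpOperator
    return SimpleHttpOperator(dag=dag, **http_task_kwargs(**kwargs))
```

Explicitly setting the timeout this way at least rules out an unintended value being passed in extra_options; if the request still dies at the same point, the infrastructure explanation becomes more likely.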
-
I converted this to a discussion, as it is not very likely to be an Airflow issue. If you have other findings, you can add them here, but I think the first thing you should do is "exec" onto the same machine that Airflow runs on, execute (in Python) the very same requests.get() call that Airflow makes, repeat it locally, and see whether the difference is "environmental". This should give you a clue where to look.
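That comparison can be scripted. This hypothetical helper runs one request and records the hostname, the elapsed time, and either the result or the exception. You would run it once inside the worker (for example via docker exec, since this deployment uses Docker Compose) and once locally, then compare the two reports. requests_get_status issues a plain requests.get, which approximates what the HTTP hook does rather than copying it exactly.

```python
import socket
import time
from typing import Any, Callable, Dict, Optional


def timed_probe(fetch: Callable[[], Any]) -> Dict[str, Any]:
    """Run fetch() once and report where it ran, how long it took, and how it ended."""
    report: Dict[str, Any] = {"host": socket.gethostname()}
    start = time.monotonic()
    try:
        report["result"] = fetch()
        report["ok"] = True
    except Exception as exc:  # record timeouts, resets, etc. instead of raising
        report["error"] = repr(exc)
        report["ok"] = False
    report["elapsed_s"] = round(time.monotonic() - start, 1)
    return report


def requests_get_status(url: str, timeout: Optional[float] = None) -> Callable[[], int]:
    """Return a zero-argument callable that performs a GET and returns the status code."""
    def fetch() -> int:
        import requests  # imported lazily; must be installed on the machine being probed
        return requests.get(url, timeout=timeout).status_code
    return fetch
```

Usage, on each machine: print(timed_probe(requests_get_status("http://<your-server>/general/psd_report"))). A failure at a fixed elapsed time on the worker only would point to the environment rather than to Airflow.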
-
It has been running for 9 hours already, and the server has definitely returned a response.
-
Apache Airflow version
2.1.2
Operating System
CentOS Linux 7
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==2.0.0
apache-airflow-providers-celery==2.0.0
apache-airflow-providers-cncf-kubernetes==2.0.0
apache-airflow-providers-docker==2.0.0
apache-airflow-providers-elasticsearch==2.0.2
apache-airflow-providers-ftp==2.0.0
apache-airflow-providers-google==4.0.0
apache-airflow-providers-grpc==2.0.0
apache-airflow-providers-hashicorp==2.0.0
apache-airflow-providers-http==2.0.0
apache-airflow-providers-imap==2.0.0
apache-airflow-providers-microsoft-azure==3.0.0
apache-airflow-providers-mysql==2.0.0
apache-airflow-providers-postgres==2.0.0
apache-airflow-providers-redis==2.0.0
apache-airflow-providers-sendgrid==2.0.0
apache-airflow-providers-sftp==2.0.0
apache-airflow-providers-slack==4.0.0
apache-airflow-providers-sqlite==2.0.0
apache-airflow-providers-ssh==2.0.0
Deployment
Docker-Compose
Deployment details
No response
What happened
I'm using SimpleHttpOperator to request an API that often takes longer than 30 minutes to respond, but the task stays in the running state even after the API has returned a response.
Here is the code for one of my tasks:
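    from datetime import timedelta
    from airflow.providers.http.operators.http import SimpleHttpOperator

    psd_report = SimpleHttpOperator(
        task_id='psd_report',
        method='GET',
        http_conn_id=WIN_SERVER_1,  # connection id defined elsewhere in the DAG file
        endpoint='general/psd_report',
        # key must match the API's JSON exactly ("success", lowercase)
        response_check=lambda response: response.json().get('success'),
        response_filter=lambda response: response.json(),
        execution_timeout=timedelta(hours=1),
        dag=dag,
    )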
Here is my API's response data:
{"success": true}
What you expected to happen
I expected the SimpleHttpOperator task to switch from running to success so that downstream tasks could be queued.
How to reproduce
Create a SimpleHttpOperator task that requests an API which needs at least 30 minutes to respond, then trigger the task. Even after the API has returned a response, the task stays in the running state.
Anything else
No response