SimpleHttpOperator aborts connection after 5 minutes #8160

Closed
BrunoDamacena opened this issue Apr 6, 2020 · 14 comments
Labels
Can't Reproduce The problem cannot be reproduced kind:bug This is a clearly a bug

Comments

@BrunoDamacena

Apache Airflow version: v1.10.4

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment: puckel/docker-airflow

What happened:

The HTTP request to the API has its connection aborted after 5 minutes.
I'm trying to run a long request, but every time this error occurs:

[2020-04-06 11:44:11,217] {{logging_mixin.py:95}} INFO - [2020-04-06 11:44:11,217] {{http_hook.py:131}} INFO - Sending 'POST' to url: {api_url_here}
[2020-04-06 11:49:19,249] {{logging_mixin.py:95}} WARNING - /usr/local/lib/python3.7/site-packages/airflow/hooks/http_hook.py:181: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  self.log.warn(str(ex) + ' Tenacity will retry to execute the operation')
[2020-04-06 11:49:19,250] {{logging_mixin.py:95}} INFO - [2020-04-06 11:49:19,250] {{http_hook.py:181}} WARNING - ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) Tenacity will retry to execute the operation
[2020-04-06 11:49:19,250] {{taskinstance.py:1047}} ERROR - ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1336, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 275, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1336, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 275, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 922, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/http_operator.py", line 92, in execute
    self.extra_options)
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/http_hook.py", line 132, in run
    return self.run_and_check(session, prepped_request, extra_options)
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/http_hook.py", line 182, in run_and_check
    raise ex
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/http_hook.py", line 174, in run_and_check
    allow_redirects=extra_options.get("allow_redirects", True))
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
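
As a quick check of the "5 minutes" figure, the elapsed time can be read off the log timestamps above. This is only a minimal sketch: the helper name `elapsed_seconds` is made up for illustration, and the two timestamps are copied from the "Sending 'POST'" line and the first warning line.

```python
from datetime import datetime

# Timestamp layout used in the Airflow log lines above (comma before milliseconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


gap = elapsed_seconds("2020-04-06 11:44:11,217", "2020-04-06 11:49:19,249")
print(f"{gap:.3f} s")  # prints 308.032 s
```

That is roughly 5 minutes and 8 seconds between sending the request and the disconnect.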

What you expected to happen:
The operator should wait for the response without timing out.

How to reproduce it:
Run a long request using SimpleHttpOperator

Anything else we need to know:

@BrunoDamacena BrunoDamacena added the kind:bug This is a clearly a bug label Apr 6, 2020
@boring-cyborg

boring-cyborg bot commented Apr 6, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@khyurri
Contributor

khyurri commented Apr 7, 2020

It seems that the server to which SimpleHttpOperator connects breaks the connection after 5 minutes.

I've tried to reproduce this bug using a simple Flask app:

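from time import monotonic, sleep

from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello_world():
    # Hold the request open for just over 5 minutes before responding.
    t0 = monotonic()
    sleep(302)
    t1 = monotonic()
    # Return the elapsed time in seconds as the response body.
    return "{}".format(t1 - t0)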

Everything works great:

[2020-04-07 18:28:08,024] {http_operator.py:87} INFO - Calling HTTP method
[2020-04-07 18:28:08,046] {logging_mixin.py:112} INFO - [2020-04-07 18:28:08,045] {base_hook.py:87} INFO - Using connection to: id: http_default. Host: http://127.0.0.1:5000, Port: None, Schema: None, Login: None, Password: None, extra: None
[2020-04-07 18:28:08,051] {logging_mixin.py:112} INFO - [2020-04-07 18:28:08,050] {http_hook.py:136} INFO - Sending 'GET' to url: http://127.0.0.1:5000/
[2020-04-07 18:33:10,086] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=test_dag_v2, task_id=run_this_1, execution_date=20200405T000000, start_date=20200407T152807, end_date=20200407T153310
[2020-04-07 18:33:10,396] {logging_mixin.py:112} INFO - [2020-04-07 18:33:10,395] {local_task_job.py:103} INFO - Task exited with return code 0
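
For context on the exception itself: `http.client` raises `RemoteDisconnected` when the peer closes the socket before sending any status line, which would be consistent with the server (or something in between) dropping the connection. Below is a minimal standard-library sketch, not taken from Airflow; the function names and the throwaway local server are assumptions made purely for illustration.

```python
import http.client
import socket
import threading


def hang_up_without_reply(listener: socket.socket) -> None:
    """Accept one client, read its request headers, then close silently."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        # Read until the end of the headers so no unread bytes are left behind.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    # Leaving the `with` block closes the socket; no response was ever sent.


def request_to_silent_server() -> str:
    """Send one request to the silent server and report what the client saw."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=hang_up_without_reply, args=(listener,), daemon=True
    )
    server.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    try:
        client.request("GET", "/")
        client.getresponse()
        return "got a response"
    except http.client.RemoteDisconnected as exc:
        return str(exc)
    finally:
        client.close()
        server.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(request_to_silent_server())
```

Run locally, this should report the same "Remote end closed connection without response" message seen in the traceback, without Airflow being involved at all.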

@BrunoDamacena
Author

I don't think it is a server issue, because I made the same request in Postman and it worked.
I noticed that when the request starts, the following message appears in the Airflow GUI:

The scheduler does not appear to be running. Last heartbeat was received 2 minutes ago.
The DAGs list may not update, and new tasks will not be scheduled.

@ashb
Member

ashb commented Apr 8, 2020

Are you running with the SequentialExecutor?

@BrunoDamacena
Author

> Are you running with the SequentialExecutor?

No, I'm using CeleryExecutor

@ashb ashb added the Can't Reproduce The problem cannot be reproduced label Apr 9, 2020
@ashb
Member

ashb commented Apr 9, 2020

This one appears to be something specific to your environment or the request you are making, as I can't reproduce this behaviour, nor could khyurri.

Without more detailed reproduction steps that we can try ourselves we won't be able to help with this one.

@anirudhbagri
Contributor

I faced a similar issue. In my case it was caused by a bad request sent using http.hook.
I fixed it by correcting the request body.

@dyi1

dyi1 commented Oct 5, 2020

@anirudhbagri Could you give me details as to what the root cause of your issue was and how you fixed it? I'm currently experiencing the same issue and I think it would help a lot.

@anirudhbagri
Contributor

For me, the error was caused by wrong input. The API was expecting a list of strings, but I was sending only a single string.
One strange thing to watch out for: passing a template that collects the result from another operator inline, like this, didn't work for me:

httpOperator(conn="{{ ti.xcom_pull("{0}", "{1}") }}".format(someval, otherval),...)

Instead this one worked:

endpoint = "{{ ti.xcom_pull("{0}", "{1}") }}".format(someval, otherval)
httpOperator(conn=endpoint,...)
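
One thing worth checking with templates built this way (an aside, and not necessarily the cause of the difference described above): `str.format` treats `{{` and `}}` as escaped braces and collapses them to single braces, so the resulting string may no longer be a Jinja expression. Here is a small sketch with hypothetical values `my_task` and `my_key`, using single quotes inside the template so the string literal parses:

```python
def render_collapsed(task_id: str, key: str) -> str:
    # "{{" and "}}" are str.format escapes, so each pair becomes a single brace.
    return "{{ ti.xcom_pull('{0}', '{1}') }}".format(task_id, key)


def render_jinja(task_id: str, key: str) -> str:
    # Doubling the braces again keeps a literal "{{ ... }}" for Jinja to evaluate.
    return "{{{{ ti.xcom_pull('{0}', '{1}') }}}}".format(task_id, key)


print(render_collapsed("my_task", "my_key"))  # { ti.xcom_pull('my_task', 'my_key') }
print(render_jinja("my_task", "my_key"))      # {{ ti.xcom_pull('my_task', 'my_key') }}
```

Printing the final string before handing it to the operator is a cheap way to confirm what the template engine will actually receive.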

@ADITI7499

@BrunoDamacena
Hi, I am also facing the same issue while using the HTTP operator, and I don't think it is specific to the environment, since both the curl command and Postman return the correct response. The HTTP operator also returns the correct response for shorter synchronous requests (less than 5 minutes).
I even tried using the requests library but got the same error. Have you resolved this in the past?
It would be very helpful if you could share some insights.

Thanks

@potiuk
Member

potiuk commented Jul 15, 2022

This is very likely already handled in the upcoming https://pypi.org/project/apache-airflow-providers-http/4.0.0rc1/ (it will likely be released tomorrow, or Monday at the latest). @ADITI7499, can you please install it locally and check whether it fixes the problem?

The fix was implemented in #24967

@potiuk
Member

potiuk commented Jul 15, 2022

Also, anyone in this thread: if you face a similar issue, please try the 4.0.0rc1 HTTP provider!

@ADITI7499

@potiuk I tried the new HTTP provider with my Airflow 2.2 installation. It sets tcp_keep_alive to true by default, which should be sufficient to tackle this issue, right? But my remote connection was closed again.

@potiuk
Member

potiuk commented Jul 15, 2022

You might want to increase the keep-alive frequency beyond the default settings. If it is your firewall (or whatever sits between you and the host) closing the connection, it seems overly aggressive. I would expect an idle connection to be closed after an hour, but 5 minutes is unheard of. So maybe someone at your company has a very aggressive policy. If you look at the description of the KeepAlive feature, it mentions that firewalls may also aggressively close connections when keep-alive packets do not arrive frequently enough.

Ideally, find out who is doing it and what the rules are, then adjust the settings. Or experiment.

Or maybe this is a completely different problem. But finding out who is doing it is the only way forward (it's not Airflow for sure).
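
For experimenting with keep-alive timing outside Airflow, the same idea can be tried directly on a socket. This is a minimal sketch using only the standard library: the function name `enable_keepalive` and the numeric values are placeholders rather than the provider's defaults, and the `TCP_KEEP*` options are only set where the platform exposes them (they exist on Linux).

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 30, count: int = 5) -> None:
    """Turn on TCP keep-alive probes for an otherwise idle connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between subsequent probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:  # skip options this platform does not define
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle=60, interval=30, count=5)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

If something really is dropping idle connections after five minutes, an idle time comfortably below 300 seconds would be the first value to try.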

7 participants