
Task fails and cannot read logs. Invalid URL 'http://:8793/log/...': No host supplied #42136

Open

pedro-cf opened this issue Sep 10, 2024 · 20 comments

pedro-cf commented Sep 10, 2024

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.10.1

What happened?

I'm having an issue with an Airflow instance where a task fails and I cannot read the logs.

Logs:

*** Could not read served logs: Invalid URL 'http://:8793/log/dag_id=my_dag/run_id=dynamic__apple_3_my_dag_cb353081__2024-09-09T14:41:22.596199__f73c5571719e4f35bf195ded40e5e25b/task_id=cleanup_temporary_directory/attempt=1.log': No host supplied

Event logs:

Executor CeleryExecutor(parallelism=128) reported that the task instance <TaskInstance: my_dag.cleanup_temporary_directory dynamic__apple_3_my_dag_cb353081__2024-09-09T14:41:22.596199__f73c5571719e4f35bf195ded40e5e25b [queued]> finished with state failed, but the task instance's state attribute is queued. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#task-state-changed-externally

Additionally, I checked the logs directory for the dag_id/run_id, and the respective task_id folder is missing.
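
For context, the empty host in the served-logs URL appears to come from the task instance's hostname field in the metadata DB, which stays blank when the task never actually starts on a worker. A minimal sketch to check it, assuming you can run Python with metadata-DB access from one of the Airflow containers (dag_id/task_id/run_id taken from the log line above):

# Hedged diagnostic sketch: print the hostname recorded for the failed task instance.
# An empty string here is what yields 'http://:8793/...' (no host) in the log URL.
from airflow.models import TaskInstance
from airflow.utils.session import create_session

with create_session() as session:
    ti = (
        session.query(TaskInstance)
        .filter_by(
            dag_id="my_dag",
            task_id="cleanup_temporary_directory",
            run_id="dynamic__apple_3_my_dag_cb353081__2024-09-09T14:41:22.596199__f73c5571719e4f35bf195ded40e5e25b",
        )
        .one_or_none()
    )
    print(repr(ti.hostname) if ti else "task instance not found")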

What you think should happen instead?

I should be able to access the logs.

How to reproduce

Not sure how to.

Operating System

Ubuntu 24.04 LTS

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

Deployed with docker-compose on Docker Swarm setup on 2 VMs.

Anything else?

As noted above, the logs directory for the dag_id/run_id is missing the respective task_id folder.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

pedro-cf added the area:core, kind:bug, and needs-triage labels Sep 10, 2024
dosubot bot added the area:logging label Sep 10, 2024
andrew-stein-sp (Contributor) commented:

Having the same issue with 2.10.1 in k8s, using the CeleryKubernetesExecutor.

Could this be related to the inheritance issue that was discussed in #41891?

pedro-cf (Author) commented:

Additionally I checked the logs directory for the dag_id/run_id and it's missing the respective task_id folder.


adriens commented Sep 16, 2024

Having the same issue on 2.10.0 through a podman-compose


adriens commented Sep 16, 2024

We upgraded to 2.10.1 like @andrew-stein-sp and can reproduce the same behavior.

sosystems-dev commented:

We've seen the same behavior since upgrading from version 2.9.3 to 2.10.1.
We are using the LocalExecutor.


mn7k commented Sep 18, 2024

I have the same issue with 2.10.0, using the CeleryExecutor.
It worked before I upgraded from version 2.9.0 to 2.10.0.

*** Could not read served logs: Invalid URL 'http://:8793/log/dag_id=service_stop/run_id=manual__2024-09-18T09:42:54+09:00/task_id=make_accountlist_task/attempt=1.log': No host supplied

Event log:

Executor CeleryExecutor(parallelism=6) reported that the task instance <TaskInstance: service_stop.make_accountlist_task manual__2024-09-18T09:42:54+09:00 [queued]> finished with state failed, but the task instance's state attribute is queued. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#task-state-changed-externally

The scheduler has an error log at the same time as the event log:

[2024-09-18T00:43:18.036+0000] {celery_executor.py:291} ERROR - Error sending Celery task: module 'redis' has no attribute 'client'
Celery Task ID: TaskInstanceKey(dag_id='service_stop', task_id='make_accountlist_task', run_id='manual__2024-09-18T09:42:54+09:00', try_number=1, map_index=-1)
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/celery/executors/celery_executor_utils.py", line 220, in send_task_to_executor
    result = task_to_run.apply_async(args=[command], queue=queue)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/task.py", line 594, in apply_async
    return app.send_task(
           ^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 797, in send_task
    with self.producer_or_acquire(producer) as P:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 932, in producer_or_acquire
    producer, self.producer_pool.acquire, block=True,
              ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 1354, in producer_pool
    return self.amqp.producer_pool
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/amqp.py", line 591, in producer_pool
    self.app.connection_for_write()]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 829, in connection_for_write
    return self._connection(url or self.conf.broker_write_url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/app/base.py", line 880, in _connection
    return self.amqp.Connection(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/connection.py", line 201, in __init__
    if not get_transport_cls(transport).can_parse_url:
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/transport/__init__.py", line 91, in get_transport_cls
    _transport_cache[transport] = resolve_transport(transport)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/transport/__init__.py", line 76, in resolve_transport
    return symbol_by_name(transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/utils/imports.py", line 59, in symbol_by_name
    module = imp(module_name, package=package, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/airflow/.local/lib/python3.12/site-packages/kombu/transport/redis.py", line 282, in <module>
    class PrefixedRedisPipeline(GlobalKeyPrefixMixin, redis.client.Pipeline):
                                                      ^^^^^^^^^^^^
AttributeError: module 'redis' has no attribute 'client'


damiah commented Sep 29, 2024

Same issue for us when upgrading to 2.10.2.

nikithapk commented:

We’re encountering the same issue as well.


adriens commented Oct 3, 2024

We switched from the Bitnami docker-compose to the official Apache docker-compose and could make it run successfully 🤩

Dzhalolov commented:

Try checking that the DAGs exist on the worker, scheduler, and webserver. I deploy Airflow in K8s and got this error after putting my DAGs only into the scheduler (expecting they would replicate to the other pods), but when I checked the DAGs folder on the worker it was empty.
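
A quick way to verify this, sketched under the assumption that each pod runs the same Airflow install and DAGS_FOLDER setting:

# Hedged sketch: run on each pod (worker, scheduler, webserver) and compare the output;
# an empty or differing listing means the DAGs folder is not replicated across pods.
import os

from airflow import settings

for root, _, files in os.walk(settings.DAGS_FOLDER):
    for name in sorted(files):
        if name.endswith(".py"):
            print(os.path.join(root, name))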


mn7k commented Oct 15, 2024

At the time of my earlier comment (#42136 (comment)), I used the airflow db upgrade command, but I realized it has been deprecated.
I retried the upgrade using the airflow db migrate -n "2.10.2" command, and it works for me now.

https://airflow.apache.org/docs/apache-airflow/2.10.0/installation/upgrading.html#offline-sql-migration-scripts


quack39 commented Oct 17, 2024

We encountered the same problem in Airflow 2.9.3.
Here are the worker logs at the time of the error:

[2024-10-11 10:45:38,544: WARNING/ForkPoolWorker-16] Failed operation _store_result.  Retrying 2 more times.
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: could not receive data from server: Connection timed out


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/backends/database/__init__.py", line 47, in _inner
    return fun(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/celery/backends/database/__init__.py", line 117, in _store_result
    task = list(session.query(self.task_cls).filter(self.task_cls.task_id == task_id))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2901, in __iter__
    result = self._iter()
             ^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
    result = self.session.execute(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 1717, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not receive data from server: Connection timed out

[SQL: SELECT celery_taskmeta.id AS celery_taskmeta_id, celery_taskmeta.task_id AS celery_taskmeta_task_id, celery_taskmeta.status AS celery_taskmeta_status, celery_taskmeta.result AS celery_taskmeta_result, celery_taskmeta.date_done AS celery_taskmeta_date_done, celery_taskmeta.traceback AS celery_taskmeta_traceback 
FROM celery_taskmeta 
WHERE celery_taskmeta.task_id = %(task_id_1)s]
[parameters: {'task_id_1': '5d1bef21-fbf4-4feb-9f2c-a54c95b4d738'}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

I can also note that increasing the sql_alchemy_pool_size parameter to 50 reduced the number of such errors, but did not eliminate them completely.
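
For reference, a quick sketch to confirm which pool size the running processes actually picked up (assuming a recent 2.x where the option lives in the [database] section):

# Hedged sketch: print the effective SQLAlchemy pool size as Airflow sees it.
from airflow.configuration import conf

print(conf.getint("database", "sql_alchemy_pool_size"))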

ali-naderi commented:

The same issue in Airflow 2.10.2

Dev-iL (Contributor) commented Oct 28, 2024

TL;DR: look for invalid Python scripts on the malfunctioning worker. Try creating a DagBag on the worker and see what happens.

# Ensure the AIRFLOW_HOME points to the right location, then run on the worker
>>> from airflow.models import DagBag
>>> DagBag(include_examples=False)

I had this issue too; it turns out I had edited one of the files through vim and pasted some code, and it pasted tabs instead of spaces, so the file became an invalid Python script due to TabError: inconsistent use of tabs and spaces in indentation. After I fixed that, it all went back to normal.

Note that the problematic file doesn't have to be imported by the failing DAG/task. If I understand the issue correctly, a DagBag cannot be created if one of the DAG definition files or their imports isn't a valid Python file, and the issue then manifests as DAGs supposedly not being found. In my case, the filesystem isn't shared between the scheduler and the malfunctioning Celery worker, and the affected file was unmodified (or modified correctly) on the scheduler, so no "big red import error" was displayed in the webserver UI.
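
A slightly more direct variant of the check above: DagBag collects parse failures in import_errors, so a broken file like the TabError case shows up with its path and message (sketch, same assumption as the snippet above about AIRFLOW_HOME on the worker):

# Hedged follow-up to the DagBag check: list any files the worker fails to parse.
from airflow.models import DagBag

bag = DagBag(include_examples=False)
for path, err in bag.import_errors.items():
    print(path)
    print(err)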


abhijit-sarkar-ext commented Nov 6, 2024

Hi @quack39 and all,
I am getting the same error. I have deployed Airflow [2.9.3] in AKS, but when executing the DAGs I get the error below and don't have any clue about what needs to be updated. I am using Helm [1.15.0] for deployment and the "KubernetesExecutor".

Error

Could not read served logs: HTTPConnectionPool(host='test-dag-config-nlp8suol', port=8793): Max retries exceeded with url: /log/dag_id=test_dag/run_id=manual__2024-11-06T06:43:18.256272+00:00/task_id=config/attempt=1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8ef9eaecd0>: Failed to establish a new connection: [Errno -2] Name or service not known'))

darenpang commented:

I had the same problem when changing the SequentialExecutor to the LocalExecutor.
After some testing, I found I must make parallelism equal to the CPU core count.

With t3.large (2 CPU cores):
  parallelism = 32 (default): NG
  parallelism = 4: most of the tasks are NG, but some OK
  parallelism = 2: all OK
With t3.xlarge (4 CPU cores):
  parallelism = 4: all OK

But is this the expected behavior? I'm not sure.

liangpengfei commented:

Same issue for us when upgrading to 2.10.3, using k8s.

huang06 (Contributor) commented Nov 14, 2024

The same issue in Airflow 2.10.2 with KubernetesExecutor.

kate-rodgers commented:

I was just able to get the following to work:

from datetime import timedelta

from airflow.operators.bash import BashOperator

task = BashOperator(
    task_id="bash_command",
    bash_command=bash_command,  # the actual command string is defined elsewhere
    retries=2,
    retry_delay=timedelta(minutes=1),
    do_xcom_push=False,
    env={
        'PYTHONUNBUFFERED': '1',
        'PYTHONFAULTHANDLER': '1',  # Helps debug crashes
        'FORCE_COLOR': '1',  # Preserves color output in logs
    },
    cwd='/tmp',
    append_env=True,
)

Initially it failed with the same error, then succeeded on retry; I believe the log stream was still being created and not yet available for the first attempt.


hditano commented Nov 16, 2024

Same issue here, even when the task is successful.
