
Airflow 2.0 does not send metrics to statsD when Scheduler is run with Daemon mode #13741

Closed
ankxyz opened this issue Jan 18, 2021 · 10 comments · Fixed by #14454
Labels
kind:bug This is a clearly a bug priority:medium Bug that should be fixed before next release but would not block a release
Comments

@ankxyz

ankxyz commented Jan 18, 2021

Apache Airflow version:
2.0.0

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 20.04 LTS
  • Python version: 3.8
  • Kernel (e.g. uname -a): Linux 5.4.0-58-generic #64-Ubuntu x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: pip

What happened:

Airflow 2.0 does not send metrics to statsD.

I configured Airflow following the official documentation (https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/metrics.html) and this article: https://dstan.medium.com/run-airflow-statsd-grafana-locally-16b372c86524 (but I used port 8125).

I turned on statsD:

statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow

But I do not see Airflow metrics at http://localhost:9102/metrics (the statsd-exporter metrics endpoint).
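Independent of the statsd-exporter, a quick way to check whether Airflow is emitting StatsD packets at all is a throwaway UDP listener on the StatsD port (the `capture_statsd` helper below is hypothetical debugging code, not an Airflow or statsd-exporter API):

```python
import socket


def capture_statsd(port=8125, max_packets=5, timeout=5.0):
    """Collect raw StatsD datagrams from a local UDP port (debug helper).

    Stop the statsd-exporter first so the port is free, then run the
    scheduler; any metric Airflow emits arrives as plain text such as
    'airflow.scheduler_heartbeat:1|c'.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.settimeout(timeout)
    packets = []
    try:
        while len(packets) < max_packets:
            data, _addr = sock.recvfrom(4096)
            packets.append(data.decode("ascii", errors="replace"))
    except socket.timeout:
        pass  # no (more) packets arrived within the timeout
    finally:
        sock.close()
    return packets


if __name__ == "__main__":
    for pkt in capture_statsd():
        print(pkt)
```

If packets show up here but not at the exporter endpoint, the problem is on the exporter side; if nothing arrives, Airflow is not sending at all.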


P.S. I noticed this error just using Airflow 2.0. In version 1.10.13 everything is ok in the same environment.

Thanks in advance.

@ankxyz ankxyz added the kind:bug This is a clearly a bug label Jan 18, 2021
@boring-cyborg

boring-cyborg bot commented Jan 18, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk
Member

potiuk commented Jan 18, 2021

Did you install statsd extra?

@ankxyz
Author

ankxyz commented Jan 18, 2021

@potiuk Yes:

pip install apache-airflow[statsd]==2.0.0

@potiuk
Member

potiuk commented Jan 18, 2021

I think you need to provide more logs showing what's going on.

Have you followed UPDATING.md / 2.0 migration process? Have you seen this change:
https://github.com/apache/airflow/blob/master/UPDATING.md#metrics-configuration-has-been-moved-to-new-section

I think you have some configuration problem; it's hard to believe stats are not working in 2.0. Maybe open a discussion in https://github.com/apache/airflow/discussions or in Slack and provide some more logging information there, and maybe someone who has statsd experience and runs Airflow 2.0 with statsd will be able to help diagnose it.

@potiuk
Member

potiuk commented Jan 18, 2021

There is an upgrade check that you should run on 1.10 to tell you about configuration changes that you need to make: http://airflow.apache.org/docs/apache-airflow/stable/upgrading-to-2.html#step-3-install-and-run-the-upgrade-check-scripts

I assume that this is the case - that you simply did not follow the migration guide - so I am closing this ticket for now. Please let us know if this fixed your problem (and if not, please provide more information about what you tried). We can re-open this if you still have the problem and some more info.

@potiuk potiuk closed this as completed Jan 18, 2021
@potiuk potiuk added the invalid label Jan 18, 2021
@ankxyz
Author

ankxyz commented Jan 18, 2021

@potiuk

I installed Airflow 2.0 in a clean Python venv where no older Airflow version had ever been installed.

As I said, if I create another venv with Airflow 1.10.13 instead, it works.


The steps I take:

  • create python venv (python3 -m venv venv && source venv/bin/activate)
  • install Airflow (pip install apache-airflow[statsd])
  • configure Airflow through airflow.cfg:
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
  • run statsD, prometheus, grafana

Airflow 1.10.13 works (metrics are available), 2.0.0 does not :-(


I have no errors or even warnings in the logs; the Airflow metrics are just not available in statsD.

To make this clearer, I will create a demo repository with an Airflow 2.0 configuration for statsD.

@potiuk
Member

potiuk commented Jan 18, 2021

But did you change the config as per https://github.com/apache/airflow/blob/master/UPDATING.md#metrics-configuration-has-been-moved-to-new-section ?

You will see that the metrics configuration moved from the [scheduler] section to a new [metrics] section. Did you change it?
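Concretely, per the UPDATING note, a 1.10-style statsd block like the one quoted earlier in this issue has to move in 2.0's airflow.cfg:

```ini
; Airflow 2.0: statsd options live under [metrics],
; not [scheduler] as in 1.10.x
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
```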

@ankxyz
Author

ankxyz commented Jan 22, 2021

@potiuk Of course. Moreover, I generated airflow.cfg from scratch to test it.

Clarification (recently noticed): the problem arises only when I run the Airflow scheduler in daemon mode:

airflow scheduler -D

If I start it normally (without the -D flag) or using nohup, everything is OK:

nohup airflow scheduler >> ${AIRFLOW_HOME}/logs/scheduler.log 2>&1 &

So the problem occurs only in daemon mode.

@potiuk potiuk added this to the Airflow 2.0.1 milestone Jan 22, 2021
@mik-laj mik-laj reopened this Jan 24, 2021
@kaxil kaxil changed the title Airflow 2.0 does not send metrics to statsD Airflow 2.0 does not send metrics to statsD when Scheduler is run with Daemon mode Jan 25, 2021
@kaxil kaxil added priority:medium Bug that should be fixed before next release but would not block a release and removed invalid labels Jan 25, 2021
@kaxil kaxil modified the milestones: Airflow 2.0.1, Airflow 2.0.2 Feb 3, 2021
@junnplus
Contributor

It seems that the DaemonContext closes the statsd client's socket.

    return self.statsd.incr(stat, count, rate)
  File "/usr/local/lib/python3.8/site-packages/statsd/client/base.py", line 35, in incr
    self._send_stat(stat, '%s|c' % count, rate)
  File "/usr/local/lib/python3.8/site-packages/statsd/client/base.py", line 59, in _send_stat
    self._after(self._prepare(stat, value, rate))
  File "/usr/local/lib/python3.8/site-packages/statsd/client/base.py", line 74, in _after
    self._send(data)
  File "/opt/airflow/airflow/stats.py", line 40, in _send
    self._sock.sendto(data.encode('ascii'), self._addr)
OSError: [Errno 9] Bad file descriptor

A simple fix is to make the statsd client load lazily.
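The eventual fix landed in #14454; as a rough illustration of the lazy-loading idea only (a sketch, not the actual Airflow patch - `LazyStatsd` and `client_factory` are invented names), a proxy can defer creating the client, and therefore its UDP socket, until the first metric is sent, i.e. after daemonization has already closed the inherited descriptors:

```python
class LazyStatsd:
    """Defer statsd client creation until the first metric is sent.

    `client_factory` is any zero-argument callable returning a
    statsd-like client. Because the factory is not called at import
    time, the UDP socket is opened only after the process has
    daemonized, so DaemonContext never gets a chance to close it.
    """

    def __init__(self, client_factory):
        self._factory = client_factory
        self._client = None

    def _get(self):
        # Create the real client (and its socket) on first use only.
        if self._client is None:
            self._client = self._factory()
        return self._client

    def incr(self, stat, count=1, rate=1):
        return self._get().incr(stat, count, rate)
```

The same one-time-creation trick would apply to the other statsd methods (`decr`, `gauge`, `timing`); only `incr` is shown since that is the call in the traceback above.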

@ashb
Member

ashb commented Feb 26, 2021

Oh yes, daemon mode will close all open files and sockets.
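To illustrate what closing all open files does to the statsd client (a standalone repro sketch, not Airflow code): once the socket's file descriptor is closed behind the client's back, the next sendto fails exactly as in the traceback above.

```python
import errno
import os
import socket

# Open a UDP socket the way the statsd client does at import time.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Daemonization walks over every inherited file descriptor and closes
# it; simulate that by closing the fd behind the socket object's back.
os.close(sock.fileno())

try:
    sock.sendto(b"airflow.dummy_metric:1|c", ("127.0.0.1", 8125))
except OSError as exc:
    # Same failure as the traceback: [Errno 9] Bad file descriptor
    assert exc.errno == errno.EBADF

sock.detach()  # mark the socket closed without touching the dead fd again
```

For completeness: python-daemon's DaemonContext has a `files_preserve` parameter for keeping specific descriptors open across daemonization, though creating the socket lazily after daemonizing avoids the problem entirely.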

6 participants