Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional docker-compose errors will be easier to diagnose #11835

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Oct 25, 2020

With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

  • 'forward host lookup failed: Unknown host`
  • 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

  • added --volume in docker system prune command which might
    clean-up some anonymous volumes left by the containers between
    runs

  • removed docker-compose down --remove-orphans --down command
    after failure, as currently we are anyhow always doing it
    few lines before (before the test). This change will cause
    that our mechanism of logging container logs after failure
    will likely give us more information about in case the root
    cause is rabbitmq or openldap container failing to start

  • Increases number of tries to 5 in case of failed containers.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk potiuk changed the title Occasional docker-compose networks will be easier to diagnose Occasional docker-compose errors will be easier to diagnose Oct 25, 2020
With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

 * 'forward host lookup failed: Unknown host`
 * 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

* added --volume in `docker system prune` command which might
  clean-up some anonymous volumes left by the containers between
  runs

* removed docker-compose down --remove-orphans --down command
  after failure, as currently we are anyhow always doing it
  few lines before (before the test). This change will cause
  that our mechanism of logging container logs after failure
  will likely give us more information about in case the root
  cause is rabbitmq or openldap container failing to start

* Increases number of tries to 5 in case of failed containers.
@potiuk potiuk force-pushed the attempt-to-decrease-the-likelihood-of-network-docker-issues branch from 4c0a5c5 to b472fd3 Compare October 25, 2020 13:18
@potiuk
Copy link
Member Author

potiuk commented Oct 26, 2020

Would love to get it in to see less failures from OpenLDAP hopefully :)

@potiuk potiuk merged commit 2f4a3d4 into apache:master Oct 26, 2020
@potiuk potiuk deleted the attempt-to-decrease-the-likelihood-of-network-docker-issues branch October 26, 2020 16:21
michalmisiewicz pushed a commit to michalmisiewicz/airflow that referenced this pull request Oct 30, 2020
…1835)

With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

 * 'forward host lookup failed: Unknown host`
 * 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

* added --volume in `docker system prune` command which might
  clean-up some anonymous volumes left by the containers between
  runs

* removed docker-compose down --remove-orphans --down command
  after failure, as currently we are anyhow always doing it
  few lines before (before the test). This change will cause
  that our mechanism of logging container logs after failure
  will likely give us more information about in case the root
  cause is rabbitmq or openldap container failing to start

* Increases number of tries to 5 in case of failed containers.
szn pushed a commit to szn/airflow that referenced this pull request Nov 1, 2020
…1835)

With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

 * 'forward host lookup failed: Unknown host`
 * 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

* added --volume in `docker system prune` command which might
  clean-up some anonymous volumes left by the containers between
  runs

* removed docker-compose down --remove-orphans --down command
  after failure, as currently we are anyhow always doing it
  few lines before (before the test). This change will cause
  that our mechanism of logging container logs after failure
  will likely give us more information about in case the root
  cause is rabbitmq or openldap container failing to start

* Increases number of tries to 5 in case of failed containers.
potiuk added a commit that referenced this pull request Nov 14, 2020
With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

 * 'forward host lookup failed: Unknown host`
 * 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

* added --volume in `docker system prune` command which might
  clean-up some anonymous volumes left by the containers between
  runs

* removed docker-compose down --remove-orphans --down command
  after failure, as currently we are anyhow always doing it
  few lines before (before the test). This change will cause
  that our mechanism of logging container logs after failure
  will likely give us more information about in case the root
  cause is rabbitmq or openldap container failing to start

* Increases number of tries to 5 in case of failed containers.

(cherry picked from commit 2f4a3d4)
@potiuk potiuk added this to the Airflow 1.10.13 milestone Nov 14, 2020
@potiuk potiuk added the type:misc/internal Changelog: Misc changes that should appear in change log label Nov 14, 2020
potiuk added a commit that referenced this pull request Nov 16, 2020
With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

 * 'forward host lookup failed: Unknown host`
 * 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

* added --volume in `docker system prune` command which might
  clean-up some anonymous volumes left by the containers between
  runs

* removed docker-compose down --remove-orphans --down command
  after failure, as currently we are anyhow always doing it
  few lines before (before the test). This change will cause
  that our mechanism of logging container logs after failure
  will likely give us more information about in case the root
  cause is rabbitmq or openldap container failing to start

* Increases number of tries to 5 in case of failed containers.

(cherry picked from commit 2f4a3d4)
potiuk added a commit that referenced this pull request Nov 16, 2020
With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

 * 'forward host lookup failed: Unknown host`
 * 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

* added --volume in `docker system prune` command which might
  clean-up some anonymous volumes left by the containers between
  runs

* removed docker-compose down --remove-orphans --down command
  after failure, as currently we are anyhow always doing it
  few lines before (before the test). This change will cause
  that our mechanism of logging container logs after failure
  will likely give us more information about in case the root
  cause is rabbitmq or openldap container failing to start

* Increases number of tries to 5 in case of failed containers.

(cherry picked from commit 2f4a3d4)
kaxil pushed a commit that referenced this pull request Nov 18, 2020
With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

 * 'forward host lookup failed: Unknown host`
 * 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

* added --volume in `docker system prune` command which might
  clean-up some anonymous volumes left by the containers between
  runs

* removed docker-compose down --remove-orphans --down command
  after failure, as currently we are anyhow always doing it
  few lines before (before the test). This change will cause
  that our mechanism of logging container logs after failure
  will likely give us more information about in case the root
  cause is rabbitmq or openldap container failing to start

* Increases number of tries to 5 in case of failed containers.

(cherry picked from commit 2f4a3d4)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
…1835)

With this change we attempt to better diagnose some occasional
network docker-compose issues that have beeen plaguing us after
we solved or workarounded other CI-related issues. Sometimes
the docker compose jobs fail on checking if the container is
up and running with either of the two errors:

 * 'forward host lookup failed: Unknown host`
 * 'DNS fwd/rev mismatch'

Usually this happens in rabbitMQ and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.

This change introduces few improvements:

* added --volume in `docker system prune` command which might
  clean-up some anonymous volumes left by the containers between
  runs

* removed docker-compose down --remove-orphans --down command
  after failure, as currently we are anyhow always doing it
  few lines before (before the test). This change will cause
  that our mechanism of logging container logs after failure
  will likely give us more information about in case the root
  cause is rabbitmq or openldap container failing to start

* Increases number of tries to 5 in case of failed containers.

(cherry picked from commit 2f4a3d4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants