Occasional docker-compose errors will be easier to diagnose (apache#11835)

With this change we attempt to better diagnose some occasional
docker-compose network issues that have been plaguing us after
we solved or worked around other CI-related issues. Sometimes
the docker-compose jobs fail when checking whether the container
is up and running, with one of these two errors:

 * 'forward host lookup failed: Unknown host'
 * 'DNS fwd/rev mismatch'

Usually this happens in the rabbitmq and openldap containers.

Both indicate a problem with DNS of the docker engine or maybe
some remnants of the previous docker run that do not allow us
to start those containers.
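
When such a failure shows up, a quick way to narrow it down is to check whether the container hostname resolves at all from inside the test environment. The following is a minimal sketch, not part of this commit: `check_host_resolves` is a hypothetical helper, and it assumes `getent` is available in the image.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of the commit): report whether
# a hostname resolves, using getent (assumed to be present in the image).
check_host_resolves() {
    local host=$1
    local resolved
    if resolved=$(getent hosts "${host}"); then
        # getent prints "<address> <name...>"; keep only the first address.
        echo "OK: ${host} resolves to ${resolved%% *}"
        return 0
    fi
    echo "FAILED: ${host} does not resolve"
    return 1
}
```

It could be called as `check_host_resolves rabbitmq` or `check_host_resolves openldap` just before the existing "is the container up" check, so that a DNS problem is reported separately from a port that is merely not open yet.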

This change introduces a few improvements:

* added --volumes to the `docker system prune` command, which
  might clean up some anonymous volumes left by the containers
  between runs

* removed the `docker-compose down --remove-orphans -v` command
  after failure, as we always run it anyway a few lines earlier
  (before the test). Without it, our mechanism of logging
  container logs after failure will likely give us more
  information in case the root cause is the rabbitmq or openldap
  container failing to start

* increased the number of tries to 5 in case of failed containers.
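
The retry behaviour described in the last two bullets can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual script: `run_with_retries`, `flaky_start` and `RETRY_SLEEP_SECONDS` are hypothetical names, exit code 254 is taken from the diff as the "integration failed to start" signal, and the sleep defaults to 0 seconds here so the sketch runs quickly (the script sleeps 5 seconds).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_tries times, retrying only
# when it exits with 254 (the code the diff checks before retrying).
run_with_retries() {
    local max_tries=$1
    shift
    local try_num exit_code
    for ((try_num = 1; try_num <= max_tries; try_num++)); do
        exit_code=0
        "$@" || exit_code=$?
        # Retry only on 254 and never after the last try, so the final
        # failure falls through without any cleanup step in between.
        if [[ ${exit_code} == "254" && ${try_num} != "${max_tries}" ]]; then
            echo "Failed try num ${try_num}. Sleeping before retry"
            sleep "${RETRY_SLEEP_SECONDS:-0}"
            continue
        fi
        break
    done
    return "${exit_code}"
}

# Hypothetical stand-in for the docker-compose test run: it fails with 254
# twice and then succeeds.
attempts=0
flaky_start() {
    attempts=$((attempts + 1))
    if ((attempts < 3)); then
        return 254
    fi
    return 0
}

run_with_retries 5 flaky_start
echo "exit code: $?, attempts: ${attempts}"
```

Because nothing tears the containers down between a failed try and the code that follows the loop, whatever collects container logs after the final failure still has containers to read from, which is the point of the second bullet.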

(cherry picked from commit 2f4a3d4)
potiuk authored and Chris Fei committed Mar 5, 2021
1 parent 588bc0a commit 213c6a2
Showing 2 changed files with 6 additions and 12 deletions.
16 changes: 5 additions & 11 deletions scripts/ci/testing/ci_run_airflow_testing.sh
@@ -31,17 +31,17 @@ function run_airflow_testing_in_docker() {
 set +u
 set +e
 local exit_code
-for try_num in {1..3}
+for try_num in {1..5}
 do
 echo
-echo "Making sure docker-compose is down"
+echo "Making sure docker-compose is down and remnants removed"
 echo
 docker-compose --log-level INFO -f "${SCRIPTS_CI_DIR}/docker-compose/base.yml" \
 down --remove-orphans --volumes --timeout 10
 echo
 echo "System-prune docker"
 echo
-docker system prune --force
+docker system prune --force --volumes
 echo
 echo "Check available space"
 echo
@@ -70,15 +70,9 @@ function run_airflow_testing_in_docker() {
 echo "Delete kerberos network"
 kerberos::delete_kerberos_network
 fi
-if [[ ${exit_code} == 254 ]]; then
+if [[ ${exit_code} == "254" && ${try_num} != "5" ]]; then
 echo
-echo "Failed starting integration on ${try_num} try. Wiping-out docker-compose remnants"
-echo
-docker-compose --log-level INFO \
--f "${SCRIPTS_CI_DIR}/docker-compose/base.yml" \
-down --remove-orphans -v --timeout 5
-echo
-echo "Sleeping 5 seconds"
+echo "Failed try num ${try_num}. Sleeping 5 seconds for retry"
 echo
 sleep 5
 continue
2 changes: 1 addition & 1 deletion scripts/ci/tools/ci_free_space_on_ci.sh
@@ -21,5 +21,5 @@
 sudo swapoff -a
 sudo rm -f /swapfile
 sudo apt clean
-docker system prune --all
+docker system prune --all --force
 df -h
