[TACKLE-311] - REST Pods shut down over time for Tackle instance deployed on Openshift #189

Closed
wants to merge 1 commit into from


carlosthe19916
Contributor

Resolves: https://issues.redhat.com/browse/TACKLE-311

What is the problem

  • We are facing the same problem described in "Quarkus keeps dead database connections in its connection pool" (quarkusio/quarkus#15025).
  • In short, the connection pool grows without limit each time the DB is restarted. At a certain point (after approximately 35 DB restarts) the backend can no longer create a new connection, so it cannot reach the DB even when the DB is available and working correctly.

How to reproduce the problem

  • Clone the repository https://github.com/konveyor/tackle-ui . We won't use the repo itself, just the docker-compose.yml file in that repository, to start a new instance of Tackle without needing K8s or OCP.
  • In a terminal, change to the root of the folder where you downloaded tackle-ui and then execute:
docker-compose up
  • At this point you should have Tackle running at http://localhost:3001/ . Now let's generate the error. In another terminal execute:
for n in {0..35}; do docker stop controls_db_container_id && sleep 30 && docker start controls_db_container_id && sleep 30 ; done

In the command above, replace controls_db_container_id with the container ID of the controls-db container; to obtain that ID, execute docker ps. See the image below:

Screenshot from 2021-12-06 13-39-06
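The manual lookup and the restart loop above could also be scripted. This is only a sketch: the helper names find_container_id and restart_loop are made up for illustration, and it assumes the container name shown by docker ps contains "controls-db", which may not match your setup.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the "controls-db" name
# filter is an assumption about how docker-compose names the container.

# Allows swapping in a compatible CLI; defaults to docker.
DOCKER="${DOCKER:-docker}"

# Print the ID of the first running container whose name matches $1.
find_container_id() {
  "$DOCKER" ps --quiet --filter "name=$1" | head -n 1
}

# Stop and start container $1, $2 times (default 36, like {0..35}),
# sleeping $3 seconds (default 30) after each stop and each start.
restart_loop() {
  local id="$1"
  local count="${2:-36}"
  local pause="${3:-30}"
  local i=0
  while [ "$i" -lt "$count" ]; do
    "$DOCKER" stop "$id" && sleep "$pause" && "$DOCKER" start "$id" && sleep "$pause"
    i=$((i + 1))
  done
}

# Usage (not executed here):
#   restart_loop "$(find_container_id controls-db)" 36 30
```

If the name filter matches nothing, find_container_id prints an empty string, so check its output before starting the loop.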

  • After the command for stopping and starting the controls-db container finishes, you should see the following log in the controls container:
2021-12-06 12:59:54,271 WARN  [io.agr.pool] (agroal-11) Datasource '<default>': The connection attempt failed.
2021-12-06 12:59:54,278 INFO  [io.sma.health] (vert.x-worker-thread-0) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: The connection attempt failed."}}]}
2021-12-06 13:20:28,414 INFO  [io.sma.health] (vert.x-worker-thread-2) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: This connection has been closed."}}]}
2021-12-06 13:20:43,537 INFO  [io.sma.health] (vert.x-worker-thread-3) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: Sorry, acquisition timeout!"}}]}
2021-12-06 13:20:58,535 INFO  [io.sma.health] (vert.x-worker-thread-4) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: Sorry, acquisition timeout!"}}]}
2021-12-06 13:21:13,551 INFO  [io.sma.health] (vert.x-worker-thread-5) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: Sorry, acquisition timeout!"}}]}
2021-12-06 13:21:28,569 INFO  [io.sma.health] (vert.x-worker-thread-6) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: Sorry, acquisition timeout!"}}]}
2021-12-06 13:21:43,529 INFO  [io.sma.health] (vert.x-worker-thread-7) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: Sorry, acquisition timeout!"}}]}
2021-12-06 13:21:58,525 INFO  [io.sma.health] (vert.x-worker-thread-8) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: Sorry, acquisition timeout!"}}]}
2021-12-06 13:22:13,601 INFO  [io.sma.health] (vert.x-worker-thread-9) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"default":"Unable to execute the validation check for the default DataSource: Sorry, acquisition timeout!"}}]}

The message Sorry, acquisition timeout! indicates that, even though the DB is up, the backend is not able to create a connection with the DB.
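To check for this symptom without reading the log by eye, something like the following could be used. It is a sketch: count_acquisition_timeouts is a hypothetical helper, and controls_container_id is a placeholder for the ID of the controls container.

```shell
# Sketch only: the function name is hypothetical.
# Reads log text on stdin and prints how many lines contain the
# "Sorry, acquisition timeout!" message.
count_acquisition_timeouts() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the
  # function's exit status at 0 while the printed count stays "0".
  grep -cF 'Sorry, acquisition timeout!' || true
}

# Usage (not executed here; controls_container_id is a placeholder):
#   docker logs controls_container_id 2>&1 | count_acquisition_timeouts
```

A count that keeps growing while the DB container is up matches the failure described here.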

The solution

The problem was solved in Quarkus 1.13.2.Final (and higher), where upgrading the Agroal version fixed it; see the issue quarkusio/quarkus#15025 and its corresponding PR quarkusio/quarkus#15949.

This PR

This repository cannot currently upgrade to a higher version of Quarkus because some tests would start failing. The easiest way to keep our tests stable is to manually upgrade the version of Agroal embedded within Quarkus.
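One way to confirm which Agroal version the build actually resolves, before and after the manual upgrade, is to filter the dependency tree. This is a sketch that assumes the project is built with Maven; show_agroal_version is a hypothetical helper name, and io.agroal is the Agroal groupId.

```shell
# Sketch only: assumes a Maven build; the function name is hypothetical.

# Allows using a wrapper such as ./mvnw; defaults to mvn.
MVN="${MVN:-mvn}"

# Print only the dependency-tree entries whose groupId is io.agroal,
# which shows the Agroal version the build resolves.
show_agroal_version() {
  "$MVN" --batch-mode dependency:tree -Dincludes=io.agroal
}

# Usage (not executed here), from the directory containing pom.xml:
#   show_agroal_version
```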


codecov bot commented Dec 6, 2021

Codecov Report

Merging #189 (fd7f25b) into main (1dcd1a1) will not change coverage.
The diff coverage is n/a.


@@             Coverage Diff             @@
##                main      #189   +/-   ##
===========================================
  Coverage     100.00%   100.00%           
  Complexity        71        71           
===========================================
  Files             14        14           
  Lines            114       114           
  Branches           8         8           
===========================================
  Hits             114       114           


@carlosthe19916
Contributor Author

This PR is no longer needed since the following PRs upgrade Quarkus to use the 2.5.4.Final version, fixing the outdated version of the Agroal library:
