Skip to content
This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

Bind address already in use #118

Open
farshidtz opened this issue Aug 15, 2022 · 8 comments
Open

Bind address already in use #118

farshidtz opened this issue Aug 15, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@farshidtz
Copy link
Member

Tests occasionally fail on Github because the bind address with default ports isn’t available. It isn’t clear if there is an error in services or if the ports are used by some other processes. We’ve seen this for 59880 (core data), 59882 (core command).

When this happens, the services restart very quickly and then disable. This is because the default restart delay from systemd is 100ms without any increments, unlike docker. See systemd/systemd#6129

The attached logs shows core command restarting 9 times in 7 seconds before being permanently stopped:

Aug 12 14:02:29 fv-az41-871 edgexfoundry.core-command[5432]: level=ERROR ts=2022-08-12T14:02:29.075056424Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
...
Aug 12 14:02:30 fv-az41-871 systemd[1]: snap.edgexfoundry.core-command.service: Main process exited, code=exited, status=1/FAILURE
Aug 12 14:02:30 fv-az41-871 systemd[1]: snap.edgexfoundry.core-command.service: Failed with result 'exit-code'.
Aug 12 14:02:30 fv-az41-871 systemd[1]: snap.edgexfoundry.core-command.service: Scheduled restart job, restart counter is at 9.
Aug 12 14:02:30 fv-az41-871 systemd[1]: Stopped Service for snap application edgexfoundry.core-command.
Aug 12 14:02:30 fv-az41-871 systemd[1]: snap.edgexfoundry.core-command.service: Start request repeated too quickly.
Aug 12 14:02:30 fv-az41-871 systemd[1]: snap.edgexfoundry.core-command.service: Failed with result 'exit-code'.
Aug 12 14:02:30 fv-az41-871 systemd[1]: Failed to start Service for snap application edgexfoundry.core-command.

Example test result and logs:

Related issue in tests: #79, where test port 33333 is not available.

@farshidtz farshidtz added the bug Something isn't working label Aug 15, 2022
@farshidtz
Copy link
Member Author

Maybe the ports are temporarily unavailable and we don’t give the chance for them to become available. Adding a restart delay in snap (snapcraft's apps.<app-name>.restart-delay) or to the service (EDGEX_STARTUP_INTERVAL env var) could help.

@farshidtz farshidtz pinned this issue Aug 15, 2022
@farshidtz
Copy link
Member Author

Got the same error locally. The port wasn't used by any process when I checked but it looks like core-command restarted 10 times within 10 seconds before (systemd) giving up.

$ sudo journalctl -n 1000 | grep "59882: bind: address already in use"
Sep 27 18:54:43 farshid-cirrus7 edgexfoundry.core-command[122356]: level=ERROR ts=2022-09-27T16:54:43.442277382Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:44 farshid-cirrus7 edgexfoundry.core-command[122598]: level=ERROR ts=2022-09-27T16:54:44.75672666Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:44 farshid-cirrus7 edgexfoundry.core-command[122646]: level=ERROR ts=2022-09-27T16:54:44.96466177Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:46 farshid-cirrus7 edgexfoundry.core-command[122701]: level=ERROR ts=2022-09-27T16:54:46.217695543Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:47 farshid-cirrus7 edgexfoundry.core-command[122750]: level=ERROR ts=2022-09-27T16:54:47.461987276Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:48 farshid-cirrus7 edgexfoundry.core-command[122799]: level=ERROR ts=2022-09-27T16:54:48.747142862Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:49 farshid-cirrus7 edgexfoundry.core-command[122852]: level=ERROR ts=2022-09-27T16:54:49.951141356Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:51 farshid-cirrus7 edgexfoundry.core-command[122904]: level=ERROR ts=2022-09-27T16:54:51.224830663Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:52 farshid-cirrus7 edgexfoundry.core-command[122952]: level=ERROR ts=2022-09-27T16:54:52.487154361Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"
Sep 27 18:54:53 farshid-cirrus7 edgexfoundry.core-command[123001]: level=ERROR ts=2022-09-27T16:54:53.748887177Z app=core-command source=httpserver.go:146 msg="Web server failed: listen tcp 127.0.0.1:59882: bind: address already in use"

@MonicaisHer
Copy link
Contributor

MonicaisHer commented Oct 4, 2022

It happens again in GitHub action test. core-data port 59880 is busy. There is no restart attempt:

Oct 04 05:56:26 fv-az47-924 edgexfoundry.core-data[5145]: level=ERROR ts=2022-10-04T05:56:26.372629665Z app=core-data source=httpserver.go:147 msg="Web server failed: listen tcp 127.0.0.1:59880: bind: address already in use"

@farshidtz
Copy link
Member Author

Another solution is to start the failed services manually, while waiting 180 seconds for the them to become available.

@MonicaisHer
Copy link
Contributor

Port 44444 is occasionally occupied as well:

app=device-virtual source=httpserver.go:147 msg="Web server failed: listen tcp 127.0.0.1:44444: bind: address already in use"

logs: https://github.com/canonical/edgex-snap-testing/actions/runs/4946211873/jobs/8843978702?pr=204#step:5:214

@MonicaisHer
Copy link
Contributor

Port 44444 is not available: #207 (comment)

@farshidtz
Copy link
Member Author

We may be able to find the root cause by adding a check before starting services to:

  • verify that all expected ports are available
  • if not, find the process using that port and print the application name

@farshidtz
Copy link
Member Author

farshidtz commented May 26, 2023

Since the services are now all disabled by default, we have the chance to set EDGEX_STARTUP_INTERVAL before starting them for the first time, to slow down the restart intervals suggested in #118 (comment). The downside of doing that is the reliance on a feature which gets tested only later on as part of the same testing suite.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants