Manager crashes in tests with Caught Error #180

Closed
couryrr-afs opened this issue Apr 2, 2024 · 10 comments

@couryrr-afs

Describe the bug

When running the tests in a Docker container, the manager starts up but dies almost immediately with the following:

INFO     root:startup_tests.py:83 >>>>>>>>> test_001_start_test_module <<<<<<<<<
INFO     root:everest_core.py:164 config: /tmp/pytest-of-root/pytest-1/test_001_start_test_module0/everest_config/everest_config.yaml
INFO     root:everest_core.py:241 temp everest user-config: /tmp/pytest-of-root/pytest-1/test_001_start_test_module0/everest_config/user-config/everest_config.yaml
INFO     root:everest_core.py:244 Adding test control module(s) to user-config: None
INFO     root:everest_core.py:171 /tmp/pytest-of-root/pytest-1/test_001_start_test_module0/status.fifo
INFO     root:everest_core.py:177 Standalone module probe was specified
INFO     root:everest_core.py:183 /ext/source/build/dist/bin/manager --config /tmp/pytest-of-root/pytest-1/test_001_start_test_module0/everest_config/everest_config.yaml --status-fifo /tmp/pytest-of-root/pytest-1/test_001_start_test_module0/status.fifo --standalone probe
INFO     root:everest_core.py:185 Starting EVerest...
INFO     root:everest_core.py:186 /ext/source/build/dist/bin/manager  --config  /tmp/pytest-of-root/pytest-1/test_001_start_test_module0/everest_config/everest_config.yaml  --status-fifo  /tmp/pytest-of-root/pytest-1/test_001_start_test_module0/status.fifo  --standalone  probe
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.425784 [INFO] manager          ::   ________      __                _
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.430796 [INFO] manager          ::  |  ____\ \    / /               | |
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.430848 [INFO] manager          ::  | |__   \ \  / /__ _ __ ___  ___| |_
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.431166 [INFO] manager          ::  |  __|   \ \/ / _ \ '__/ _ \/ __| __|
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.431185 [INFO] manager          ::  | |____   \  /  __/ | |  __/\__ \ |_
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.431196 [INFO] manager          ::  |______|   \/ \___|_|  \___||___/\__|
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.431206 [INFO] manager          ::
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.431234 [INFO] manager          :: Using MQTT broker mqtt-server:1883
DEBUG    root:everest_core.py:208   2024-04-01 21:30:51.445786 [ERRO] manager         int main(int, char**) :: Main manager process exits because of caught exception:
DEBUG    root:everest_core.py:208   Syscall pipe2() failed (Invalid argument), exiting
DEBUG    root:everest_core.py:208
WARNING  root:everest_core.py:215 EVerest stopped with return code: 1
DEBUG    root:everest_core.py:217 EVerest output stopped
---------------------------------------------- Captured log teardown -----------------------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: EpollSelector
DEBUG    root:everest_core.py:222 CONTROLLER stop() function called..

The issue is this error: [ERRO] manager int main(int, char**) :: Main manager process exits because of caught exception: Syscall pipe2() failed (Invalid argument), exiting

I have attempted to get additional information from the logs based on this Zulip thread, with no success.

EVerest Domain

Testing

Affected EVerest Module

No response

To Reproduce

Using the branch here: https://github.com/US-JOET/everest-demo/tree/couryrr/updating-dockerfile-for-testing

Ensure that an MQTT server is running

Build the docker image locally with:

docker build -t ghcr.io/everest/everest-demo/manager:test .

Attach with
docker run -it --network infranet_network -e MQTT_SERVER_ADDRESS=mqtt-server --entrypoint bash ghcr.io/everest/everest-demo/manager:test

Navigate to
cd /ext/source/tests

Run
pytest --everest-prefix ../build/dist core_tests/startup_tests.py
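
Before running the tests, it may help to confirm the first step (a running MQTT server) from inside the container. The following is a minimal sketch, not part of EVerest: the `broker_reachable` helper is hypothetical, the `localhost` default is an assumption, the port 1883 is taken from the manager log above, and `MQTT_SERVER_ADDRESS` is the variable passed in the `docker run` command. It only checks that a TCP connection can be opened; it does not perform an MQTT handshake.

```python
import os
import socket


def broker_reachable(host: str, port: int = 1883, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects; DNS failures
        # and refused connections both surface as OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical default; the issue sets MQTT_SERVER_ADDRESS=mqtt-server.
    host = os.environ.get("MQTT_SERVER_ADDRESS", "localhost")
    print(f"{host}:1883 reachable:", broker_reachable(host))
```

If this reports that the broker is not reachable, the pipe2() failure is probably not the first problem to look at.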

Anything else?

No response

hikinggrass transferred this issue from EVerest/everest-core on Apr 11, 2024
@hikinggrass
Contributor

That seems to be caused by an issue with the pipe2 flag O_DIRECT which might not be available in WSL or macOS, which system are you running your docker container on?
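
One way to check this on a given system is to try the same kind of call from inside the container. The sketch below is an assumption about how to probe it, not the manager's actual code: it calls `os.pipe2` with `O_DIRECT` (plus `O_CLOEXEC`, which is a guess at the other flags) and treats `EINVAL`, the "Invalid argument" seen in the log, as "not supported". The function name `pipe2_supports_o_direct` is hypothetical.

```python
import errno
import os


def pipe2_supports_o_direct() -> bool:
    """Return True if pipe2() accepts the O_DIRECT flag on this system."""
    # os.pipe2 and os.O_DIRECT are only defined on some Unix platforms.
    if not hasattr(os, "pipe2") or not hasattr(os, "O_DIRECT"):
        return False
    try:
        read_fd, write_fd = os.pipe2(os.O_DIRECT | os.O_CLOEXEC)
    except OSError as exc:
        # EINVAL is what pipe2() reports for a flag it does not accept.
        if exc.errno == errno.EINVAL:
            return False
        raise
    os.close(read_fd)
    os.close(write_fd)
    return True


if __name__ == "__main__":
    print("pipe2(O_DIRECT) supported:", pipe2_supports_o_direct())
```

Running it in the same container that hosts the manager should show whether the flag is rejected there.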

@couryrr-afs
Author

That seems to be caused by an issue with the pipe2 flag O_DIRECT which might not be available in WSL or macOS, which system are you running your docker container on?

Currently the host is macOS. We can try this on a Linux distribution.

@couryrr-afs
Author

@hikinggrass

So far, I have not been able to run the demo on a proper Linux machine. The VM I have, with 20 GB of RAM and 100 GB of storage, crashes when building. I am trying to find an alternative option.

However, I am not convinced that the host is the issue. I am able to run the devcontainer in everest-utils playground without issue on my mac host. The difference between this and the demo is that the image the demo uses is the alpine build-kit image here. I have done some research around O_DIRECT in Alpine but cannot find anything conclusive.

I have attempted to move the demo over to the debian build-kit version here. However, when building the recent release I am getting a version issue with ev-cli:

253.9 CMake Error at cmake/ev-cli.cmake:21 (message):
253.9   ev-cli version 0.0.24 or higher is required.  However your ev-cli version
253.9   is '0.0.22'.  Please upgrade ev-cli.
253.9 Call Stack (most recent call first):
253.9   cmake/ev-project-bootstrap.cmake:8 (require_ev_cli_version)
253.9   CMakeLists.txt:103 (include)
253.9 
253.9 
254.0 -- Configuring incomplete, errors occurred!
254.0 See also "/ext/source/build/CMakeFiles/CMakeOutput.log".
254.0 See also "/ext/source/build/CMakeFiles/CMakeError.log".
------

@andistorm
Contributor

@couryrr-afs The build-kit image build needs to be triggered manually at the moment, which has probably not been done for the Debian one in a long time. I just triggered this build for Debian (https://github.com/EVerest/everest-ci/actions/runs/8664966499), which should solve your ev-cli issue.

couryrr-afs changed the title from "Manager crashes in tests with Cuaght Error" to "Manager crashes in tests with Caught Error" on Apr 12, 2024
@couryrr-afs
Author

@andistorm thank you for the update. That did get me past the issue I was experiencing.
@hikinggrass after some hassle I was able to do a run on a proper linux box and it did appear to run fine. This might be something worth discussing. This limitation was not present previously and might have some impact on users if docker is not a viable use case. Do you happen to know when the change that required the O_DIRECT flag was introduced?

@hikinggrass
Contributor

@andistorm thank you for the update. That did get me past the issue I was experiencing. @hikinggrass after some hassle I was able to do a run on a proper linux box and it did appear to run fine. This might be something worth discussing. This limitation was not present previously and might have some impact on users if docker is not a viable use case. Do you happen to know when the change that required the O_DIRECT flag was introduced?

Looks like this was introduced ~2 years ago on 2022-04-20, so it's likely that the issue manifests itself now because of a macos/docker/etc. change. However we can try to investigate if we can work around this for affected platforms
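
As a rough illustration of what such a workaround could look like (written in Python for brevity; the manager's own implementation may differ), the sketch below tries `O_DIRECT` first and falls back to an ordinary pipe when the call fails with `EINVAL`. The name `open_pipe` and the returned `packet_mode` flag are hypothetical, and whether the manager can tolerate losing packet mode is an open question this sketch does not answer.

```python
import errno
import os


def open_pipe(prefer_packet_mode: bool = True):
    """Open a pipe, preferring O_DIRECT and falling back to a plain pipe.

    Returns (read_fd, write_fd, packet_mode).
    """
    if not hasattr(os, "pipe2"):
        # Platforms without pipe2(): only an ordinary pipe is possible.
        read_fd, write_fd = os.pipe()
        return read_fd, write_fd, False

    direct_flag = getattr(os, "O_DIRECT", 0)
    if prefer_packet_mode and direct_flag:
        try:
            read_fd, write_fd = os.pipe2(os.O_CLOEXEC | direct_flag)
            return read_fd, write_fd, True
        except OSError as exc:
            # Only swallow "Invalid argument"; anything else is a real error.
            if exc.errno != errno.EINVAL:
                raise

    read_fd, write_fd = os.pipe2(os.O_CLOEXEC)
    return read_fd, write_fd, False
```

A caller would check `packet_mode` and, when it is `False`, frame messages itself, since an ordinary pipe does not preserve write boundaries.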

@shankari

shankari commented Apr 19, 2024

@couryrr-afs

after some hassle I was able to do a run on a proper linux box and it did appear to run fine

By a "proper linux box", I assume you mean a physical server booting ubuntu. But which version of ubuntu (Debian/Alpine) did you try with?

  1. Docker on Mac runs in a VM (VirtualBox)
  2. O_DIRECT is implemented in libc; it is not a kernel device module

Given that the primary difference between Alpine and Ubuntu/Debian/RHEL etc is in the libc implementation, I would expect that the crash is primarily due to the use of alpine in the demo container.

this build for debian, should solve your ev-cli issue
thank you for the update. That did get me past the issue I was experiencing.

when you tried the build in debian, did it fail as well?
And again, did the "proper linux box" use debian or alpine? If debian, have you tried it with alpine?

@couryrr-afs
Author

@shankari The Linux machine that was used was a desktop running PopOS, which under the hood is Ubuntu 22.04. The Linux distribution in the Docker container was Alpine. I agree with your points. I was under the impression that Alpine was the culprit, which is why I moved to trying to build the Debian container.

While troubleshooting issues with the Debian build, a teammate and I had successful runs on Alpine. So while I confirmed that I got past the ev-cli error, I did not get to the point of a full build and run in the Debian Docker container. All of the Debian work was being done on my Mac to verify what you highlighted in points 1 and 2.

For the current state of the demo, what error(s) are you seeing? My understanding is that you are still not able to see a successful run.

@shankari

shankari commented Apr 25, 2024

@couryrr-afs I do not plan to run the demo to test this out until I know the exact scenario in which it passes. I don't want to spend my time working through various configurations.

Can you please list out the known working configuration (host OS/VM OS/docker OS...) so that I can start there and try to go from working to working?
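
To make such reports comparable, something like the following could be run inside each container and on each host. It is only a sketch using the Python standard library; the `describe_environment` helper is hypothetical. Note that inside a container, the kernel release it reports is that of whatever host or VM is running the container, and that `platform.libc_ver()` may return empty strings for libc implementations it does not recognise.

```python
import platform


def describe_environment() -> dict:
    """Collect kernel, architecture, libc, and distribution details as strings."""
    libc_name, libc_version = platform.libc_ver()
    info = {
        "kernel": platform.release(),
        "machine": platform.machine(),
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
    }
    try:
        # Available from Python 3.10; reads /etc/os-release or /usr/lib/os-release.
        info["distro"] = platform.freedesktop_os_release().get("PRETTY_NAME", "unknown")
    except (AttributeError, OSError):
        info["distro"] = "unknown"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting the output for the host, the VM (if any), and the container gives one line per layer of the configuration.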

@couryrr-afs
Author

For the Docker demo related issue, please see EVerest/everest-demo#30. The issue described here has been resolved.
