This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

ACA Segment Faulting when started after busybox container is started #229

Open
kiran1048 opened this issue Mar 30, 2021 · 3 comments

@kiran1048
Contributor

On a compute node, start a busybox container and assign an IP/MAC address to the container instance with:
docker run -itd --name --net=none busybox sh
ovs-docker add-port br-int eth1 --ipaddress= --macaddress=

This creates the bridge br-int. If ACA is then started on the same compute node, it crashes with a segmentation fault, as shown below:
$ ./build/bin/AlcorControlAgent -d
ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need ---> Entering
ACA_OVS_L2_Programmer::execute_ovsdb_command ---> Entering
Executing command: ovs-vsctl br-exists br-int
Trying to init a new sub to connect to the NCM
After initing a new sub to connect to the NCM
Streaming capable GRPC server listening on 0.0.0.0:50001
Command succeeded!
Elapsed time for system command took: 4480 microseconds or 4 milliseconds.
Elapsed time for ovsdb client call took: 4536 microseconds or 4 milliseconds. rc: 0
ACA_OVS_L2_Programmer::execute_ovsdb_command <--- Exiting, rc = 0
ACA_OVS_L2_Programmer::execute_ovsdb_command ---> Entering
Executing command: ovs-vsctl br-exists br-tun
Command failed!!! rc: 512
Elapsed time for system command took: 4017 microseconds or 4 milliseconds.
Elapsed time for ovsdb client call took: 4074 microseconds or 4 milliseconds. rc: 512
ACA_OVS_L2_Programmer::execute_ovsdb_command <--- Exiting, rc = 512
Invalid environment br-int=1 and br-tun=0, cannot proceed
ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need <--- Exiting, overall_rc = 1
Segmentation fault (core dumped)
(root:FW0009098):/root/pingtest/alcor-control-agent [master]

@er1cthe0ne
Contributor

It would be a good idea to fix the segmentation fault, using gdb to pinpoint the faulting code.

The actual problem is shown in the log line "Invalid environment br-int=1 and br-tun=0, cannot proceed". The ovs-docker command creates br-int, which leaves the node in a state where br-int exists but br-tun does not. ACA does not know how to proceed in this inconsistent environment.
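
To make the log easier to read, here is a minimal sketch of how the two br-exists results could be interpreted. It assumes the logged rc is the raw wait status returned by a system()-style call, which is consistent with rc: 512 (exit code 2 shifted left by 8 bits); ovs-vsctl br-exists exits with 0 when the bridge exists and 2 when it does not. The names BridgeState, exit_code_from_wait_status, bridge_exists and classify_bridges are hypothetical and are not taken from the ACA source.

```cpp
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS (POSIX)
#include <cassert>

// Hypothetical classification of the OVS bridge environment.
enum class BridgeState { BothPresent, BothAbsent, Inconsistent };

// Convert a raw wait status (as returned by std::system on POSIX systems)
// into the child's exit code. Returns -1 if the child did not exit normally.
int exit_code_from_wait_status(int wait_status) {
  if (!WIFEXITED(wait_status)) {
    return -1;
  }
  return WEXITSTATUS(wait_status);
}

// Assumption: the status comes from running "ovs-vsctl br-exists <bridge>",
// which exits with 0 if the bridge exists and 2 if it does not.
bool bridge_exists(int wait_status) {
  return exit_code_from_wait_status(wait_status) == 0;
}

// Only "both present" and "both absent" are states the agent can handle;
// a single existing bridge is the case reported in this issue.
BridgeState classify_bridges(bool br_int_exists, bool br_tun_exists) {
  if (br_int_exists && br_tun_exists) {
    return BridgeState::BothPresent;
  }
  if (!br_int_exists && !br_tun_exists) {
    return BridgeState::BothAbsent;
  }
  return BridgeState::Inconsistent;
}
```

With the values from the log above (rc 0 for br-int, rc 512 for br-tun), classify_bridges(bridge_exists(0), bridge_exists(512)) yields BridgeState::Inconsistent, which corresponds to the "Invalid environment" message.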

@zzxgzgz
Contributor

zzxgzgz commented Mar 30, 2021

@er1cthe0ne

The issue is caused here:
https://github.com/futurewei-cloud/alcor-control-agent/blob/master/src/aca_main.cpp#L221

In some of our test scenarios, we start busybox containers and use the ovs-docker add-port ... command to add a port for the container, which creates br-int (br-tun remains non-existent).

When we call aca_ovs_l2_programmer::ACA_OVS_L2_Programmer::get_instance().setup_ovs_bridges_if_need();, it finds that br-int exists but br-tun does not, and it does nothing except log the line:

Invalid environment br-int=%d and br-tun=%d, cannot proceed

In the following lines, ACA tries to monitor the non-existent br-tun, which causes the segfault.

If the main function returned at this point, the segmentation fault would be avoided.
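
A rough sketch of that early return, not the actual patch: run_agent is a hypothetical stand-in for the relevant part of main, setup_bridges models the call to setup_ovs_bridges_if_need() (assumed here to return 0 on success, based on the overall_rc value in the log), and start_monitoring models whatever code later touches br-tun. The error message text is illustrative.

```cpp
#include <cstdlib>     // EXIT_SUCCESS, EXIT_FAILURE
#include <functional>
#include <iostream>

// Hypothetical stand-in for the startup sequence in main().
// setup_bridges:    models setup_ovs_bridges_if_need(); assumed to return 0 on success.
// start_monitoring: models the later code that monitors br-tun.
int run_agent(const std::function<int()> &setup_bridges,
              const std::function<void()> &start_monitoring) {
  const int overall_rc = setup_bridges();
  if (overall_rc != EXIT_SUCCESS) {
    // Stop here instead of going on to use a bridge that does not exist.
    std::cerr << "Bridge setup failed (rc = " << overall_rc
              << "), please check your environment\n";
    return EXIT_FAILURE;
  }
  start_monitoring();
  return EXIT_SUCCESS;
}
```

The point of the design is simply that the setup result is checked before any code path that assumes both bridges exist; when setup fails, the monitoring step is never reached.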

@kiran1048
Contributor Author

As @zzxgzgz suggested, adding a check at that point prevents the crash:

$ ./build/bin/AlcorControlAgent -d
ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need ---> Entering
ACA_OVS_L2_Programmer::execute_ovsdb_command ---> Entering
Executing command: ovs-vsctl br-exists br-int
Trying to init a new sub to connect to the NCM
After initing a new sub to connect to the NCM
Streaming capable GRPC server listening on 0.0.0.0:50001
Command succeeded!
Elapsed time for system command took: 4449 microseconds or 4 milliseconds.
Elapsed time for ovsdb client call took: 4503 microseconds or 4 milliseconds. rc: 0
ACA_OVS_L2_Programmer::execute_ovsdb_command <--- Exiting, rc = 0
ACA_OVS_L2_Programmer::execute_ovsdb_command ---> Entering
Executing command: ovs-vsctl br-exists br-tun
Command failed!!! rc: 512
Elapsed time for system command took: 3980 microseconds or 3 milliseconds.
Elapsed time for ovsdb client call took: 4039 microseconds or 4 milliseconds. rc: 512
ACA_OVS_L2_Programmer::execute_ovsdb_command <--- Exiting, rc = 512
Invalid environment br-int=1 and br-tun=0, cannot proceed
ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need <--- Exiting, overall_rc = 1
ACA is not able to create the bridges, please check your environment

@zzxgzgz zzxgzgz linked a pull request Apr 5, 2021 that will close this issue

3 participants