This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

Implement DHCP server support #102

Closed
er1cthe0ne opened this issue Apr 16, 2020 · 55 comments
Labels: Feature, good first issue

Comments

@er1cthe0ne
Contributor

er1cthe0ne commented Apr 16, 2020

Success Criteria:

The agent supports DHCP programming and allows VMs/containers to receive their assigned IP addresses through DHCP.

Details:

We need to implement DHCP support in the OpenStack environment, taking over the responsibility of the neutron DHCP agent. This task includes:

  1. Done: Close down on the current DHCP design draft document [Documentation] Integration with DHCP #96 - https://github.com/futurewei-cloud/alcor-control-agent/blob/master/docs/dhcp_programming.adoc
  2. Done: Modify the current Alcor network state message to support DHCP programming - https://github.com/futurewei-cloud/alcor/blob/master/schema/proto3/dhcp.proto
  3. Done: Implement the DHCP programming interface according to the design - https://github.com/futurewei-cloud/alcor-control-agent/blob/master/include/aca_dhcp_programming_if.h
  4. In progress: Implement a DHCP handler class that works with openflow and acts as the DHCP server. Remaining items:
    4a. We need to add an openflow rule to capture DHCP packets and send them to the openflow controller (ACA). We can add that to the DHCP class init function, called by the Aca_Goal_State_Handler::Aca_Goal_State_Handler() constructor.
    - The add rule should look like: add-flow br-int "table=0,priority=25,udp,udp_src=68,udp_dst=67,actions=CONTROLLER"
    - The delete rule should look like: del-flows br-int udp,udp_src=68,udp_dst=67
    - To program the openflow rules, use: ACA_OVS_L2_Programmer::get_instance().execute_openflow_command
    4b. When the aca_ovs_control code receives a DHCP packet, it needs to call a DHCP function to parse and process it. Please provide the interface to call, and @cj-chung can tell you where to change the code to call it.
  5. Unit testing of DHCP functionality
    5a. Please see DISABLED_2_ports_ROUTING_test_traffic_one_machine in https://github.com/futurewei-cloud/alcor-control-agent/blob/master/test/gtest/aca_tests.cpp for an example of how we used docker + ovs-docker on a physical machine or VM to create containers for testing. We can create a container, assign a MAC address to it, and let it do DHCP to test our DHCP implementation. See https://goldmann.pl/blog/2014/01/30/assigning-ip-addresses-to-docker-containers-via-dhcp/
  6. End to End test plan and scenario testing
  7. Scale and perf analysis
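
The add and delete rules in item 4a can be sketched as two small string builders. This is only an illustration: the helper names are hypothetical, and the way the resulting string is handed to execute_openflow_command is left out because its signature is not given in this issue.

```cpp
#include <cassert>
#include <string>

// Match shared by the add and delete rules: DHCP client (port 68) to server (port 67).
static const char kDhcpMatch[] = "udp,udp_src=68,udp_dst=67";

// Hypothetical helper: builds the "add-flow" argument string from item 4a.
std::string dhcp_add_flow_args(const std::string &bridge, int priority)
{
  return "add-flow " + bridge + " \"table=0,priority=" + std::to_string(priority) +
         "," + kDhcpMatch + ",actions=CONTROLLER\"";
}

// Hypothetical helper: builds the matching "del-flows" argument string.
std::string dhcp_del_flows_args(const std::string &bridge)
{
  return "del-flows " + bridge + " " + kDhcpMatch;
}
```

Keeping the match text in one constant means the delete rule cannot drift away from the add rule.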
@er1cthe0ne er1cthe0ne self-assigned this Apr 16, 2020
@er1cthe0ne er1cthe0ne removed their assignment Apr 23, 2020
@xieus
Contributor

xieus commented May 21, 2020

Item 2 is under review: futurewei-cloud/alcor#193

@xieus xieus added this to the Version 0.5.2020.05.30 milestone May 21, 2020
@xieus xieus added the good first issue Good for newcomers label May 21, 2020
@w2520n2520
Contributor

I'm interested in this issue, and I have some experience in network stack development. Could this issue be assigned to me? Thanks.

@xieus
Contributor

xieus commented May 22, 2020

@w2520n2520 Absolutely, and thank you! This issue has been assigned.

@xieus
Contributor

xieus commented May 22, 2020

Update to Item 2: PR futurewei-cloud/alcor#193 has been merged to alcor/master.

@w2520n2520
Contributor

Hi Liguang and Eric,
I couldn't find a packet-parsing interface in the interface class or in the architecture diagram. Does this DHCP server need to handle incoming packets from the network? Naive question. Thanks.
@xieus @er1cthe0ne

@er1cthe0ne
Contributor Author

Hi @w2520n2520 - you asked the right question and are on the right track. This DHCP server needs to intercept the DHCP packets using openflow rules, parse them, and reply with a DHCP_OFFER and later a DHCP_ACK message. More information is available in the reference section of the design doc: https://github.com/futurewei-cloud/alcor-control-agent/blob/master/docs/dhcp_programming.adoc
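
The reply sequence described above can be sketched as a mapping from the received DHCP message type to the reply type. The numeric values are the standard DHCP message types carried in option 53; the function name is hypothetical, and a real handler would also have to build the reply contents from the programmed DHCP state.

```cpp
#include <cassert>
#include <cstdint>

// DHCP message types as carried in DHCP option 53.
enum class DhcpMsgType : std::uint8_t {
  Discover = 1,
  Offer = 2,
  Request = 3,
  Decline = 4,
  Ack = 5,
  Nak = 6,
  Release = 7,
  Inform = 8
};

// Hypothetical helper: returns true and sets `reply` when the server answers.
// DISCOVER is answered with OFFER, and a later REQUEST with ACK.
bool dhcp_reply_type(DhcpMsgType received, DhcpMsgType &reply)
{
  switch (received) {
  case DhcpMsgType::Discover:
    reply = DhcpMsgType::Offer;
    return true;
  case DhcpMsgType::Request:
    reply = DhcpMsgType::Ack;
    return true;
  default:
    return false; // other message types are not handled in this sketch
  }
}
```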

@w2520n2520
Contributor

Running alcor-control-agent and tests
You can run the test (optional):
root@ca62b6feec63:/mnt/host/code/alcor-control-agent# ./build/tests/aca_tests

When building, you may encounter a "libgtest.so can't open or doesn't exist" issue; please refer to https://blog.csdn.net/bocksong/article/details/93207753 to resolve it.

@er1cthe0ne
Contributor Author

Did you encounter this "libgtest.so can't open or doesn't exist" issue? The intent is to run aca_tests inside the build container, which already has all the dependencies set up.

@w2520n2520
Contributor

Followed build and execution

I see, the build and tests should be executed in the generated docker container "a1"; that was my misunderstanding.
I got 18 tests to pass but still fail to run the binary.

@er1cthe0ne
Contributor Author

Seeing 18 tests pass on the unit/functional tests is good enough for now. What kind of error do you see when you run ./build/bin/AlcorControlAgent? It will try to connect to kafka, so those errors may be expected if kafka was not set up.

The next step in the DHCP implementation is to develop a standalone DHCP application based on this design. We can do the integration into AlcorControlAgent later.

@w2520n2520
Contributor

Thanks Eric. Just trying to build up my working ground here.

@w2520n2520
Contributor

w2520n2520 commented Jun 2, 2020

Hi Eric,
One question below:
int Aca_Comm_Manager::update_goal_state()
{
  update_vpc_states();
  update_subnet_states();
  update_port_states();
  update_dhcp_states(); // to be added
}

So will these resources always be updated together? Is there any chance they can be updated independently? Thanks. @er1cthe0ne

@er1cthe0ne
Contributor Author

Hi Nan Wu,

Good question, the GoalState message contains:

  • 0 to N vpc_states
  • 0 to N subnet_states
  • 0 to N port_states
  • 0 to N security_group_states
  • 0 to N dhcp_states

Aca_Comm_Manager will try to update the whole GoalState in an efficient manner.

For DHCP create, the likely GoalState message would look like:

  • 1 port_states, OperationType::CREATE - create/configure a new port
  • 1 dhcp_states, OperationType::CREATE - create the DHCP info for the new port

For a DHCP update, it could look like:

  • 1 dhcp_states, OperationType::UPDATE - update the DHCP info for a port

Does it make sense? Let me know if you have other questions. @w2520n2520
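
A minimal sketch of why the state lists can be updated independently: each list holds 0 to N entries, so an empty list is simply a no-op. All type and field names here are hypothetical stand-ins, not the actual protobuf-generated GoalState classes, and only the port and DHCP lists are modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real goal state types.
enum class OperationType { CREATE, UPDATE };

struct PortState {
  OperationType op;
  std::string port_id;
};

struct DhcpState {
  OperationType op;
  std::string mac;
  std::string ipv4;
};

struct GoalState {
  std::vector<PortState> port_states; // 0 to N entries
  std::vector<DhcpState> dhcp_states; // 0 to N entries
};

struct UpdateCounts {
  int ports = 0;
  int dhcp = 0;
};

// Walks every list; a list with no entries contributes no work.
UpdateCounts update_goal_state(const GoalState &gs)
{
  UpdateCounts counts;
  for (const auto &port : gs.port_states) {
    (void)port; // a real handler would create or update the port here
    ++counts.ports;
  }
  for (const auto &dhcp : gs.dhcp_states) {
    (void)dhcp; // a real handler would program the DHCP entry here
    ++counts.dhcp;
  }
  return counts;
}
```

A message carrying only one dhcp_states entry with OperationType::UPDATE therefore touches nothing but the DHCP state.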

@er1cthe0ne
Contributor Author

> The next step in the DHCP implementation is to develop a standalone DHCP application based on this design. We can do the integration into AlcorControlAgent later.

Hi Nan Wu,

Do you think you can have the standalone DHCP application available in a few weeks? It would be great if we can complete the integration into AlcorControlAgent by the month of June. @w2520n2520

@er1cthe0ne
Contributor Author

Hi Nan Wu,

Checking in here. Do you think we can meet the target of June to have a standalone DHCP application based on this design and integrate it with AlcorControlAgent? Let me know. @w2520n2520

@w2520n2520
Contributor

Hi Eric,
I'm trying to meet that goal. I'll keep updating you.

@er1cthe0ne
Contributor Author

Hi Nan Wu,

Checking in here to see if there is anything I can help with. Maybe we can break down the standalone DHCP application task into smaller pieces, e.g.:

  1. Basic framework for the application, with command line parsing that doesn't need to be fancy.
  2. Implement a DHCP handler class that inherits from Dhcp_Programming_Interface in https://github.com/futurewei-cloud/alcor-control-agent/blob/master/include/aca_dhcp_programming_if.h
  3. Program the openflow rule to route DHCP packets into and out of the DHCP application.
  4. Parse the input parameters (which come from the goalstate message) to the DHCP handler class.
  5. Determine the needed DHCP actions within the DHCP application.
  6. Unit test infrastructure and test cases.

How does that sound? @w2520n2520

@w2520n2520
Contributor

w2520n2520 commented Jun 9, 2020

Hi Eric,
Actually, I've mostly finished the 4th item, the dhcp handler part. I'm working on the 2nd and 3rd items, but I have doubts about them.
As I understand it, here is the code flow for a state msg: consumer-->comm_mgr-->update_goal-->dhcp_state_handler(newly_added)-->dhcp_prog_if--------??--------->dhcp_server

Q1: Where should I put dhcp_server? Should it be in an independent thread, or run in the same one as aca_main (maybe not a good idea)? About the "??" part: net_handler uses rpc to talk to the transit_daemon of mizar, but dhcp_server is supposed to be on the same node, so rpc may not be necessary here. On the other hand, the network dhcp-server will be on a different node, so using the same communication method would be a benefit. I have limited understanding of the overall design behind alcor-agent, so I may need your involvement here.

Q2: What does the code flow look like for the 3rd item? I didn't find the interface for packet_in under the current src dir.

Thanks for your guidance and help.
@er1cthe0ne

@er1cthe0ne
Contributor Author

Hi Nan Wu,

Thanks for the questions. I will answer them one by one. Do let me know if you have other questions.

> Should it be in an independent thread

Great question. It should be an independent thread spun up by aca_main. We will implement it when integrating the DHCP standalone app into ACA.

> About the "??"

After integration, the DHCP code will be part of ACA running in another thread, so no RPC is needed. You can check out https://github.com/futurewei-cloud/alcor-control-agent/blob/164a8a7cbad1f3b46c0d0592d11df875f192326d/include/aca_dataplane_ovs.h as an example of how to consume an ACA programming interface.

> network dhcp-server will be on different node

It will be driven by the ACA running on that node in the future, so it uses the same communication flow: the Alcor controller sends the goal state message down to ACA.

> What does the code flow look like for the 3rd item?

Can you tell me which specific code flow? I want to give you the right information. Are you talking about the openflow rule programming, or how to provide the right DHCP response back to the VM?

Thanks,
Eric

@w2520n2520
Contributor

w2520n2520 commented Jun 10, 2020

Hi Eric,

Thanks for the reply.
I still have further questions and may need more of your time; I'm trying to understand the design here. :)

> Should it be in an independent thread
> Great question, it should be an independent thread spun up by aca_main. We will implement it when integrating the DHCP standalone app into ACA.

[Nan]: OK. I thought I was supposed to start from here. We can do it later.

> network dhcp-server will be on different node
> It will be driven by the ACA running on that node in the future, so same communication flow from the Alcor controller, which sends the goal state message down to ACA.

[Nan]: No, I mean the packet_in flow here rather than the control message flow (goal state). The dhcp design doc mentions that openflow table rules will be used to transfer dhcp packets to the dhcp-server. My question is: if the dataplane is mizar, there will be no openflow tables, right? Another question: if openflow tables are used, there will be two flows, one for the local dhcp-server and the other for the network-dhcp-server with low priority. When the local one fails, its corresponding flow should be removed as well, so packets will be transferred to the network one.
Is this understanding correct? I'm still confused about the packet_in_handler flow here.

> What does the code flow look like for the 3rd item?
> Can you tell me which specific code flow? I want to give you the right information. Are you talking about the openflow rule programming, or how to provide the right DHCP response back to the VM?

[Nan]: Yes, about the openflow rule programming part.
@er1cthe0ne

@er1cthe0ne
Contributor Author

er1cthe0ne commented Jun 10, 2020

Hi Nan Wu,

> Still have further questions, may need more of your time, trying to understand the design here. :)

No problem, feel free to ask :)

> Another one is, if openflow table is used, there will be two flows--one for local dhcp-server, the other is for network-dhcp-server with low priority. When the local one fails, so should its corresponding flow, so packet will be transferred to the network one.

The current focus is the OVS dataplane, and the current design only supports one dataplane per host.

The backup network-dhcp-server is used when the local ACA is down and did not have a chance to set up the local-dhcp-server flow. If ACA exits gracefully, it should remove the local-dhcp-server flow. If ACA exits unexpectedly, it will try to restart a few times; if ACA really cannot get back to a running state, the Alcor controller would detect it and perform corrective actions.

In summary, I am not sure how both the local-dhcp-server and network-dhcp-server flows can work at the same time, since only one of them will be used based on priority, unless we set a timeout on the local-dhcp-server flow, but then ACA will need to keep renewing it.
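
The priority point can be illustrated with a small sketch: among the flows that match a packet, the one with the highest priority is used, so the network-dhcp-server flow only takes effect once the local flow is gone. The struct, the function name, and the priority value 10 are hypothetical; the sketch assumes every flow in the table matches the DHCP packet.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a flow entry.
struct Flow {
  int priority;
  std::string name;
};

// Returns the highest-priority flow, or nullptr when the table is empty.
// Assumes every flow in `table` matches the packet being processed.
const Flow *select_flow(const std::vector<Flow> &table)
{
  const Flow *best = nullptr;
  for (const auto &flow : table) {
    if (best == nullptr || flow.priority > best->priority) {
      best = &flow;
    }
  }
  return best;
}
```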

> Still confused about the packet_in_handler flow here.

Did I answer your question above? Let me know.

> [Nan]: Yes, about the openflow rule programming part.

OK, please go ahead and execute a system call for now (see execute_system_command). ACA will be adding better openflow client support in the future (per the current design), and the DHCP code can leverage that when it is ready.

I hope all of this makes sense to you.

BTW, once you have some code implemented, it would be great to send a PR so that we can look at it and discuss if needed. @w2520n2520

@er1cthe0ne
Contributor Author

er1cthe0ne commented Jun 16, 2020

More information on the packet_in_handler flow. In order to have DHCP packets sent to ACA, we will need to implement an openflow controller and have an openflow rule send the matched DHCP packets to the openflow controller, which is ACA in our case.

We may use something similar to the ovs-ofctl implementation, which acts as an openflow controller. Below is an experiment to show that it should work:

root@fw0016589: ping -I 192.168.0.131 -c1 192.168.0.124
PING 192.168.0.124 (192.168.0.124) from 192.168.0.131 : 56(84) bytes of data.
64 bytes from 192.168.0.124: icmp_seq=1 ttl=64 time=0.348 ms

Br-int is letting all the traffic go now:

root@fw0016589: ovs-ofctl dump-flows br-int
cookie=0x0, duration=699.025s, table=0, n_packets=140, n_bytes=15059, priority=0 actions=NORMAL

Adding a new openflow rule to send all packets to CONTROLLER, which is ovs-ofctl in this case:
root@fw0016589: ovs-ofctl add-flow br-int "table=0, priority=100, actions=CONTROLLER"
root@fw0016589: ovs-ofctl dump-flows br-int
cookie=0x0, duration=786.163s, table=0, n_packets=140, n_bytes=15059, priority=0 actions=NORMAL
cookie=0x0, duration=4.482s, table=0, n_packets=0, n_bytes=0, priority=100 actions=CONTROLLER:65535

Ping doesn’t work anymore because the packets have been sent to CONTROLLER!
root@fw0016589: ping -I 192.168.0.131 -c1 192.168.0.124
PING 192.168.0.124 (192.168.0.124) from 192.168.0.131 : 56(84) bytes of data.

--- 192.168.0.124 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

Printed out by ovs-ofctl!
root@fw0016589: ovs-ofctl monitor br-int 1
NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=98 in_port=int0 (via action) data_len=98 (unbuffered)
icmp,vlan_tci=0x0000,dl_src=ee:c3:0f:ee:c3:46,dl_dst=36:f2:97:d5:3a:b9,nw_src=192.168.0.131,nw_dst=192.168.0.124,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 icmp_csum:947d
NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=int0 (via action) data_len=42 (unbuffered)
arp,vlan_tci=0x0000,dl_src=ee:c3:0f:ee:c3:46,dl_dst=36:f2:97:d5:3a:b9,arp_spa=192.168.0.131,arp_tpa=192.168.0.124,arp_op=1,arp_sha=ee:c3:0f:ee:c3:46,arp_tha=00:00:00:00:00:00
NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=int0 (via action) data_len=42 (unbuffered)
arp,vlan_tci=0x0000,dl_src=ee:c3:0f:ee:c3:46,dl_dst=36:f2:97:d5:3a:b9,arp_spa=192.168.0.131,arp_tpa=192.168.0.124,arp_op=1,arp_sha=ee:c3:0f:ee:c3:46,arp_tha=00:00:00:00:00:00
NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=int0 (via action) data_len=42 (unbuffered)
arp,vlan_tci=0x0000,dl_src=ee:c3:0f:ee:c3:46,dl_dst=36:f2:97:d5:3a:b9,arp_spa=192.168.0.131,arp_tpa=192.168.0.124,arp_op=1,arp_sha=ee:c3:0f:ee:c3:46,arp_tha=00:00:00:00:00:00
OFPT_ECHO_REQUEST (xid=0x0): 0 bytes of payload

The flow rules show that the packets are going to CONTROLLER:
root@fw0016589: ovs-ofctl dump-flows br-int
cookie=0x0, duration=979.012s, table=0, n_packets=140, n_bytes=15059, priority=0 actions=NORMAL
cookie=0x0, duration=197.331s, table=0, n_packets=8, n_bytes=420, priority=100 actions=CONTROLLER:65535

@w2520n2520 - let me know if you have questions on the approach or have a better suggestion.
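
Once packets reach the controller this way, the handler has to decide whether a given frame is a DHCP client request. A minimal sketch, assuming the buffer is a raw untagged Ethernet II frame carrying IPv4; the function name is hypothetical and not part of the ACA code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when the frame is IPv4/UDP from port 68 (client) to port 67 (server).
bool is_dhcp_client_packet(const std::uint8_t *frame, std::size_t len)
{
  const std::size_t eth_len = 14; // dst MAC (6) + src MAC (6) + EtherType (2)
  if (len < eth_len + 20) {
    return false; // too short for Ethernet + minimal IPv4 header
  }
  if (frame[12] != 0x08 || frame[13] != 0x00) {
    return false; // EtherType is not IPv4
  }
  const std::uint8_t *ip = frame + eth_len;
  if ((ip[0] >> 4) != 4) {
    return false; // not IP version 4
  }
  const std::size_t ihl = static_cast<std::size_t>(ip[0] & 0x0F) * 4;
  if (ihl < 20 || len < eth_len + ihl + 8) {
    return false; // bad header length or no room for the UDP header
  }
  if (ip[9] != 17) {
    return false; // IP protocol is not UDP
  }
  const std::uint8_t *udp = ip + ihl;
  const unsigned src_port = (static_cast<unsigned>(udp[0]) << 8) | udp[1];
  const unsigned dst_port = (static_cast<unsigned>(udp[2]) << 8) | udp[3];
  return src_port == 68 && dst_port == 67;
}
```

With the narrower match from the issue description (udp,udp_src=68,udp_dst=67) the switch already does this filtering, so a check like this would only be a safety net.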

@w2520n2520
Contributor

@w2520n2520 Did you see these error messages in your local environment when you compile it? If you have the latest aca build locally, it shouldn't have any aca_ovs_control function calls in /tests/gtests/aca_tests.cpp.
If you cannot bypass it, you can just add those global variables in /tests/gtests/aca_tests.cpp like:

string g_ofctl_command = EMPTY_STRING;
string g_ofctl_target = EMPTY_STRING;
string g_ofctl_options = EMPTY_STRING;

I think this is the reason:

If you get linker errors about undefined references to symbols that involve types in the std::__cxx11 namespace or the tag [abi:cxx11] then it probably indicates that you are trying to link together object files that were compiled with different values for the _GLIBCXX_USE_CXX11_ABI macro. This commonly happens when linking to a third-party library that was compiled with an older version of GCC. If the third-party library cannot be rebuilt with the new ABI then you will need to recompile your code with the old ABI.
https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html

Solving:
https://stackoverflow.com/questions/55406770/gcc-undefined-references-with-abicxx11
But this needs cmake 3.12.4 at minimum. I tried this in CMakeLists.txt, but the CI environment (3.10.2) seems unable to satisfy it.

Any idea? @er1cthe0ne @cj-chung
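
One way to check which ABI a translation unit is actually built with is to read the macro from the quoted GCC documentation after including a standard header. This is only a diagnostic sketch with a hypothetical function name; it reports -1 when the macro is not defined, for example when a standard library other than libstdc++ is in use.

```cpp
#include <cassert>
#include <string> // including a libstdc++ header makes the macro visible

// Returns 1 for the new (cxx11) ABI, 0 for the old ABI,
// and -1 when the macro is not defined at all.
int cxx11_abi_setting()
{
#ifdef _GLIBCXX_USE_CXX11_ABI
  return _GLIBCXX_USE_CXX11_ABI;
#else
  return -1;
#endif
}
```

Printing this value from the library and from the test executable would show whether the two really disagree.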

@er1cthe0ne
Contributor Author

@w2520n2520 - allow me to suggest a few things; let me know if they make sense.

The first thing is to set up a local compiling environment:
https://github.com/futurewei-cloud/alcor-control-agent/blob/master/src/README.md
cd ~/dev/alcor-control-agent
./build/build.sh
Once you have the build container set up, you can enter the docker container and rebuild ACA anytime:
docker exec -it a1 /bin/bash
cd /mnt/host/code && cmake . && make
If we don't want to use containers to build, an alternative approach is to set up the physical machine for building and running; please see ./build/aca-machine-init.sh for how to set up the dependencies.

Since @chenpiaoping is looking into ACA, maybe he can give a hand on it.

Once you have the local build setup, we can resolve the issues quickly. If there is a need to update the cmake version on our CI to 3.12.4, we can make that modification in our CI environment assuming that's the solution to resolve all the compiling issues.

@w2520n2520
Contributor

Tried in local env, same issue.

@er1cthe0ne
Contributor Author

Let's update your local environment's cmake version to 3.12.4 or higher, apply the fix you tried previously on CMakeLists.txt and see if that would address the issues. Please show us the error message so that we can take a look.

@w2520n2520
Contributor

-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11")
-- Using protobuf
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libcrypto.so (found version "1.1.1")
-- Using gRPC 1.24.3
-- Found Protobuf: /usr/local/lib/libprotobuf.a;-lpthread (found version "3.8.0")
-- Found Threads: TRUE
-- Found Protobuf: /usr/local/bin/protoc-3.8.0.0 (found version "3.8.0.0")
-- Using protobuf
-- Using gRPC 1.24.3
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Using protobuf
-- Using gRPC 1.24.3
-- Found GTest: /usr/local/lib/libgtest.so
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/host/code
[ 2%] Generating goalstateprovisioner.pb.cc, goalstateprovisioner.pb.h, goalstateprovisioner.grpc.pb.cc, goalstateprovisioner.grpc.pb.h
Scanning dependencies of target grpc
[ 4%] Building CXX object src/grpc/CMakeFiles/grpc.dir/goalstateprovisioner.pb.cc.o
:0:1: error: macro names must be identifiers
src/grpc/CMakeFiles/grpc.dir/build.make:94: recipe for target 'src/grpc/CMakeFiles/grpc.dir/goalstateprovisioner.pb.cc.o' failed
make[2]: *** [src/grpc/CMakeFiles/grpc.dir/goalstateprovisioner.pb.cc.o] Error 1
CMakeFiles/Makefile2:262: recipe for target 'src/grpc/CMakeFiles/grpc.dir/all' failed
make[1]: *** [src/grpc/CMakeFiles/grpc.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2

@er1cthe0ne
Contributor Author

er1cthe0ne commented Aug 25, 2020

Hi @w2520n2520 and @gure,

I was able to get your branch to compile, please see the below steps.

  1. Revert the change in CMakeLists.txt so that it looks like this:
    cmake_minimum_required(VERSION 3.10)
    project(AlcorControlAgent)

    # Set the version number.
    set(CMAKE_BUILD_TYPE Debug)
    set(CMAKE_CXX_STANDARD 14)
    set(CPPKAFKA_VERSION_MAJOR 0)
    set(CPPKAFKA_VERSION_MINOR 3)
    set(CPPKAFKA_VERSION_REVISION 1)
    set(CPPKAFKA_VERSION "${CPPKAFKA_VERSION_MAJOR}.${CPPKAFKA_VERSION_MINOR}.${CPPKAFKA_VERSION_REVISION}")
    set(RDKAFKA_MIN_VERSION 0x00090400)

    #add_compile_options(-O0) # enable no optimization during development
    add_compile_options(-Wall)
    #add_compile_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)

    add_subdirectory(src)
    add_subdirectory(test)

  2. Add the below global variables to test/gtest/aca_tests.cpp and test/func_tests/gs_tests.cpp as mentioned previously:
    string g_ofctl_command = EMPTY_STRING;
    string g_ofctl_target = EMPTY_STRING;
    string g_ofctl_options = EMPTY_STRING;

  3. Run "cmake ." and then make:
    root@28abfb290c2e:/mnt/host/code/aca-dhcp# make
    [ 8%] Built target grpc
    [ 54%] Built target proto
    [ 86%] Built target AlcorControlAgentLib
    [ 91%] Built target AlcorControlAgent
    [ 95%] Built target aca_tests
    Scanning dependencies of target gs_tests
    [ 97%] Building CXX object test/CMakeFiles/gs_tests.dir/func_tests/gs_tests.cpp.o
    [100%] Linking CXX executable ../build/tests/gs_tests
    [100%] Built target gs_tests

@w2520n2520
Contributor

w2520n2520 commented Aug 26, 2020

Hi @er1cthe0ne , @Gzure

Adding g_ofctl_command to both gtest and functest makes the compilation work.
I may have figured out the reason for this issue:

  1. The aca_test executable depends on AlcorControlAgentLib, which compiles source files including aca_ovs_control, which has a declaration of g_ofctl_command.
  2. The linking error was delayed until the aca_test executable was linked and its symbols were resolved.
  3. The AlcorControlAgent executable was OK because it contained the g_ofctl_command definition.

Would it be possible for g_ofctl_command to be self-contained inside AlcorControlAgentLib, since it is a lib?
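
The declaration versus definition split described in the three points above can be sketched in a single file. The function name is hypothetical, and an empty string literal stands in for EMPTY_STRING; in the real project the declaration would sit in a header and the definition in exactly one source file.

```cpp
#include <cassert>
#include <string>

// Declaration only: this is all that code inside the library sees.
// It promises that some translation unit will provide the variable.
extern std::string g_ofctl_command;

// Library-style function that uses the global (hypothetical name).
bool ofctl_command_is_set()
{
  return !g_ofctl_command.empty();
}

// Definition: exactly one translation unit linked into each executable
// must contain this line, otherwise the link step reports an undefined
// reference to g_ofctl_command.
std::string g_ofctl_command = "";
```

If the definition lived in one of the library's own source files, each executable linking the library would pick it up without defining it again.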

@er1cthe0ne
Contributor Author

I am thinking about removing it. In point 4 of issue #120, I am suggesting that we remove g_ofctl_command since we may not need it.

w2520n2520 added a commit to w2520n2520/alcor-control-agent that referenced this issue Aug 26, 2020
@w2520n2520
Contributor

All related unit tests passed. Requesting to merge. @er1cthe0ne @cj-chung

@er1cthe0ne
Contributor Author

@w2520n2520 @gure, please refer to this script for the physical machine setup of ACA:
https://github.com/futurewei-cloud/alcor-control-agent/blob/master/build/aca-machine-init.sh

w2520n2520 added a commit to w2520n2520/alcor-control-agent that referenced this issue Aug 29, 2020
@w2520n2520
Contributor

w2520n2520 commented Aug 29, 2020

Hi @er1cthe0ne
ovs_control.packet_in
--> monitor_vconn
--> monitor
--> control() ----------> has no caller

@Gzure and I did this for testing:
//ACA_OVS_Control::get_instance().monitor("br-tun", "resume");
ACA_OVS_Control::get_instance().monitor("br-int", "resume");
By the way, why is only one monitor allowed?

w2520n2520 added a commit to w2520n2520/alcor-control-agent that referenced this issue Aug 29, 2020
@er1cthe0ne
Contributor Author

Hi @w2520n2520, I am not sure I understand the concern. Can you tell me what your question is?

> And B.T.W, why only one monitor is allowed?

This could be a limitation of the OVS code we use, but I don't think it is a blocking issue because we would only monitor br-int for the scenarios we defined. @cj-chung, please correct me if I am wrong.

w2520n2520 added a commit to w2520n2520/alcor-control-agent that referenced this issue Aug 29, 2020
@w2520n2520
Contributor

w2520n2520 commented Aug 29, 2020

void ACA_OVS_Control::parse_packet(void *packet)
{
  aca_dhcp_server::ACA_Dhcp_Server::get_instance().dhcps_recv();
}
void OVS_Control::monitor_vconn()
{
  ACA_OVS_Control::get_instance().parse_packet(pin.packet);
}
void OVS_Control::monitor(const char *bridge, const char *opt)
{
  monitor_vconn(vconn, true, resume_continuations, bridge);
}

4.1

int ACA_OVS_Control::control()
{
  monitor(target, options);
}

4.2

int main()
{
  ACA_OVS_Control::get_instance().monitor("br-tun", "resume");
}

Since we didn't find the caller of control(), we changed the entry point in main to br-int to debug the packet procedure.

@cj-chung
Contributor

cj-chung commented Aug 29, 2020

Yes, that's the correct call stack.
The current monitor in ACA_OVS_Control is daemonized but not multithreaded yet, so I think one ACA instance can only have 1 monitor channel.

@cj-chung cj-chung reopened this Aug 29, 2020
w2520n2520 added a commit to w2520n2520/alcor-control-agent that referenced this issue Aug 29, 2020
@w2520n2520
Contributor

Hi @cj-chung ,

One question:
For the packet-out procedure, we observe that br-tun's TX keeps increasing, but no packet is seen in tcpdump. So we changed actions to "output:8", but no luck.
The call below returns no error.

error = parse_ofp_packet_out_str(&po, options,
                                         ports_to_accept(bridge),
                                         tables_to_accept(bridge),
                                         &usable_protocols);

Should another flow be installed for packets replying from server to client?

//bridge = "br-int" opts = "in_port=controller packet=<hex-string> actions=normal"
aca_ovs_control::ACA_OVS_Control::get_instance().packet_out(bridge.c_str(),
                                                              options.c_str());

In short, we see no error in the code flow now, but no packet-out is observed on the network. We could use your help to figure it out. Thanks. @Gzure @er1cthe0ne

@cj-chung
Contributor

cj-chung commented Aug 29, 2020

The "in_port" field names the port the packet is treated as having been received on, here the controller; "actions=normal" then sends it through normal switching. If you use tcpdump to capture packets on br-tun or br-int, you should be able to see the packet on these bridges.

You can use the following command to test the packet-out function:
./build/bin/AlcorControlAgent -c packet-out -t br-int -o "in_port=controller packet=02AC10FF002202AC10FF001108004500001C000100000A015A9DAC10FF0BAC10FF160800F7FF00000000 actions=normal"

and use tcpdump -i br-int -v on ovs; you should be able to capture the packet.

@w2520n2520
Contributor

w2520n2520 commented Aug 31, 2020

Hi @cj-chung @er1cthe0ne ,

Packet-Out Syntax
packet=hex-string
The actual packet to send, expressed as a string of hexadecimal
bytes. This field is required.
http://www.openvswitch.org/support/dist-docs/ovs-ofctl.8.txt

It seems this command only sends the "actual packet", which means dhcp needs to encapsulate the whole packet from the application layer down to ethernet, rather than just the dhcp payload.
Am I right?
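
Building the whole packet means the DHCP code also has to fill in header fields such as the IPv4 header checksum. A minimal sketch of the standard one's-complement header checksum, with a hypothetical function name; it assumes the checksum field (bytes 10 and 11) is zero while computing and that the header length is even, which always holds for IPv4:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One's-complement sum of the header's 16-bit big-endian words,
// folded to 16 bits and inverted.
std::uint16_t ipv4_header_checksum(const std::uint8_t *hdr, std::size_t len)
{
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i + 1 < len; i += 2) {
    sum += (static_cast<std::uint32_t>(hdr[i]) << 8) | hdr[i + 1];
  }
  while (sum >> 16) {
    sum = (sum & 0xFFFF) + (sum >> 16); // fold the carries back in
  }
  return static_cast<std::uint16_t>(~sum & 0xFFFF);
}
```

Applied to the 20-byte IPv4 header inside the packet-out test command earlier in this thread (with its checksum bytes zeroed), this yields 0x5A9D, which is the checksum that packet carries.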

@cj-chung
Contributor

@w2520n2520 Yes. You need the whole packet for the hex string, since I send the packet string directly to OVS.
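
A minimal sketch of turning a complete frame into the option string used in this thread ("in_port=controller packet=<hex-string> actions=normal"). Both helper names are hypothetical; the resulting string would be passed as the options argument of packet_out, as in the call shown earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encodes raw bytes as an uppercase hexadecimal string, two digits per byte.
std::string to_hex(const std::vector<std::uint8_t> &bytes)
{
  static const char digits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (std::uint8_t b : bytes) {
    out.push_back(digits[b >> 4]);
    out.push_back(digits[b & 0x0F]);
  }
  return out;
}

// Builds the packet-out option string for a full Ethernet frame.
std::string packet_out_options(const std::vector<std::uint8_t> &frame)
{
  return "in_port=controller packet=" + to_hex(frame) + " actions=normal";
}
```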
