Skip to content
This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

add arp responder #167

Merged
merged 19 commits into from
Jan 25, 2021
Merged

add arp responder #167

merged 19 commits into from
Jan 25, 2021

Conversation

luyaoluo
Copy link
Contributor

  • add aca_arp_responder.cpp and aca_arp_responder.h
  • modify entry in aca_ovs_control.cpp

@er1cthe0ne er1cthe0ne self-requested a review November 20, 2020 23:57
@er1cthe0ne er1cthe0ne linked an issue Nov 20, 2020 that may be closed by this pull request
@er1cthe0ne
Copy link
Contributor

@lly00 - do you have any update on this implementation? Thanks.

@luyaoluo
Copy link
Contributor Author

luyaoluo commented Dec 8, 2020

@lly00 - do you have any update on this implementation? Thanks.

Yes, there are some updates. I'm testing l2 and l3 neighbors currently. I'll make a commmit after finishing it.

@luyaoluo
Copy link
Contributor Author

@er1cthe0ne I commit some updates on L2 neighbors. There are still some things I have not done: send packet out if it don't find the corresponding mac, and how to get the vlan information from goal state.
Another question is that I set vlan id of ports but the received packets do not contain vlan header. I don't figure out why.

}

_arp_db_mutex.lock();
_arp_db->insert(make_pair(stData,arp_cfg_in->mac_address));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this unordered_map with mutex lock will work with concurrent threads but will perform poorly, please switch to the new concurrent hashtable and look at the below define in aca_vlan_manager for reference:

// hashtable <key: tunnel ID, value: vpc_table_entry>
CTSL::HashMap<uint, vpc_table_entry *> _vpcs_table;

@er1cthe0ne
Copy link
Contributor

er1cthe0ne commented Dec 15, 2020

There are still some things I have not done: send packet out if it don't find the corresponding mac,

You can skip that for now since we haven't find a need for it.

and how to get the vlan information from goal state.

you can call this to go from tunnel_id to internal vlan_id: ACA_Vlan_Manager::get_instance().get_or_create_vlan_id(tunnel_id);

Another question is that I set vlan id of ports but the received packets do not contain vlan header. I don't figure out why.

I looked through the change and didn't see obvious problem on that. You may need to step through the line of those code when you run it in a debugger or print a lot of logs to see what's wrong.

Please work with @Gzure to see if he can give you some pointer or help.

@er1cthe0ne
Copy link
Contributor

@lly00 my PR #176 has been merged and I resolved the conflicts in your branch. Please go ahead to do git pull to get the latest change. The git automatic merge may not work perfectly, so please review the merge and clean up your code if needed. The goal is to get this change ready to merge this week. So please get yourself familiar with running aca_tests and confirm passing when you are ready: https://github.com/futurewei-cloud/alcor-control-agent/wiki/How-to-run-the-full-suite-of-aca_tests

@er1cthe0ne
Copy link
Contributor

@lly00 - let's update this PR with your latest code so what we can see the status. Following up on the two issues:

  1. alcor submodule - please see this new wikipage on how to handle it: https://github.com/futurewei-cloud/alcor-control-agent/wiki/Dealing-with-Alcor-Submodule
  2. vlan not available in packet_in function - please share more information or debug prints so that we can follow up. Do you think ACA_OVS_Control::parse_packet function not correctly handle vlan?

@luyaoluo
Copy link
Contributor Author

@er1cthe0ne I made some update based on your feedback. I use two dockers to ping each other in my test, and the received packets have vlan tags if I do nothing else. However, since both dockers are connected to br-int, I don't want them to send arp request through the NORMAL flow entry, so I add a flow entry in br-int(arp,arp_op=1,actions=OUTPUT:"patch-tun"). After that, the received packets have on vlan tags anymore.
withoutvlan
withvlan

@er1cthe0ne
Copy link
Contributor

er1cthe0ne commented Jan 6, 2021

Let me layout the remaining items:

  1. address the latest PR feedback
  2. get DISABLED_2_ports_CREATE_test_traffic_PARENT to pass
  3. remove the original arp rules in int ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need()
  4. remove the static arp responder add rule in ACA_Vlan_Manager::create_l2_neighbor and the corresponding delete rule
  5. remove the static arp responder add rule in ACA_OVS_L3_Programmer::create_or_update_router, and replace with your on-demand arp responder, also remove the corresponding delete rule
  6. reconfirm all the existing tests are passing once the change is ready

@luyaoluo
Copy link
Contributor Author

luyaoluo commented Jan 7, 2021

On DISABLED_2_ports_CREATE_test_traffic_PARENT failure, the arp request packet will be sent to ACA with your change, and ACA would craft and send back an ARP response packet. Did you see that happen in your code?

It seems that ACA can not handle the received message without a thread( new thread(bind(&ACA_OVS_Control::monitor, &ACA_OVS_Control::get_instance(), "br-tun", "resume")))

@luyaoluo
Copy link
Contributor Author

@er1cthe0ne I modify some tests in order to pass them(now all tests passed)

@@ -134,9 +134,6 @@ int ACA_OVS_L2_Programmer::setup_ovs_bridges_if_need()
execute_openflow_command("add-flow br-tun \"table=20,priority=1 actions=resubmit(,22)\"",
not_care_culminative_time, overall_rc);

execute_openflow_command("add-flow br-tun \"table=2,priority=25,arp,arp_op=1,in_port=\"patch-int\" actions=resubmit(,51)\"",
not_care_culminative_time, overall_rc);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please go ahead and remove the rule below also, since table=51 is for static ARP responder only and it is no longer needed.
execute_openflow_command("add-flow br-tun "table=51,priority=1 actions=resubmit(,22)"",
not_care_culminative_time, overall_rc);

@@ -736,6 +746,12 @@ TEST(ovs_l2_test_cases, DISABLED_2_ports_CREATE_test_traffic_CHILD)
ASSERT_EQ(overall_rc, EXIT_SUCCESS);
overall_rc = EXIT_SUCCESS;

// monitor br-tun for arp request message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since all the traffic test cases will need to have this monitoring thread running, it is possible to just add it to "int main(int argc, char **argv)" in aca_test_main.cpp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are different places in arp/dhcp tests, one on "br-tun" and one on "br-int"

overall_rc = Aca_Net_Config::get_instance().execute_system_command(
"docker exec con1 ping -c1 " + vip_address_4);
EXPECT_NE(overall_rc, EXIT_SUCCESS);
// overall_rc = Aca_Net_Config::get_instance().execute_system_command(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why comment this three lines out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget to remove it. I commented it speed up the test, otherwise there are too many error messages.

@er1cthe0ne
Copy link
Contributor

@lly00 Design update:

With Zeta gateway coming online, for ARP request of virtual IP which is unknown to ACA, the same packet needs to be sent back to OVS so that it will be forwarded to Zeta. Just send the original packet down to table 22 will do the trick, you don't need to worry about the Zeta openflow rules since it is handled by Zeta code. Let me know if you have any questions.

@luyaoluo
Copy link
Contributor Author

@lly00 Design update:

With Zeta gateway coming online, for ARP request of virtual IP which is unknown to ACA, the same packet needs to be sent back to OVS so that it will be forwarded to Zeta. Just send the original packet down to table 22 will do the trick, you don't need to worry about the Zeta openflow rules since it is handled by Zeta code. Let me know if you have any questions.

I'll try it. By the way, I meet some problems while using two threads for br-int and br-tun. I don't clearly know how ACA_OVS_Control::monitor works, can it handle two bridges at the same time?

The error occurs while add flow like "table=0,...,actions=CONTROLLER"
The program will abort with:(null): lib/poll-loop.c 111 assertion !fd != !wevent failed in poll_create_node()

@er1cthe0ne
Copy link
Contributor

@lly00 Design update:
With Zeta gateway coming online, for ARP request of virtual IP which is unknown to ACA, the same packet needs to be sent back to OVS so that it will be forwarded to Zeta. Just send the original packet down to table 22 will do the trick, you don't need to worry about the Zeta openflow rules since it is handled by Zeta code. Let me know if you have any questions.

I'll try it. By the way, I meet some problems while using two threads for br-int and br-tun. I don't clearly know how ACA_OVS_Control::monitor works, can it handle two bridges at the same time?

The error occurs while add flow like "table=0,...,actions=CONTROLLER"
The program will abort with:(null): lib/poll-loop.c 111 assertion !fd != !wevent failed in poll_create_node()

We hit this assert previously when we are calling ovs library call in different threads concurrently. We were compiling ovs library in debug mode. Later we merged the change to compile ovs library in release mode and don't hit that assert (assert is disabled in release builds) then move forward without issue. Please go ahead to rebase your branch, rerun the part of getting start guide script to rebuild ovs library which will create release binary with the latest code at master.

@luyaoluo
Copy link
Contributor Author

@er1cthe0ne There are still some problems. Some tests try to delete br-tun and br-int to clean up, and we will meet the error(could not initialize control socket).
By the way, can I still use function packet_out function in ACA_OVS_Control with "actions=resubmit(,22)" to resubmit to table 22?

@er1cthe0ne
Copy link
Contributor

@er1cthe0ne There are still some problems. Some tests try to delete br-tun and br-int to clean up, and we will meet the error(could not initialize control socket).

Got it. Thanks for trying that out. Let's don't worry about putting the adding the monitoring thread in aca_test_main.cpp.

By the way, can I still use function packet_out function in ACA_OVS_Control with "actions=resubmit(,22)" to resubmit to table 22?

That's the idea, dhcp code is doing similar things already: ACA_Dhcp_Server::dhcps_xmit

@luyaoluo
Copy link
Contributor Author

  1. There are chances that we can not pass the tests on two machines because there is no arp flow table in br-tun and the child machine can only find the corresponding mac address in arp cache which may become incomplete.
  2. I comment two lines in aca_main.cpp for pulsar consumer. There is a version conflict of protobuf and I'll fix it in next PR later.

@luyaoluo
Copy link
Contributor Author

The program terminated after throwing an instance of 'google::protobuf::FatalException': This program was compiled against version 3.3.0 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.8.0). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "/pulsar/pulsar-client-cpp/pkg/deb/BUILD/apache-pulsar-2.6.1/pulsar-client-cpp/generated/lib/PulsarApi.pb.cc".)

@@ -222,12 +222,13 @@ void ACA_ARP_Responder::arp_recv(uint32_t in_port, void *vlan_hdr, void *message
return;

}
void ACA_ARP_Responder::arp_xmit(uint32_t in_port, void *vlanmsg, void *message){
void ACA_ARP_Responder::arp_xmit(uint32_t in_port, void *vlanmsg, void *message, int is_find){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_found

@er1cthe0ne
Copy link
Contributor

@lly00 - please update this PR soon so that we can merge this in and avoid merge conflict with other changes.

@luyaoluo
Copy link
Contributor Author

@lly00 - please update this PR soon so that we can merge this in and avoid merge conflict with other changes.

The change in my last commit is that I use thread.join() to make the child test cases wait for parent until interrupted.

Copy link
Contributor

@er1cthe0ne er1cthe0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for addressing all the comments and get this through.

@er1cthe0ne er1cthe0ne merged commit 14362aa into futurewei-cloud:master Jan 25, 2021
@luyaoluo luyaoluo deleted the dev branch March 10, 2021 06:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ARP responder for both L2 and L3 neighbors
2 participants