Skip to content
This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

Zeta integration implementation #158

Merged

Conversation

zhangml
Copy link
Contributor

@zhangml zhangml commented Oct 29, 2020

#149 Zeta integration implementation
This PR is the code implementation of the integration of Zeta and ACA. We have added some files:

  1. Parse oam packet installation or delete direct path。
    aca_oam_server.h
    aca_oam_server.cpp
    aca_oam_port_manager.h
    aca_oam_port_manager.cpp
    aca_test_oam.cpp

  2. Receive alocr goal state message to update the group table.
    aca_zeta_programming.h
    aca_zeta_programming.cpp

@zhangml
Copy link
Contributor Author

zhangml commented Oct 29, 2020

Hello, Eric. Is there a specific implementation of OAM packet in the zeta gateway now? I want to understand its data structure.

Copy link
Contributor

@er1cthe0ne er1cthe0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start in the right direction. You have good understanding on the proposed code change. The next step is to get is ready to merge without the full implementation. We just needed to make sure the change doesn't break any existing functionality.

I am working with Alcor team on the needed contract update so that we can send down an updated goal state message with Zeta info.

@er1cthe0ne
Copy link
Contributor

Hello, Eric. Is there a specific implementation of OAM packet in the zeta gateway now? I want to understand its data structure.

please reference this: https://github.com/futurewei-cloud/zeta/blob/main/docs/design/zeta_system_design.md#735-in-band-operation

And look at how DHCP model parse the packet to get some idea.

@er1cthe0ne er1cthe0ne added Feature Zeta Integration with Zeta labels Oct 30, 2020
@er1cthe0ne er1cthe0ne linked an issue Oct 30, 2020 that may be closed by this pull request
@zhangml
Copy link
Contributor Author

zhangml commented Oct 30, 2020

After the ACA receives the OAM package, it will parse the information in it to install and delete the direct path. But installing the direct path needs to specify the VXLAN interface, that is, the mapping relationship between the host and the VXLAN interface is required. Is this mapping relationship stored in ACA? Or does ACA need to dump flow or view OVSDB interface information to get it?

@er1cthe0ne
Copy link
Contributor

After the ACA receives the OAM package, it will parse the information in it to install and delete the direct path. But installing the direct path needs to specify the VXLAN interface, that is, the mapping relationship between the host and the VXLAN interface is required. Is this mapping relationship stored in ACA? Or does ACA need to dump flow or view OVS interface information to get it?

If you are talking about the "remote" host (IP), that information is embedded inside the OAM packet:

Field Offset (Byte) Value
Matcher_SIP 0 Inner Packet SIP
Matcher_DIP 4 Inner Packet DIP
Matcher_SPORT 8 Inner Packet SPort
Matcher_DPORT 10 Inner Packet DPort
Matcher_Protocol 12 Inner Packet Protocol
Matcher_VNI 13 VxLAN/Geneve vni
DestInst_IP 16 Destination Inst IP
DestNode_IP 20 Destination Node IP
DestInst_MAC 24 Destination Inst MAC
DestNode_MAC 30 Destination Node MAC
Idle_Timeout 36 0 - 65536s

@zhangml
Copy link
Contributor Author

zhangml commented Oct 30, 2020

After the ACA receives the OAM package, it will parse the information in it to install and delete the direct path. But installing the direct path needs to specify the VXLAN interface, that is, the mapping relationship between the host and the VXLAN interface is required. Is this mapping relationship stored in ACA? Or does ACA need to dump flow or view OVS interface information to get it?

If you are talking about the "remote" host (IP), that information is embedded inside the OAM packet:

Field Offset (Byte) Value
Matcher_SIP 0 Inner Packet SIP
Matcher_DIP 4 Inner Packet DIP
Matcher_SPORT 8 Inner Packet SPort
Matcher_DPORT 10 Inner Packet DPort
Matcher_Protocol 12 Inner Packet Protocol
Matcher_VNI 13 VxLAN/Geneve vni
DestInst_IP 16 Destination Inst IP
DestNode_IP 20 Destination Node IP
DestInst_MAC 24 Destination Inst MAC
DestNode_MAC 30 Destination Node MAC
Idle_Timeout 36 0 - 65536s

@zhangml zhangml closed this Oct 30, 2020
@zhangml zhangml reopened this Oct 30, 2020
@zhangml
Copy link
Contributor Author

zhangml commented Oct 30, 2020

I mean how to query the corresponding VXLAN interface after knowing the remote ip? For example:
Port "vxlan233"
Interface "vxlan233"
type: vxlan
options: {key=flow, remote_ip="172.16.62.233"}
The VXLAN interface corresponding to remote ip 172.16.62.233 is vxlan233. In the direct path installation process, do we obtain this mapping relationship by directly querying ovsdb?

@er1cthe0ne
Copy link
Contributor

I mean how to query the corresponding VXLAN interface after knowing the remote ip? For example:
Port "vxlan233"
Interface "vxlan233"
type: vxlan
options: {key=flow, remote_ip="172.16.62.233"}
The VXLAN interface corresponding to remote ip 172.16.62.233 is vxlan233. In the direct path installation process, do we obtain this mapping relationship by directly querying ovsdb?

okay, I understand now.

The remote IP was not known before ACA receives the OAM packet. Therefore, after ACA receives the OAM packet, it will need to create this remote port "vxlan233" corresponding to 172.16.62.233 if it is not yet there. There are existing code/logic to create this port, and also internal data structure to keep track of these "outports". Let me know if you have trouble finding it.

@zhangml
Copy link
Contributor Author

zhangml commented Oct 30, 2020

I mean how to query the corresponding VXLAN interface after knowing the remote ip? For example:
Port "vxlan233"
Interface "vxlan233"
type: vxlan
options: {key=flow, remote_ip="172.16.62.233"}
The VXLAN interface corresponding to remote ip 172.16.62.233 is vxlan233. In the direct path installation process, do we obtain this mapping relationship by directly querying ovsdb?

okay, I understand now.

The remote IP was not known before ACA receives the OAM packet. Therefore, after ACA receives the OAM packet, it will need to create this remote port "vxlan233" corresponding to 172.16.62.233 if it is not yet there. There are existing code/logic to create this port, and also internal data structure to keep track of these "outports". Let me know if you have trouble finding it.

okay, thanks, I have found the corresponding code.

aca_get_outport_name(alcor::schema::NetworkType network_type, std::string remote_ip)
{
  std::hash<std::string> str_hash;

  // TODO: change to hex encoding to get more possible combinations
  auto hash_value = str_hash(remote_ip) % MAX_OUTPORT_NAME_POSTFIX;

  return aca_get_network_type_string(network_type) + "-" + std::to_string(hash_value);
}


if (!aca_is_port_on_same_host(remote_host_ip)) {
ACA_LOG_INFO("%s", "port_neighbor not exist!\n");
string neighbor_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you will get the neighbor ID from the oam packet.

// get netigbor_id

//crate neighbor_port
aca_ovs_l2_programmer::ACA_OVS_L2_Programmer::get_instance().create_or_update_l2_neighbor(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can reuse the same fuction create_or_update_l2_neighbor.

For Zeta enabled port, looks like we can still do the outport for both ZGC gateway path and direct path. Let me know if that's not doable.

That's the source side, for the destination side, we don't expect an OAM packet to go there to setup direct path (until the destination side send packet to the source side). In order for the destination side to receive the packet, we need to create the "outport". The idea is during the creation Zeta supported ports, it should create a generic "outport" to accept all vxlan packets.

If this generic "outport" for all vxlan packets would have problem working with the current aca remote IP specific "outport", we will then switch all the aca implementation for both Zeta supported and non-zeta supported to use this generic "outport".

",idle_timeout=" + action.idle_timeout + ",output:" + outport_name;

// Adding unicast rules in table20
string opt = "table=20,priority=50," + cmd_match + "," + cmd_action;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great aproach, keep in mind table 20 (unicast) is not hooked up in current ACA yet, you will needed to hook it up :) using openflow rules.

ACA_Vlan_Manager::get_instance().set_oam_server_port(vpc_id, oam_server_port);

//update oam_ports_table and add the OAM punt rule also if this is the first port in the VPC
Aca_Oam_Port_Manager::get_instance().add_vpc(oam_server_port, vpc_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implementation has two place to track similar things, please consolidate to avoid confusions:

set_oam_server_port - put the oam_port info into the vpc table
Aca_Oam_Port_Manager::get_instance().add_vpc - put vpc ids into the unordered_map using oam_port as the key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we really need to address this as we are using two places to store same information.

string cmd = "-O OpenFlow13 add-group br-tun group_id=" + zeta_cfg->group_id + ",type=select";
list<string>::iterator it;
for (it = zeta_cfg->zeta_buckets.begin(); it != zeta_cfg->zeta_buckets.end(); it++) {
string outport_name = aca_get_outport_name(alcor::schema::NetworkType::VXLAN, *it);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will need to create an outport to send traffic to ZGC gateway.

@zhangml zhangml changed the title Interface -Zeta integration implementation Zeta integration implementation Nov 24, 2020
@er1cthe0ne
Copy link
Contributor

er1cthe0ne commented Nov 25, 2020

@zhangml I have merged my aca change to take in the latest contract, you should take the change into your branch to continuing the development.

Next steps:

  1. Resolve the "alcor" submodule conflict and get your branch to compile in the CI environment
  2. Address the feedbacks I left in this PR
  3. Can we still do remote IP specific outports (current implementation) at the source node and destination node or we have to switch to generic vxlan outport for current non zeta port plus zeta support port? Need to try it with manual rules to confirm.
  4. Think about the operation needed for add/delete group rule, direct path rule, outports. Think about how to manage the lifetime of them.
  5. Come up with the efficient data structure to store the needed info, don't want to use two data structure to store similar info. Except for the cache of valid oam udp port for quick check under ACA_OVS_Control::parse_packet
  6. Review the new testing instruction I put in: https://github.com/futurewei-cloud/alcor-control-agent/wiki/How-to-run-the-full-suite-of-aca_tests
  7. Confirm all the existing test are still passing with your change, share the test result
  8. The goal is get this change to merge in as long as it doesn't break existing functionality


static Aca_Oam_Port_Manager &get_instance();

void create_entry_unsafe(uint32_t port_number);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think we need to use uint32_t, we can use uint directly (it will likely be 32bit anyway. This will give compile a chance to optimize it.

_oam_ports_table_mutex.lock();
if (_oam_ports_table.find(port_number) == _oam_ports_table.end()) {
create_entry_unsafe(port_number);
_create_oam_ofp(port_number);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_create_oam_ofp can return failure now, let's propagate the failure to this add_vpc function so that it would return an overall_rc.

if (found_auxgateway.aux_gateway_type() == ZETA) {
ACA_LOG_INFO("%s", "AuxGateway_type is zeta!\n");
// Delete the zeta settings of vpc
overall_rc = ACA_Zeta_Programming::get_instance().delete_zeta_config(
Copy link
Contributor

@er1cthe0ne er1cthe0ne Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happen when there is multiple ports (from the same VPC) sharing this zeta configuration? when one of the port is deleted, we cannot simply delete the configuration because it will affect the other port which is not deleted.

Also multiple VPC can share a ZGC configuration.

@er1cthe0ne
Copy link
Contributor

@zhangml - as a reminder, please do give me permission to make code change in your branch: https://github.com/zhangml/alcor-control-agent/

@er1cthe0ne
Copy link
Contributor

er1cthe0ne commented Nov 26, 2020

@zhangml - as a reminder, please do give me permission to make code change in your branch: https://github.com/zhangml/alcor-control-agent/

@zhangml - CI has passed now, and I accepted the invite for your branch. Let's focus on getting the test cases passing as the next step: https://github.com/futurewei-cloud/alcor-control-agent/wiki/How-to-run-the-full-suite-of-aca_tests

@er1cthe0ne
Copy link
Contributor

er1cthe0ne commented Nov 26, 2020

My test result: current test results on your branch after my change :) . All tests are passing now:

./build/tests/aca_tests:
[----------] Global test environment tear-down
[==========] 33 tests from 6 test suites ran. (5869 ms total)
[ PASSED ] 33 tests.

[ OK ] ovs_l2_test_cases.DISABLED_2_ports_CREATE_test_traffic_CHILD (484 ms)
[----------] 1 test from ovs_l2_test_cases (484 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (484 ms total)
[ PASSED ] 1 test.

[ OK ] ovs_l2_test_cases.DISABLED_2_ports_CREATE_test_traffic_PARENT (30555 ms)
[----------] 1 test from ovs_l2_test_cases (30556 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (30556 ms total)
[ PASSED ] 1 test.

[ OK ] dhcp_request_test_case.DISABLED_l2_dhcp_test (3510 ms)
[----------] 1 test from dhcp_request_test_case (3510 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (3510 ms total)
[ PASSED ] 1 test.

[ OK ] dhcp_request_test_case.DISABLED_l3_dhcp_test (4506 ms)
[----------] 1 test from dhcp_request_test_case (4506 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (4506 ms total)
[ PASSED ] 1 test.

[ OK ] ovs_l3_test_cases.DISABLED_2_ports_ROUTING_test_traffic_one_machine (25777 ms)
[----------] 1 test from ovs_l3_test_cases (25777 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (25777 ms total)
[ PASSED ] 1 test.

[ OK ] ovs_l3_test_cases.DISABLED_2_ports_ROUTING_test_traffic_PARENT (56857 ms)
[----------] 1 test from ovs_l3_test_cases (56857 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (56857 ms total)
[ PASSED ] 1 test.

Next step

I have merged my change into your branch. Please review.

Copy link
Contributor

@er1cthe0ne er1cthe0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some change and confirm all the existing test cases are passing with this change.

While the full Zeta support is not done yet. This initialize change laid out the framework and we can continue to iterate with new PRs and tracking individual issues to complete the implementation.

Since this change does not impact existing functionality and all the existing tests are passing. I am going to merge it now.

@er1cthe0ne er1cthe0ne merged commit e093921 into futurewei-cloud:master Nov 29, 2020
@zhangml zhangml deleted the Zeta-integration-implementation branch December 3, 2020 07:17
zhangml pushed a commit to zhangml/alcor-control-agent that referenced this pull request Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Feature Zeta Integration with Zeta
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants