-
Notifications
You must be signed in to change notification settings - Fork 34
[Documentation] Security Group Host Design #390
Conversation
Codecov Report
@@ Coverage Diff @@
## master #390 +/- ##
============================================
+ Coverage 34.24% 36.82% +2.57%
- Complexity 959 1151 +192
============================================
Files 413 447 +34
Lines 10086 10703 +617
Branches 1228 1365 +137
============================================
+ Hits 3454 3941 +487
- Misses 6189 6230 +41
- Partials 443 532 +89 Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #390 +/- ##
=========================================
Coverage 36.79% 36.80%
- Complexity 1149 1150 +1
=========================================
Files 447 447
Lines 10703 10703
Branches 1365 1366 +1
=========================================
+ Hits 3938 3939 +1
+ Misses 6231 6230 -1
Partials 534 534
Continue to review full report at Codecov.
|
|
||
. On-Host direct connection from VM port to OVS: it doesn't need extra Linux bridge, Linux devices, Linux kernel or IP Tables; | ||
. Direct and simplified communication between Alcor controller to Alcor control agent; | ||
. Delta update: Update of security group rule only require the changed rule to be push down to compute hosts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we only consider delta update between DPM and ACA? Dose delta update for ovs flows are considered here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we will need delta update for ovs flows also. In order to support that, security group implementation in ACA will need to keep an internal map of port to security group (and its rules) mapping. Keep it in runtime memory would work, but it will not survive reboot/crash and hard to support older ACA to new ACA update (note that this problem is also applicable to ther ACA modules which is runtime memory as storage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the situation where the security rules of two security groups of a port overlap. If we delete the security group rules of the overlapping part of one of the security groups, it will be troublesome to deal with them by delta update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's step back to the advantage of ACA vs neutron OVS agent:
- plan to survive ACA crash or migration from older ACA to new ACA
- delta update when only one security group rule is added/deleted/modified
Both of them requires ACA to keep an internal map of port to security group (and its rules) mapping. With this internal map, we can accept delta update and figure out the minimal set of openflow rule to modify.
. On-Host direct connection from VM port to OVS: it doesn't need extra Linux bridge, Linux devices, Linux kernel or IP Tables; | ||
. Direct and simplified communication between Alcor controller to Alcor control agent; | ||
. Delta update: Update of security group rule only require the changed rule to be push down to compute hosts; | ||
. Reduced OVS SG rules: Only one set of security group rules is configured in OVS to be shared with all ports using it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the wordings to clarify. Please use the below as a reference: Uses of conjunctive flows
https://docs.openstack.org/neutron/train/contributor/internals/openvswitch_firewall.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've read this link. I mean, how do multiple port share a flow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's take the egress flows as an example:
1.table=TRANSIENT_TABLE,priority=100,in_port=ofport,actions=set_field:{ofport}->reg{REG_PORT},set_field:{local_vlan}->reg{REG_NET},resubmit(,BASE_EGRESS_TABLE)
2.table=BASE_EGRESS_TABLE,priority=65,reg_port=ofport,dl_type=ETHERTYPE_IP,in_port=ofport,dl_src=mac_addr,nw_src=ip_addr,actions=ct(table={RULES_EGRESS_TABLE},zone=NXM_NX_REG{REG_NET}[0..15])
3.table=RULES_EGRESS_TABLE,priority=70+,dl_type=ethertype,reg_port=ofport,nw_dst=dst_ip_prefix,nw_proto=protocol,tcp_src=sport,tcp_dst=dport,ct_state=+new-est,actions=resubmit(,ACCEPT_OR_INGRESS_TABLE)
4.table=ACCEPT_OR_INGRESS_TABLE,priority=90,dl_type=ETHERTYPE_IP,reg_port=ofport,ct_state=+new-est,actions=ct(commit,zone=NXM_NX_REG{REG_NET}[0..15]),resubmit(,ACCEPTED_EGRESS_TRAFFIC_TABLE)
If multiple ports share flow#3, what should flows look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current design can reuse rule for conjunctive flows only, that's only used for security group rule with remote_group.
. Direct and simplified communication between Alcor controller to Alcor control agent; | ||
. Delta update: Update of security group rule only require the changed rule to be push down to compute hosts; | ||
. Reduced OVS SG rules: Only one set of security group rules is configured in OVS to be shared with all ports using it | ||
. Enable scale and performance from the ground up, addressing the biggest pain point on Neutron |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the pain point of neutron remote_group or is it that multiple vm on the same host share a security group rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe one of the pain point is having too many openflow rules installed on a compute node. Both neutron remote_group and mutiple VMs (on same host) assoicated to a lot of security group rules would contribute to the pain point.
What is the difference between the implementation of the agent side and the neutron agent? |
|
@er1cthe0ne Whether PortState is carried in the Networkconfiguration when updating the security group, if so, will go into Aca_Goal_State_Handler::update_port_states () according to the current code logic for processing, which is not expected to happen. In addition, what fields should be included in the PortState when updating the security group? |
If there is a new port created with security group enable: PortState::operation_type = CREATE If there is no new port created when security group is updated: PortState::operation_type = INFO We will add the ignore logic when PortState::operation_type = INFO inside ACA_Dataplane_OVS::update_port_state_workitem similar to update_subnet_state_workitem
I think we will need request_id, id, fixed_ips, allow_address_pairs (enforced when security group is enabled) and security_group_ids at the minimal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us try to close this document soon. Left a few more comments.
docs/modules/ROOT/pages/high_level/security_group_host_design.adoc
Outdated
Show resolved
Hide resolved
The default security group has rules that allow associated ports of the default rule to talk to each other. | ||
|
||
To support this using the minimal set of OpenFlow rules, we will mark remote ports with its associated security group using conjunctive flows discussed in the next session. | ||
We will update our neighbor configuration schema to include associated security group IDs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@er1cthe0ne, is this supported in our schema? I think it would be good to include it in this design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the design but not in our schema yet. @chenpiaoping we are going to need this neighbor SG ID in our neighbor configuration for remote SG rules right?
docs/modules/ROOT/pages/high_level/security_group_host_design.adoc
Outdated
Show resolved
Hide resolved
docs/modules/ROOT/pages/high_level/security_group_host_design.adoc
Outdated
Show resolved
Hide resolved
docs/modules/ROOT/pages/high_level/security_group_host_design.adoc
Outdated
Show resolved
Hide resolved
docs/modules/ROOT/pages/high_level/security_group_host_design.adoc
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
docs/modules/ROOT/pages/high_level/security_group_host_design.adoc
Outdated
Show resolved
Hide resolved
|
||
OVS (controlled by ACA) will need to stamp all packets coming out for a port with the assoicated SG IDs (zero to five). | ||
|
||
Each SG ID is 36 characters string (36 bytes) or we can represent it using 16 bytes encoding. 5 of them will be 80 bytes plus adding 8 bytes NSH header will total 88 bytes increase of packet header. While VxLAN-GPE + NSH can embed all those information, but 88 bytes is too much data to put into header of every single packet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that most majority of customer scenarios only use one SG, can we cut the header overhead to 16+8 = 24 bytes or 88 is not negotiable :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would cut it down to 16+8 bytes NSH header = 24 bytes. That's more negotiable :) @chenpiaoping @Gzure - can we assume most customer scenarios only use one SG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us bring this question up in the weekly open-source meeting. If the assumption is confirmed, we should take the proposal more serious :-)
3. security - put control packet outside tunnel has lower risk, but inside tenent networks tunnels has high risk because tenent can mess with it | ||
4. acknowledgement - what happen if control packet are missed? There is not reply mechanism | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember we had discussion about storing SG information using local registry alike staff. Didn't find it in the documents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the discussion mentioned on line 300 where we use control packet to learn about the remote port IP to its assoicated SG, and the store that information into ACA memory (or "registry" to survive reboot?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We will merge the design and keep iterating if we find interesting ideas along the way.
This is the first draft of Security Group Host Design. It contains the high-level approach and the advantage compared to OpenStack Neutron. Diagrams and workflow pictures will be added next.