BPFire LoxiLB endpoint/backend server egress connection #718

Closed
vincentmli opened this issue Jul 2, 2024 · 11 comments
Labels
enhancement New feature or request status:accepted

Comments

@vincentmli

vincentmli commented Jul 2, 2024

Is your feature request related to a problem? Please describe.
Here is the BPFireOS deployment network diagram with LoxiLB enabled

       *********                                                                 
    ***         ***                                                              
  **               **                                                            
 *     Internet      *                                                           
 *                   *                                                           
 *                   *                                                           
  **              **                                                            
    ***         ***                                                              
       *********                                                                 
          |                                                                      
          |                                                                      
          |                                                                      
     +----+---------+                                                            
     | ISP comcast  |                                                            
     | router       |                                                            
     | 10.0.0.1     |                                                            
     +----+---------+                                                            
          |                                                                      
     +----+-------------------------------+                                      
     |                                    |                                      
     |                                    |                                      
+----+-red0----------+            +-------+--------+                             
|     10.0.0.232     |            |  10.0.0.171    |
|vip: 10.0.0.68      |            |                |                             
|                    |            |  workstation   |                             
|  BPFire            |            |                |                             
|                    |            |                |                             
|                    |            +----------------+                             
|   172.16.1.2       |                                                           
+-----green0---------+                                                           
          |                                                                      
          |                                                                      
          |                                                                      
          |                                                                      
+------eth0------+                                                               
| 172.16.1.9     |
|                |                                                               
| backend        |                                                               
|                |                                                               
+----------------+                                             

Ingress traffic from workstation 10.0.0.171 to the LoxiLB VIP 10.0.0.68, which is load balanced to backend 172.16.1.9, works fine. But egress traffic initiated from backend 172.16.1.9 to workstation 10.0.0.171 or to the Internet fails.

For example, ping workstation 10.0.0.171 from backend 172.16.1.9.

tcpdump on the BPFire red0 interface:

16:31:41.464882 IP 172.16.1.9 > 10.0.0.171: ICMP echo request, id 1220, seq 1, length 64

tcpdump on workstation 10.0.0.171

23:31:41.468461 IP 172.16.1.9 > 10.0.0.171: ICMP echo request, id 1220, seq 1, length 64
23:31:41.468516 IP 10.0.0.171 > 172.16.1.9: ICMP echo reply, id 1220, seq 1, length 64

So the workstation responded with an ICMP echo reply, but the tcpdump on red0 does not show it. The echo reply appears to be dropped either at the BPFire red0 interface, where the loxilb TC eBPF program is attached, or before it reaches red0.

Note that if I disable loxilb on BPFire, so that the TC eBPF program is removed from both the red0 and green0 interfaces, the egress ping works, because BPFire's netfilter applies SNAT and rewrites 172.16.1.9 -> 10.0.0.171 to 10.0.0.232 -> 10.0.0.171.
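
One possible reading of the missing echo reply, sketched under assumptions: the workstation sits on 10.0.0.0/24 and, lacking a route to 172.16.1.0/24, would hand a reply addressed to 172.16.1.9 to its default gateway rather than to BPFire. The /24 prefix matches the red0 address reported later in this thread; that 10.0.0.1 is the workstation's default gateway is an assumption, and `next_hop` is a hypothetical helper, not part of any tool mentioned here.

```python
import ipaddress

# The /24 matches red0's address; treating 10.0.0.1 as the workstation's
# default gateway is an assumption based on the diagram.
RED_NET = ipaddress.ip_network("10.0.0.0/24")
DEFAULT_GATEWAY = ipaddress.ip_address("10.0.0.1")


def next_hop(dst: str) -> ipaddress.IPv4Address:
    """Return where the workstation would send a packet addressed to dst."""
    addr = ipaddress.ip_address(dst)
    # On-link destinations are delivered directly; everything else goes
    # to the default gateway.
    return addr if addr in RED_NET else DEFAULT_GATEWAY


# Reply to the un-NATed backend address: leaves via the gateway, not BPFire.
print(next_hop("172.16.1.9"))
# Reply to the SNATed address: delivered on-link to red0.
print(next_hop("10.0.0.232"))
```

If this reading holds, it would explain why the reply never shows up on red0 without SNAT, and why rewriting the source to 10.0.0.232 makes the ping work.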

Describe the solution you'd like

This is an egress traffic scenario initiated from the endpoint/backend server. I am not sure whether LoxiLB supports it, or whether having LoxiLB do SNAT the way netfilter does would solve the problem.

Describe alternatives you've considered

Traditional load balancers have a so-called "wildcard listener": an LB or VIP listening on the wildcard IP and port combination 0.0.0.0:0. When backend 172.16.1.9 initiates traffic to the Internet, the wildcard listener 0.0.0.0:0 picks up the traffic and does the connection tracking and address translation (SNAT/DNAT) that allows the egress traffic to work.
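
The wildcard-listener idea can be sketched as a lookup that prefers a specific virtual server and falls back to 0.0.0.0:0. This is only an illustration: `Listener` and `match_listener` are hypothetical names, port 80 on the VIP is an assumed value, and no real load balancer's API is being reproduced.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Listener:
    network: str  # "0.0.0.0/0" matches any destination address
    port: int     # 0 matches any destination port
    name: str


def match_listener(listeners, dst_ip: str, dst_port: int):
    """Pick the most specific listener for a destination, else the wildcard."""
    addr = ipaddress.ip_address(dst_ip)
    candidates = [
        item for item in listeners
        if addr in ipaddress.ip_network(item.network)
        and item.port in (0, dst_port)
    ]
    if not candidates:
        return None
    # A longer prefix wins; an exact port beats the wildcard port.
    return max(
        candidates,
        key=lambda item: (ipaddress.ip_network(item.network).prefixlen,
                          item.port != 0),
    )


listeners = [
    Listener("10.0.0.68/32", 80, "vip"),          # ingress VIP, port assumed
    Listener("0.0.0.0/0", 0, "wildcard-egress"),  # backend-initiated traffic
]

print(match_listener(listeners, "10.0.0.68", 80).name)
print(match_listener(listeners, "142.251.46.228", 443).name)
```

Traffic to the VIP still lands on the specific listener, while anything else, such as a backend connecting out to the Internet, is caught by the wildcard entry where translation and tracking can be applied.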


@UltraInstinct14
Contributor

UltraInstinct14 commented Jul 5, 2024

The masquerade functionality is supported now. How to use:

loxicmd create firewall --firewallRule="portName:eth1" --snat=10.10.10.254

It is exposed through loxicmd's existing firewall capabilities.

@vincentmli
Author

@UltraInstinct14 @TrekkieCoder just for clarification: in your example --firewallRule="portName:eth1" --snat=10.10.10.254, the IP 10.10.10.254 is the IP configured on interface eth1, right? I will upgrade loxilb, loxilb-ebpf, and loxicmd to test the feature on BPFire.

@TrekkieCoder
Collaborator

TrekkieCoder commented Jul 6, 2024

@vincentmli I think the portName corresponds to "green0" in BPFire terminology, i.e. the interface the packets arrive from, and "10.10.10.254" would be red0's public IP. Basically it means: do masquerading/SNAT for packets arriving from eth1.
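
As a rough model of the rule semantics described here (match on the ingress port, rewrite the source to the SNAT address), the following sketch uses the BPFire names from the diagram. `SnatRule`, `Packet`, and `apply_snat` are hypothetical and say nothing about how loxilb implements the rule internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnatRule:
    port_name: str  # ingress interface the rule applies to
    snat_ip: str    # address that replaces the packet's source


@dataclass(frozen=True)
class Packet:
    in_port: str
    src: str
    dst: str


def apply_snat(rules, pkt: Packet) -> Packet:
    """Rewrite the source of packets that arrived on a rule's port."""
    for rule in rules:
        if rule.port_name == pkt.in_port:
            return Packet(pkt.in_port, rule.snat_ip, pkt.dst)
    return pkt  # no rule for this ingress port: leave the packet alone


rules = [SnatRule(port_name="green0", snat_ip="10.0.0.232")]

out = apply_snat(rules, Packet("green0", "172.16.1.9", "10.0.0.171"))
print(out.src, "->", out.dst)
```

In this model a packet arriving on red0 passes through unchanged, because the rule is keyed on the interface the traffic comes from, not the one it leaves on.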

@vincentmli
Author

@UltraInstinct14 @TrekkieCoder I tested BPFire with the new loxilb/loxicmd, it works great, thanks a lot for this work :)

[root@bpfire-2 ~]# loxicmd create firewall --firewallRule="portName:green0" --snat=10.0.0.232
Debug: response.StatusCode: 200
Success
[root@bpfire-2 ~]# loxicmd get firewall -o wide
| SOURCE IP | DESTINATION IP | MIN SPORT | MAX SPORT | MIN DPORT | MAX DPORT | PROTOCOL | PORT NAME | PREFERENCE |       OPTION       | COUNTERS |
|-----------|----------------|-----------|-----------|-----------|-----------|----------|-----------|------------|--------------------|----------|
| 0.0.0.0/0 | 0.0.0.0/0      |         0 |         0 |         0 |         0 |        0 | green0    |          0 | Snat(10.0.0.232:0) | 2:170    |
[root@bpfire-2 ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: red0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc cake state UP group default qlen 1000
    link/ether 00:a8:2a:e8:34:ec brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.232/24 brd 10.0.0.255 scope global dynamic noprefixroute red0
       valid_lft 172545sec preferred_lft 150945sec
3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc cake state UP group default qlen 1000
    link/ether 00:a8:2a:e8:34:ed brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.2/24 scope global green0
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc cake state DOWN group default qlen 1000
    link/ether 00:a8:2a:e8:34:ee brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc cake state DOWN group default qlen 1000
    link/ether 00:a8:2a:e8:34:ef brd ff:ff:ff:ff:ff:ff
6: llb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric/id:3 qdisc cake state UNKNOWN group default qlen 1000
    link/ether 6a:41:b0:69:b2:fc brd ff:ff:ff:ff:ff:ff

[root@bpfire-2 ~]# tcpdump -nn -i red0 icmp -c 10 &
[1] 3740
[root@bpfire-2 ~]# tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on red0, link-type EN10MB (Ethernet), snapshot length 262144 bytes

[root@bpfire-2 ~]# 
[root@bpfire-2 ~]# tcpdump -nn -i green0 icmp -c 10 &
[2] 3741

ping from client 172.16.1.9 to google.com

[root@bpfire-2 ~]# 
21:22:50.075283 IP 172.16.1.9 > 142.251.46.228: ICMP echo request, id 1157, seq 1, length 64
21:22:50.075369 IP 10.0.0.232 > 142.251.46.228: ICMP echo request, id 1157, seq 1, length 64

21:22:50.271598 IP 142.251.46.228 > 10.0.0.232: ICMP echo reply, id 1157, seq 1, length 64
21:22:50.271716 IP 142.251.46.228 > 172.16.1.9: ICMP echo reply, id 1157, seq 1, length 64
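
The four packets above show both halves of the translation: the request is rewritten on the way out, and the reply is mapped back on the way in. A toy table keyed by remote address and ICMP id reproduces that behavior; `IcmpNat` is a hypothetical class for illustration, and keeping the ICMP id unchanged simply mirrors what the capture shows (id 1157 on both sides).

```python
class IcmpNat:
    """Toy source-NAT table keyed by (remote address, ICMP id)."""

    def __init__(self, snat_ip: str):
        self.snat_ip = snat_ip
        self.sessions = {}  # (remote_ip, icmp_id) -> original source

    def outbound(self, src: str, dst: str, icmp_id: int):
        # Remember who sent the request, then rewrite the source.
        self.sessions[(dst, icmp_id)] = src
        return (self.snat_ip, dst)

    def inbound(self, src: str, dst: str, icmp_id: int):
        # Replies addressed to the SNAT IP are mapped back to the sender.
        original = self.sessions.get((src, icmp_id))
        if dst != self.snat_ip or original is None:
            return None  # no matching session
        return (src, original)


nat = IcmpNat("10.0.0.232")
print(nat.outbound("172.16.1.9", "142.251.46.228", 1157))
print(nat.inbound("142.251.46.228", "10.0.0.232", 1157))
```

A reply with no matching entry returns `None` in this sketch, which is the case where a real translator would have nothing to map the packet back to.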

vincentmli added a commit to vincentmli/BPFire that referenced this issue Jul 6, 2024
LoxiLB 0.9.4 lack of SNAT feature for egress traffic
initiated from BPFire green network, when loxilb is
enabled, it breaks BPFire green network client Internet
access, this issue is fixed in the loxilb development
branch, temporarily I make loxilb development branch
as 0.9.5 in BPFire so I could test the SNAT feature and
it works.

see detail in loxilb-io/loxilb#718

Signed-off-by: Vincent Li <[email protected]>
@vincentmli
Author

vincentmli commented Jul 6, 2024

OK, I noticed an odd behavior after applying the new firewall SNAT feature, using the BPFire network diagram above as reference. By default, green network client 172.16.1.9 uses 172.16.1.2 on the green0 interface as its DNS server, because BPFire runs a dhcpd service on green0 to assign dynamic IPs to green network clients, and the default DNS server configured in dhcpd is 172.16.1.2. After I enable loxilb and create the firewall SNAT rule, DNS resolution from client 172.16.1.9 stops working with DNS server 172.16.1.2, but it works if I change the DNS server to the public DNS server 8.8.8.8.

For example, when I run dig @172.16.1.2 www.google.com on client 172.16.1.9, BPFire logs the kernel messages below:

Jul  6 01:52:06 bpfire-2 kernel: IPv4: martian source 172.16.1.2 from 10.0.0.232, on dev green0
Jul  6 01:52:06 bpfire-2 kernel: ll header: 00000000: 00 a8 2a e8 34 ed 00 0e c4 cf 4c 8b 08 00

I also noticed while running tcpdump that the DNS query packet appears only on the green0 interface, not on red0.

If I delete the loxilb firewall SNAT rule, DNS resolution with 172.16.1.2 works and the martian source log message above does not occur. For now I can work around this by configuring dhcpd with a public DNS server.
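
To make the martian log easier to read, here is a small parsing sketch. In the kernel's "martian source A from B" message, A is the packet's destination and B its source, so the logged packet has source 10.0.0.232, which is BPFire's own red0 address; the kernel normally rejects packets that arrive claiming one of its local addresses as source. The helper names are hypothetical, and the interpretation that the SNAT rule rewrote the query before it reached the local stack is a reading of the logs, not something confirmed in this thread.

```python
import re

# Local addresses and the green0 MAC are copied from the `ip a s` output.
LOCAL_ADDRS = {"127.0.0.1", "10.0.0.232", "172.16.1.2"}
GREEN0_MAC = "00:a8:2a:e8:34:ed"

MARTIAN_RE = re.compile(
    r"martian source (?P<dst>[\d.]+) from (?P<src>[\d.]+), on dev (?P<dev>\S+)"
)


def parse_martian(line: str) -> dict:
    """Extract destination, source, and device from a martian message."""
    m = MARTIAN_RE.search(line)
    if m is None:
        raise ValueError("not a martian source message")
    return m.groupdict()


def parse_ll_header(line: str) -> dict:
    """Split the 14-byte Ethernet header dump into its three fields."""
    octets = line.split("ll header: 00000000:")[1].split()
    return {
        "dst_mac": ":".join(octets[0:6]),
        "src_mac": ":".join(octets[6:12]),
        "ethertype": "".join(octets[12:14]),
    }


martian = parse_martian(
    "Jul  6 01:52:06 bpfire-2 kernel: IPv4: martian source 172.16.1.2 "
    "from 10.0.0.232, on dev green0"
)
header = parse_ll_header(
    "Jul  6 01:52:06 bpfire-2 kernel: ll header: 00000000: "
    "00 a8 2a e8 34 ed 00 0e c4 cf 4c 8b 08 00"
)

# The source address is one of BPFire's own addresses.
print(martian["src"] in LOCAL_ADDRS)
# The frame was addressed to green0 and came from another NIC.
print(header["dst_mac"] == GREEN0_MAC, header["src_mac"])
```

The link-layer header shows the frame was sent to green0's MAC from a different MAC, presumably the client's NIC, while the IP source is already 10.0.0.232. That is consistent with the source being rewritten for a packet whose destination is local to BPFire.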

UltraInstinct14 added a commit that referenced this issue Jul 7, 2024
gh-718 Fix masquerade for local destination
@TrekkieCoder
Collaborator

Thanks @vincentmli for bringing this up. I have merged a potential fix for this. Please double-check when you have time.

@vincentmli
Author

@TrekkieCoder @UltraInstinct14 I tested the fix on BPFire and the DNS issue is resolved. Thanks a lot for the fix; now I can surf the Internet through BPFire with LoxiLB enabled 👍

vincentmli added a commit to vincentmli/BPFire that referenced this issue Jul 7, 2024
@vincentmli
Author

@TrekkieCoder @UltraInstinct14 I seem to have found another bug that may be related to this; the issue does not occur with loxilb disabled.

With the same network diagram, when I ssh from green network client 172.16.1.9 to 10.0.0.171, the ssh connection is established. If I then let the ssh session idle for about 3 minutes, the session is "frozen" and does not respond to the Enter key. I ran tcpdump captures on both the green0 and red0 interfaces at the same time and attached the screenshots. The last SSHv2 frame from the client, with TCP length 36 (sent when the user presses Enter), is lost/dropped: it is not seen in the red0 capture.

[Screenshot: green0 capture (green0-loxilb)]

[Screenshot: red0 capture (red0-loxilb)]

I don't know if this is related to loxilb connection tracking (idle connection timeout?), but sometimes I noticed that even when I keep pressing Enter in the ssh session at random intervals, the session still becomes "frozen" after some time.

@TrekkieCoder
Collaborator

@vincentmli It is due to a ridiculously low connection timeout for SNAT (as this is a new feature, CI/CD and other testing are still in development). In the meantime, I will release a quick patch with an increased connection timeout and incorporate more unit tests.
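
The effect of a too-short inactivity timeout can be illustrated with a toy session table. The timeout values below (60 and 3600 seconds) are made up for the example, since the thread does not state the actual numbers, and `SessionTable` is a hypothetical model rather than loxilb's connection tracker.

```python
class SessionTable:
    """Toy connection-tracking table with an inactivity timeout (seconds)."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.last_seen = {}  # flow key -> timestamp of the last packet

    def open(self, flow, now: float) -> None:
        self.last_seen[flow] = now

    def forward(self, flow, now: float) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        seen = self.last_seen.get(flow)
        if seen is None or now - seen > self.idle_timeout:
            # Expired or unknown session: forget it and drop the packet.
            self.last_seen.pop(flow, None)
            return False
        self.last_seen[flow] = now  # traffic refreshes the timer
        return True


flow = ("172.16.1.9", "10.0.0.171", 22)

short = SessionTable(idle_timeout=60)    # hypothetical low timeout
short.open(flow, now=0)
print(short.forward(flow, now=180))      # keystroke after 3 idle minutes

longer = SessionTable(idle_timeout=3600) # hypothetical raised timeout
longer.open(flow, now=0)
print(longer.forward(flow, now=180))
```

With the short timeout the keystroke sent after three idle minutes finds no session and is dropped, which matches the frozen ssh session; with the longer one the same packet is forwarded.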

UltraInstinct14 added a commit that referenced this issue Jul 8, 2024
gh-718 Fix masquerade session inactivity timeout
@TrekkieCoder
Collaborator

TrekkieCoder commented Jul 8, 2024

The timeout issue has been fixed. There is still a known issue where loxilb does not send a TCP reset after session inactivity for masqueraded sessions. Will update after the fix.
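
A sketch of why the reset matters: if an idle session is silently removed, both endpoints keep believing the connection is alive. Assuming the intended behavior is to notify both sides when a session ages out, the expiry sweep could look roughly like this. `Reset` and `expire_idle` are hypothetical, address translation is left out for brevity, and the 300-second timeout is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reset:
    src: str
    dst: str
    flags: str = "RST"


def expire_idle(sessions: dict, now: float, idle_timeout: float) -> list:
    """Remove idle TCP sessions and return the resets to send."""
    resets = []
    # Iterate over a copy because entries are deleted during the sweep.
    for (client, server), last_seen in list(sessions.items()):
        if now - last_seen > idle_timeout:
            del sessions[(client, server)]
            # Each side receives a reset that appears to come from its peer.
            resets.append(Reset(src=server, dst=client))
            resets.append(Reset(src=client, dst=server))
    return resets


sessions = {("172.16.1.9", "10.0.0.171"): 0.0}
for rst in expire_idle(sessions, now=500.0, idle_timeout=300.0):
    print(rst.flags, rst.src, "->", rst.dst)
```

With resets delivered, an ssh client would see the connection close instead of hanging on a session the middlebox has already forgotten.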

@vincentmli
Author

Thanks a lot for the quick fix!

vincentmli added a commit to vincentmli/BPFire that referenced this issue Jul 8, 2024
UltraInstinct14 added a commit that referenced this issue Jul 8, 2024
gh-718 Fixed tcp reset for session inactivity
vincentmli added a commit to vincentmli/BPFire that referenced this issue Jul 9, 2024
UltraInstinct14 added a commit that referenced this issue Jul 9, 2024