Exported service results in timeout from consuming onPremise cluster #2934
@IceManGreen we would need some additional info to narrow down the problem. Can you clarify the following?
@sridhargaddam hello! Thanks for your answer.
I use K3S with Flannel:
$ /var/lib/rancher/k3s/data/current/bin/flannel
CNI Plugin flannel version v0.22.2 (linux/amd64) commit HEAD built on 2024-02-06T01:58:54Z
$ subctl show versions
Cluster "domain-2"
✓ Showing versions
COMPONENT REPOSITORY CONFIGURED RUNNING ARCH
submariner-gateway quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-routeagent quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-globalnet quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-metrics-proxy quay.io/submariner 0.17.0 release-0.17-81b7e55f5306 amd64
submariner-operator quay.io/submariner 0.17.0 release-0.17-d750fbdcb610 amd64
submariner-lighthouse-agent quay.io/submariner 0.17.0 release-0.17-7ad4dd387b0b amd64
submariner-lighthouse-coredns quay.io/submariner 0.17.0 release-0.17-7ad4dd387b0b amd64
Cluster "domain-3"
✓ Showing versions
COMPONENT REPOSITORY CONFIGURED RUNNING ARCH
submariner-gateway quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-routeagent quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-globalnet quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-metrics-proxy quay.io/submariner 0.17.0 release-0.17-81b7e55f5306 amd64
submariner-operator quay.io/submariner 0.17.0 release-0.17-d750fbdcb610 amd64
submariner-lighthouse-agent quay.io/submariner 0.17.0 release-0.17-7ad4dd387b0b amd64
submariner-lighthouse-coredns quay.io/submariner 0.17.0 release-0.17-7ad4dd387b0b amd64
Cluster "e2e-mgmt"
✓ Showing versions
COMPONENT REPOSITORY CONFIGURED RUNNING ARCH
submariner-operator quay.io/submariner 0.17.0 release-0.17-d750fbdcb610 amd64
It seems that I have failing tests on domain-2 and domain-3 because the pods that must run the tests cannot be deployed.
What affinity/selector should the nodes have so that the test pods can be scheduled? See the attachment for the entire file (sorry, GitHub only supports zip files, not tar).
To verify this, I installed HTTP servers on every domain, listening on port 8080 (control plane) and port 8081 (data plane).
Test for the control plane (enp1s0) and the data plane:
# from domain-3 to domain-2
# control plane
curl 172.16.100.81:8080 -sSLI
HTTP/1.0 200 OK
# data plane
curl 172.16.110.81:8081 -sSLI
HTTP/1.0 200 OK
# from domain-2 to domain-3
# control plane
curl 172.16.100.84:8080 -sSLI
HTTP/1.0 200 OK
# data plane
curl 172.16.110.84:8081 -sSLI
HTTP/1.0 200 OK
Everything works fine for these simple requests.
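For reference, a minimal way to stand up such listeners (an assumption; the thread does not say which HTTP server was actually used):
# serve on the control-plane and data-plane interfaces (IPs taken from the curl tests above)
python3 -m http.server 8080 --bind 172.16.100.81 &
python3 -m http.server 8081 --bind 172.16.110.81 &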
In subctl verify we deploy connectivity test pods on the GW node for some tests and on non-GW nodes for other tests. Could you label (submariner.io/gateway=true) only a single node as GW and rerun subctl verify?
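A minimal sketch of that relabeling, assuming placeholder node names (<gw-node>, <other-node>); the trailing dash removes the label:
kubectl label node <other-node> submariner.io/gateway- --context domain-2
kubectl label node <gw-node> submariner.io/gateway=true --overwrite --context domain-2
subctl verify --context domain-2 --tocontext domain-3 --only connectivity --verbose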
Hello @yboaron, do you mean that if more than one node is labelled as a gateway, some tests are expected to fail? Here is my current labelling:
$ kubectl label --list nodes --all --context domain-2 | grep submariner
submariner.io/gateway=true
submariner.io/gateway=true
submariner.io/gateway=true
$ kubectl label --list nodes --all --context domain-3 | grep submariner
submariner.io/gateway=true
submariner.io/gateway=true
submariner.io/gateway=true
Since you labelled all 3 nodes as GWs on both clusters, tests that need to run one of the test pods on a non-GW node will fail [1], while tests that need to run the client pod and the listener pod on a GW will succeed [2]. Can you label (submariner.io/gateway=true) only a single node as GW on each cluster and rerun subctl verify?
[1] [FAILED] Failed to await pod ready. Pod "tcp-check-listenermxz66" is still pending: status:
[2] STEP: Verifying that the listener got the connector's data and the connector got the listener's data @ 03/11/24 09:10:31.115
Ok, so I applied the label "submariner.io/gateway=true" on only one node for the domain-2 and domain-3 clusters. But now I am confused, because the domain-3 connection is down:
subctl show connections
Cluster "e2e-mgmt"
⚠ Submariner connectivity feature is not installed
Cluster "domain-2"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
harry domain-3 172.16.100.84 no libreswan 242.1.0.0/16 connecting 0s
Cluster "domain-3"
✗ Showing Connections
✗ No connections found
So I reinstalled Submariner on domain-3, but I got the same problem:
subctl uninstall --context domain-3
subctl join broker-info.subm --clustercidr "10.42.0.0/16" --globalnet --clusterid domain-3 --context domain-3
Every pod seems fine:
kubectl get pods -n submariner-operator --context domain-3
NAME READY STATUS RESTARTS AGE
submariner-gateway-qhq6s 1/1 Running 0 25m
submariner-globalnet-b46h6 1/1 Running 0 25m
submariner-lighthouse-agent-749f576cd9-t87fw 1/1 Running 0 25m
submariner-lighthouse-coredns-86b594f7cd-fptx7 1/1 Running 0 25m
submariner-lighthouse-coredns-86b594f7cd-qgz7m 1/1 Running 0 25m
submariner-metrics-proxy-qfjlv 2/2 Running 0 25m
submariner-operator-7994fc86c5-w95w8 1/1 Running 0 26m
submariner-routeagent-6jv4v 1/1 Running 0 25m
submariner-routeagent-bkbsk 1/1 Running 0 25m
submariner-routeagent-dj7c5 1/1 Running 0 25m
But the gateway pod is showing an error in its logs:
# ...
2024-03-11T15:22:20.397Z ERR ..gine/cableengine.go:147 CableEngine Error installing cable for &natdiscovery.NATEndpointInfo{Endpoint:v1.Endpoint{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"domain-2-submariner-cable-domain-2-172-16-100-81", GenerateName:"", Namespace:"submariner-operator", SelfLink:"", UID:"57c8b6c2-6bdb-411b-8eb9-dcf6282515a1", ResourceVersion:"3416393", Generation:1, CreationTimestamp:time.Date(2024, time.March, 11, 14, 55, 19, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"submariner-io/clusterID":"domain-2"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"submariner-gateway", Operation:"Update", APIVersion:"submariner.io/v1", Time:time.Date(2024, time.March, 11, 14, 55, 19, 0, time.Local), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc000204558), Subresource:""}}}, Spec:v1.EndpointSpec{ClusterID:"domain-2", CableName:"submariner-cable-domain-2-172-16-100-81", HealthCheckIP:"242.0.255.254", Hostname:"porthos", Subnets:[]string{"242.0.0.0/16"}, PrivateIP:"172.16.100.81", PublicIP:"172.16.110.81", NATEnabled:true, Backend:"libreswan", BackendConfig:map[string]string{"natt-discovery-port":"4490", "preferred-server":"false", "public-ip":"ipv4:172.16.110.81", "udp-port":"4500"}}}, UseNAT:false, UseIP:"172.16.100.81"} error="error installing Endpoint cable \"submariner-cable-domain-2-172-16-100-81\": error whacking with args [--psk --encrypt --name submariner-cable-domain-2-172-16-100-81-0-0 --id 172.16.100.84 --host 172.16.100.84 --client 242.1.0.0/16 --ikeport 4500 --to --id 172.16.100.81 --host 172.16.100.81 --client 242.0.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]: exit status 20" |
Could you please reinstall Submariner on both clusters, and let us know in case you still hit the connection issue? BTW, I can see that Submariner detects the CNI as generic and not flannel. Do you have a daemonset named flannel in the kube-system namespace? Do you have a volume named flannel in this daemonset?
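A quick way to check both points with standard kubectl calls (context name taken from the thread):
# is there a flannel daemonset in kube-system, and does it define a volume named flannel?
kubectl get daemonset flannel -n kube-system --context domain-2
kubectl get daemonset flannel -n kube-system --context domain-2 -o jsonpath='{.spec.template.spec.volumes[*].name}'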
Hello @yboaron, sorry for the late answer!
$ subctl show connections
Cluster "domain-3"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
porthos domain-2 172.16.100.81 no wireguard 242.0.0.0/16 connected 1.430896ms
Cluster "e2e-mgmt"
⚠ Submariner connectivity feature is not installed
Cluster "domain-2"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
harry domain-3 172.16.100.84 no wireguard 242.1.0.0/16 connected 1.006687ms
But in the end, the tests were not successful:
Summarizing 10 Failures:
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet, basic]
github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod matching an egress IP namespace selector connects via TCP to the globalIP of a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod matching an egress IP namespace selector connects via TCP to the globalIP of a remote service when the pod is on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/dataplane.go:200
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod matching an egress IP pod selector connects via TCP to the globalIP of a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod matching an egress IP pod selector connects via TCP to the globalIP of a remote service when the pod is on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/dataplane.go:200
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod with HostNetworking connects via TCP to the globalIP of a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod with HostNetworking connects via TCP to the globalIP of a remote service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote headless service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote service in reverse direction when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/[email protected]/test/e2e/framework/network_pods.go:196
Ran 13 of 47 Specs in 1888.586 seconds
FAIL! -- 3 Passed | 10 Failed | 0 Pending | 34 Skipped
Here is the complete file (see attachment).
I think there is something important to note here: K3S supports different backends for Flannel, so I configured it with WireGuard to encrypt the inter-node communications.
Q: can you confirm that WireGuard as the cable driver is not supposed to change/imply more modifications during the installation/runtime of Submariner than the default driver?
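For context, this is roughly how the Flannel WireGuard backend mentioned above is enabled on K3s (an assumption; the exact server flags were not shown in the thread):
# recent K3s releases use wireguard-native; older ones used --flannel-backend=wireguard
k3s server --flannel-backend=wireguard-native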
Indeed you are right, the CNI is detected as generic:
$ subctl join broker-info.subm --clustercidr "10.42.0.0/16" --globalnet --cable-driver wireguard --clusterid domain-2 --context domain-2
✓ broker-info.subm indicates broker is at https://172.16.100.99:6443
✓ Discovering network details
Network plugin: generic # <--- HERE
Service CIDRs: [10.43.0.0/16]
Cluster CIDRs: []
There are 1 node(s) labeled as gateways:
- porthos
But in K3S there is no daemonset created for Flannel. Is Submariner supposed to base this detection on a daemonset?
To enable the WireGuard cable driver, you need to install WireGuard on the GW node (search for WireGuard here).
Yep, but we can work around pod CIDR discovery by specifying the --clustercidr flag in the join command (as you did). Can you upload subctl gather and subctl diagnose all from both clusters? BTW, did you install clusters with overlapping CIDRs on purpose?
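One way to collect what is being asked for (kubeconfig paths and context names are placeholders):
subctl gather --kubeconfig <kubeconfig-domain-2>
subctl gather --kubeconfig <kubeconfig-domain-3>
subctl diagnose all --context domain-2
subctl diagnose all --context domain-3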
Ok so I made some tests following your recommendations:
subctl deploy-broker --context e2e-mgmt
subctl join broker-info.subm --clustercidr "10.44.0.0/16" --servicecidr "10.45.0.0/16" --cable-driver wireguard --clusterid domain-2 --context domain-2
subctl join broker-info.subm --clustercidr "10.46.0.0/16" --servicecidr "10.47.0.0/16" --cable-driver wireguard --clusterid domain-3 --context domain-3
But again, the tests fail. Here are the results of diagnose-202403121620.log
Sorry for the late answer. I checked the logs and they look fine; the WireGuard connection is UP between the clusters. Since the WireGuard connection is up, I would expect at least the tests between pod@gw_node in domain-2 and pod@gw_node in domain-3 to pass. What is the latest output of subctl verify --context <cluster1_context> --tocontext <cluster2_context> --only connectivity --verbose? Can you also check subctl diagnose firewall intra-cluster --kubeconfig <cluster_kubeconfig>?
Hello @yboaron, yes I agree that the combination of the three is suspicious. I will run the tests you asked for, and I was also thinking about running the same tests with Cilium instead of Flannel, just in case. Unfortunately, due to KubeCon EU, I will not be able to test this week. Thanks again!
Hello @yboaron! Regarding this issue, I had to deploy the clusters from scratch using Cilium, but it is still failing.
I deployed the Submariner broker as usual.
# labels and annotations
kubectl label node porthos "submariner.io/gateway=true" --context domain-1
kubectl label node harry "submariner.io/gateway=true" --context domain-2
Joining:
subctl join broker-info.subm --clustercidr "10.10.0.0/16" --servicecidr "10.11.0.0/16" --cable-driver wireguard --clusterid domain-1 --context domain-1
subctl join broker-info.subm --clustercidr "10.12.0.0/16" --servicecidr "10.13.0.0/16" --cable-driver wireguard --clusterid domain-2 --context domain-2
I deployed the following manifests in the federation-1 namespace on domain-2:
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: rebel-base
spec:
selector:
matchLabels:
name: rebel-base
replicas: 1
template:
metadata:
labels:
name: rebel-base
spec:
containers:
- name: rebel-base
image: docker.io/nginx:1.15.8
ports:
- containerPort: 80
name: http
volumeMounts:
- name: html
mountPath: /usr/share/nginx/html/
volumes:
- name: html
configMap:
name: rebel-base-response
items:
- key: message
path: index.html
---
apiVersion: v1
kind: ConfigMap
metadata:
name: rebel-base-response
data:
message: "hello federation-1 from domain-2\n"
---
apiVersion: v1
kind: Service
metadata:
name: rebel-base-svc
spec:
type: ClusterIP
ports:
- name: http
protocol: TCP
port: 80
targetPort: http
selector:
name: rebel-base
Now I export the service:
subctl export service rebel-base-svc -n federation-1 --context domain-2
kubectl get serviceimport -n federation-1 --context domain-1
NAME TYPE IP AGE
rebel-base-svc ClusterSetIP 96s
Local requests work:
$ kubectl run x-wing --rm -it --image nicolaka/netshoot --context domain-1 -- \
curl rebel-base-svc.federation-1.svc.clusterset.local.
hello federation-1 from domain-2
But requests from domain-1 time out:
$ kubectl run x-wing --rm -it --image nicolaka/netshoot --context domain-1 -- \
curl rebel-base-svc.federation-1.svc.clusterset.local. --connect-timeout 3
curl: (28) Failed to connect to rebel-base-svc.federation-1.svc.clusterset.local. port 80 after 3002 ms: Timeout was reached
This time, I used Cilium Hubble to monitor the network. I found something strange.
Observing the x-wing pod (on domain-1):
Apr 9 13:35:59.989: default/x-wing:56847 (ID:84471) -> kube-system/coredns-6799fbcd5-vgsnf:53 (ID:110854) to-endpoint FORWARDED (UDP)
Apr 9 13:35:59.991: default/x-wing:56847 (ID:84471) <- kube-system/coredns-6799fbcd5-vgsnf:53 (ID:110854) to-endpoint FORWARDED (UDP)
Apr 9 13:35:59.992: default/x-wing:48924 (ID:84471) -> 10.13.178.142:80 (world) to-stack FORWARDED (TCP Flags: SYN)
# ends here
kubectl get svc -n federation-1 --context domain-2 -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
rebel-base-svc ClusterIP 10.13.178.142 <none> 80/TCP 44m name=rebel-base
Observing the rebel-base pod (on domain-2):
Apr 9 13:36:28.812: 10.10.2.95:40152 (world) -> federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-endpoint FORWARDED (TCP Flags: SYN)
Apr 9 13:36:28.812: 10.10.2.95:40152 (world) <- federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-stack FORWARDED (TCP Flags: SYN, ACK)
Apr 9 13:36:28.812: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:29.041: 10.10.2.95:40152 (world) <> federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-overlay FORWARDED (TCP Flags: SYN)
Apr 9 13:36:29.826: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:29.842: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:30.071: 10.10.2.95:40152 (world) <> federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-overlay FORWARDED (TCP Flags: SYN)
Apr 9 13:36:31.842: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:36.066: 10.10.2.95:40152 (world) <- federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-stack FORWARDED (TCP Flags: SYN, ACK)
Apr 9 13:36:36.067: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:44.259: 10.10.2.95:40152 (world) <- federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-stack FORWARDED (TCP Flags: SYN, ACK)
Apr 9 13:36:44.259: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:37:00.386: 10.10.2.95:40152 (world) <- federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-stack FORWARDED (TCP Flags: SYN, ACK)
Apr 9 13:37:00.386: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
You can notice the "Stale or unroutable IP DROPPED" error messages. 10.10.2.95 is the x-wing pod in domain-1:
kubectl get pod -n default --context domain-1 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
x-wing 1/1 Running 0 46m 10.10.2.95 athos <none> <none>
I have no idea why the packets are not correctly routed on both sides.
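For reference, a rough sketch of how such flows can be captured with Hubble (pod names taken from the outputs above; exact flags may differ by Hubble version):
# follow flows involving the client pod on domain-1 and the server pod on domain-2
hubble observe --follow --pod default/x-wing
hubble observe --follow --namespace federation-1 --pod rebel-base-596bc7d8ff-45tpg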
Hi @IceManGreen,
A. We have the following configuration: K3S, Cilium as the CNI, and WireGuard as the cable driver.
B. You can resolve the domain-2 rebel-base-svc IP from domain-1, which means that the Submariner inter-cluster WireGuard tunnels are up and multi-cluster service discovery looks fine.
C. So, we need to understand why the SYN/ACK packet is being dropped. It could be that Submariner failed to detect the CNI network interface and to update rp_filter (Submariner looks for a network interface with an IP address from the clustercidr range), so packets are dropped by the kernel, or some firewall/infra SG blocks the inter-cluster traffic.
D.
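A quick way to check the rp_filter part of that theory (interface name assumed for a Cilium node; run on the node hosting the server pod):
# 1 = strict reverse-path filtering, which can drop the asymmetric SYN/ACK;
# Submariner normally relaxes this on the CNI interface it detects
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.cilium_host.rp_filter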
Hi @yboaron, thanks for the summary! Here are the results. According to your words about the CNI detection by Submariner, I ran the following test:
$ subctl diagnose cni --context domain-1
⚠ Checking Submariner support for the CNI network plugin
⚠ Submariner could not detect the CNI network plugin and is using ("generic") plugin. It may or may not work.
Indeed, Submariner does not detect Cilium as the cluster CNI. Same thing in domain-2. Do I have a way to force Submariner to consider the cluster's CNI as Cilium?
Hi @IceManGreen,
In your case Submariner failed to detect Cilium and used the "generic" CNI; Submariner treats a generic CNI as a kube-proxy/iptables based CNI.
For the remote cluster pod CIDR, Submariner programs a route via vx-submariner:
10.10.0.0/16 via 240.19.112.84 dev vx-submariner proto static
So the packet should be routed by Submariner according to this route, but in this case the kernel (longest prefix match) will choose to route the packet wrongly using
10.10.2.0/24 via 10.12.1.86 dev cilium_host proto kernel src 10.12.1.86 mtu 1370
and not via vx-submariner. I can see similar routes also on the domain-1 cluster, for example in the ip route show output. You should first address this routes issue.
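A quick way to see the conflict on a node (interface names taken from the routes above):
# the /24 via cilium_host wins longest-prefix-match over the /16 via vx-submariner,
# so traffic to the remote pod CIDR never reaches the Submariner VXLAN
ip route show | grep -E 'vx-submariner|cilium_host'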
Hi @IceManGreen, any update? Can we close this issue?
Hello @yboaron, sorry for the very late answer too!
Yes, it is! Regarding the rest of your answer, thank you so much for guiding me through so many details, it helps a lot. Thanks again for your help! I think we are good with this issue!
@IceManGreen, I'm going to close this topic; feel free to reopen it if it's still relevant.
@yboaron @IceManGreen I have encountered the same problem in a Flannel environment. The root cause is that the MAC address of the vx-submariner interface created by the route agent is the same on every node. As a result, a service in another cluster can only be accessed from the gateway node, because traffic from a non-gateway node to the gateway node (through the vx-submariner interface) does not get through. So I will read the route-agent code that creates the vx-submariner interface.
@yboaron I think we need to reopen this issue.
@huangjiasingle OK, let's reopen it. Could you please specify what issue you encountered? Also, please elaborate on your environment (CNI, platform, Submariner version, cable driver, etc.).
@yboaron my env:
BTW, I read the route-agent code that creates the vx-submariner interface. It doesn't set the MAC address when creating the vx-submariner interface, so I want to know what sets the MAC address of vx-submariner.
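A simple way to confirm the identical-MAC symptom described above (node names are placeholders; assumes shell access to each node):
# compare the vx-submariner MAC address across nodes; identical MACs on every node
# would explain why non-gateway -> gateway traffic over the VXLAN breaks
for node in node1 node2 node3; do
  ssh "$node" ip -brief link show vx-submariner
done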
Hi @huangjiasingle, thanks for reaching out. This looks like a data path issue that needs further investigation; please attach the relevant diagnostics.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Looks like this issue has concluded. Feel free to re-open or open another if more discussion is needed.
Hello everyone,
I have an issue with a Submariner use case where I want to separate the control-plane network (used by the broker to communicate with the participating clusters) and the data-plane network (used by participating clusters to connect my applications through the gateways).
I have deployed 3 clusters:
e2e-mgmt
domain-2
domain-3
Note that each deployment cluster has 3 nodes. Each node has 2 interfaces. The nodes are actually virtual machines. Some tests showed that each VM can reach the others when communicating through the control plane or the data plane, but not across them (e.g. from 172.16.100.10 to 172.16.100.11, but not from 172.16.100.10 to 172.16.110.11).
I labeled and annotated all my nodes in domain-2 and domain-3, because I want the VPN tunnels to communicate through 172.16.110.0/24 (data plane) even if the Kubernetes APIs are listening on 172.16.100.0/24 (control plane).
I created the broker and joined the clusters. Indeed, the clusters joined the broker, and the connections seem good. Or maybe the remote IPs should be on 172.16.110.0/24? The gateways seem good.
I created an Nginx service in domain-2, in namespace hello-domain-2, called hello-world-svc. Using netshoot, I can tell that the requests are working locally from domain-2. I exported the service, and domain-3 properly consumed the ServiceImport from the Broker. However, the same test with netshoot from domain-3 does not work, even if dig resolves it properly.
What did I do wrong?