how to direct connect to broker without proxy #423

Open
youzipi opened this issue Jan 5, 2024 · 10 comments

Comments

@youzipi

youzipi commented Jan 5, 2024

I would prefer not to use a proxy, but I found that the broker does not have an ingress template.

For now, I deploy a separate ingress for the broker.

@lhotari
Member

lhotari commented Jan 17, 2024

Ingress probably wouldn't make sense for Pulsar brokers, at least not for the binary protocol. For the Pulsar Admin API it would be a feasible approach. The HTTP/HTTPS protocol can also be used for topic lookups, so an ingress URL would be sufficient as the client's "serviceUrl". However, the Pulsar binary protocol would require a different approach.
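A minimal sketch of such an ingress, exposing only the broker HTTP port (Admin API and HTTP lookups). It assumes ingress-nginx as the controller, the broker Service "pulsar-broker" in namespace "pulsar" (the names used later in this thread), and a hypothetical host name:

```yaml
# Sketch only: expose the broker HTTP port (8080) through an ingress.
# Assumptions: ingress-nginx controller, Service "pulsar-broker" in
# namespace "pulsar", hypothetical host name.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pulsar-broker-http          # hypothetical name
  namespace: pulsar
spec:
  ingressClassName: nginx
  rules:
    - host: pulsar-admin.example.com   # hypothetical host name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pulsar-broker
                port:
                  number: 8080      # broker HTTP port (Admin API, HTTP lookups)
```

A client could then use `http://pulsar-admin.example.com` as its serviceUrl, but the lookup response still points it at a broker's advertised address for the binary connection, so the brokers themselves must still be reachable.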

You could use k8s node ports and Pulsar's "advertisedListeners" feature:
https://pulsar.apache.org/docs/3.1.x/concepts-multiple-advertised-listeners/#advertised-listeners
However, configuring that would require some special customization and integration to make it work with a Pulsar k8s deployment.
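A sketch of the NodePort half of that idea, assuming the brokers run as a StatefulSet whose pods are named "pulsar-broker-0", "pulsar-broker-1", … in namespace "pulsar". One Service per broker pod selects the pod through the `statefulset.kubernetes.io/pod-name` label that Kubernetes adds to StatefulSet pods; the Service name and node port are hypothetical:

```yaml
# Sketch only: one NodePort Service per broker pod, so each broker gets
# its own externally reachable port. Names and nodePort are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: pulsar-broker-0-external
  namespace: pulsar
spec:
  type: NodePort
  # Only the node hosting the pod forwards this port, which avoids an
  # extra node-to-node hop; clients must then use that node's IP.
  externalTrafficPolicy: Local
  selector:
    # Label set automatically by the StatefulSet controller.
    statefulset.kubernetes.io/pod-name: pulsar-broker-0
  ports:
    - name: pulsar
      port: 6650
      targetPort: 6650
      nodePort: 30650               # must be inside the cluster's node port range
```

With `externalTrafficPolicy: Local`, the broker has to advertise the IP of the node it runs on; with the default `Cluster` policy any node IP would work, at the cost of an extra hop (possibly cross-AZ).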

Another possibility is to use Pulsar's SNI routing feature with a proxy that supports SNI proxying (for example, Apache Traffic Server or Nginx):
https://pulsar.apache.org/docs/3.1.x/concepts-proxy-sni-routing/
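For the SNI option, a sketch of an Apache Traffic Server `sni.yaml`, assuming the ATS 9.x file format and brokers with TLS enabled on port 6651. The host names are hypothetical; each `fqdn` must match the host name the broker advertises, because the client sends the looked-up broker host as the SNI value:

```yaml
# Sketch only (ATS 9.x sni.yaml format assumed). Each SNI host name is
# blind-tunneled, without TLS termination, to the matching broker's TLS port.
sni:
  - fqdn: pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local
    tunnel_route: pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:6651
  - fqdn: pulsar-broker-1.pulsar-broker.pulsar.svc.cluster.local
    tunnel_route: pulsar-broker-1.pulsar-broker.pulsar.svc.cluster.local:6651
```

Because ATS only tunnels the stream, TLS still runs end to end between client and broker. ATS may also need 6651 allowed as a tunnel/CONNECT port in its records.config; the linked page covers the full setup.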

@lhotari
Member

lhotari commented Jan 19, 2024

It would make sense to have a load balancer for the broker service that is used for lookups, since the binary protocol is more efficient than the REST API for lookups. The individual brokers still need to be directly addressable, and that requires a separate solution. I'd like to see an experiment with the NodePort + advertisedListeners solution. I guess that would be feasible in cloud-managed k8s environments, where it is possible to expose a k8s node with a routable address that the client can access.
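A sketch of such a lookup load balancer. The Service name is hypothetical, and the selector assumes the broker pods carry the `app: pulsar` / `component: broker` labels, following the labels used in the manifests later in this thread:

```yaml
# Sketch only: a LoadBalancer in front of all brokers, used as the
# clients' initial serviceUrl for binary-protocol lookups.
apiVersion: v1
kind: Service
metadata:
  name: pulsar-broker-lb            # hypothetical name
  namespace: pulsar
spec:
  type: LoadBalancer
  selector:
    app: pulsar                     # assumed broker pod labels
    component: broker
  ports:
    - name: pulsar
      port: 6650
      targetPort: 6650
```

Clients would use `pulsar://<load-balancer-address>:6650` as the serviceUrl; after the lookup they still connect to the owning broker's advertised address, which is why each broker must also be individually addressable.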

@lhotari
Member

lhotari commented Jan 19, 2024

One problem with the Pulsar Proxy is that it adds multiple cross-AZ hops, which incur network transfer costs in cloud k8s environments.

@lhotari
Member

lhotari commented Jan 19, 2024

Adding some more context here about the Pulsar Proxy.

https://pulsar.apache.org/docs/3.1.x/administration-proxy/
"Pulsar proxy is used when direct connections between clients and Pulsar brokers are either infeasible or undesirable"

For the "undesirable" part:
At least in the past, some companies have had network security policies that emphasize network perimeter security, with reference architectures where inbound network traffic must pass through a minimal proxy component that has minimal access to any other components and is placed in a DMZ between two firewalls. Many companies still have such security policies in place.

When the Apache Pulsar PMC was handling the Pulsar Proxy security vulnerability https://pulsar.apache.org/security/CVE-2022-24280/, it was decided to add a notice to https://pulsar.apache.org/docs/3.1.x/administration-proxy/ that the Pulsar Proxy isn't designed to be exposed directly on the public internet:
"The Pulsar proxy is not intended to be exposed on the public internet. The security considerations in the current design expect network perimeter security. The requirement of network perimeter security can be achieved with private networks."

For the "infeasible" part:
This is probably about laziness. When something works, many don't care to optimize or improve the solution.
The Pulsar Proxy is very easy to deploy in k8s as we can see in the Apache Pulsar Helm Chart.

The direct connection to brokers could be achieved with advertisedListeners and nodeports. It would be great to have a solution where this could be automated. The nodeport solution would require that the node has a routable address from clients. Since individual brokers don't require stable names, it would be sufficient to be able to advertise the node IP and nodeport.
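A sketch of the broker-side settings for that approach. `advertisedListeners` and `internalListenerName` are the broker settings described in the advertised-listeners docs linked earlier; the Downward API env var, the node port 30650 and the start script that would render the per-pod value are hypothetical glue that the Helm chart does not provide:

```yaml
# Hypothetical fragment of a broker container spec (not generated by the
# Helm chart). The Downward API exposes the IP of the node the pod runs on.
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
# A custom start script (hypothetical) would then write per-pod settings
# into broker.conf, e.g. for pulsar-broker-0 behind node port 30650:
#
#   internalListenerName=internal
#   advertisedListeners=internal:pulsar://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:6650,external:pulsar://${NODE_IP}:30650
```

Clients outside the cluster would then select the `external` listener by name (the Java client builder has a `listenerName` option), while in-cluster clients and brokers keep using the internal one.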

Lookups could use the REST API exposed through an ingress. Another possibility is a load balancer for the brokers that is used for lookups, since binary-protocol lookups would be more efficient.

Another reason for a proxy like component is for lookups and federating multiple broker clusters into a single large cluster from the client perspective. In Pulsar, there was a component called "pulsar-discovery". This was removed by apache/pulsar#12119 and there's discussion in apache/pulsar#15225 about restoring it.

@lhotari
Member

lhotari commented Jan 19, 2024

Slightly related: issue #437 describes a current problem with the headless broker service, which should be addressed by adding a second ClusterIP service for lookups and making the headless broker service use publishNotReadyAddresses: true.
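A sketch of that two-Service layout. Names and labels are assumptions based on the Helm chart defaults for a release named "pulsar"; the lookup Service name is hypothetical:

```yaml
# 1) Headless Service: stable per-pod DNS names for the brokers,
#    published even before the pods become ready.
apiVersion: v1
kind: Service
metadata:
  name: pulsar-broker
  namespace: pulsar
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: pulsar
    component: broker
  ports:
    - name: pulsar
      port: 6650
    - name: http
      port: 8080
---
# 2) ClusterIP Service for lookups: only ready brokers receive traffic.
apiVersion: v1
kind: Service
metadata:
  name: pulsar-broker-lookup        # hypothetical name
  namespace: pulsar
spec:
  type: ClusterIP
  selector:
    app: pulsar
    component: broker
  ports:
    - name: pulsar
      port: 6650
    - name: http
      port: 8080
```

The headless Service keeps per-broker addressing working during startup, while the ClusterIP Service routes lookups only to brokers that pass their readiness checks.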

@meyerbro

Hi @youzipi, can you share the manifest for the ingress you created for the broker? I think I need exactly that. Thank you!

@youzipi
Author

youzipi commented Sep 27, 2024

@meyerbro

Actually, I use the proxy now.

This is the ingress config I used at that time:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app: pulsar
    cluster: pulsar
    environment: alpha
    component: broker
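  annotations:
    # Assumes ingress-nginx: the regex paths below need use-regex, and
    # rewrite-target strips the /pulsar or /pulsar-web prefix.
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2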
  name: "pulsar-broker"
  namespace: pulsar
spec:
  rules:
    - http:
        paths:
          - path: /pulsar($|/)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: "pulsar-broker"
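                port:
                  # An HTTP ingress cannot carry the Pulsar binary protocol, so this
                  # 6650 backend will not work for clients; only the 8080 path is usable.
                  number: 6650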
          - path: /pulsar-web($|/)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: "pulsar-broker"
                port:
                  number: 8080
      host: $your-host-name
---

@meyerbro

meyerbro commented Oct 3, 2024

Thanks @youzipi! How are you using the proxy now? Did you find a way to proxy everything through https and not binary tcp? Is it using SNI?

@youzipi
Author

youzipi commented Oct 9, 2024

@meyerbro

My configuration dates from January 2024; maybe the Helm chart supports an easier configuration now.

Custom configs:

  • a Service for the proxy
  • an Ingress for the broker

tcp: host_name:6651 -> alb -> service(proxy, type=LoadBalancer)
http: host_name:8080/broker-admin -> ingress(broker) -> service(broker, type=ClusterIP, generated by helm)

---
apiVersion: v1
kind: Service
metadata:
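  annotations:
    # Alibaba Cloud SLB: attach to the existing internal (VPC) load balancer "lb-aaa"
    # and expose a TCP listener on port 6651.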
    service.beta.kubernetes.io/alicloud-loadbalancer-address-type: intranet
    service.beta.kubernetes.io/alicloud-loadbalancer-id: lb-aaa
    service.beta.kubernetes.io/alicloud-loadbalancer-force-override-listeners: "true"
    service.beta.kubernetes.io/alicloud-loadbalancer-protocol-port: tcp:6651
    service.beta.kubernetes.io/alicloud-loadbalancer-network-type: vpc
  name: proxy-tcp
  namespace: pulsar-new
  labels:
    app: pulsar
    component: proxy
spec:
  type: LoadBalancer
  ports:
  - name: tcp
    targetPort: 6651
    port: 6651
    protocol: TCP
  selector:
    app: pulsar
    component: proxy
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app: pulsar
    cluster: pulsar
    environment: prod
    component: broker
  annotations:
    # nginx.ingress.kubernetes.io/rewrite-target: /
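    # Strips the /broker-admin prefix. Recent ingress-nginx releases reject
    # snippet annotations unless allow-snippet-annotations is enabled.
    nginx.ingress.kubernetes.io/configuration-snippet: |
      rewrite ^/broker-admin/(.*)$ /$1 break;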
  name: "pulsar-broker"
  namespace: pulsar-new
spec:
  rules:
    - http:
        paths:
          - path: /broker-admin
            pathType: Prefix
            backend:
              service:
                name: "pulsar-broker"
                port:
                  number: 8080
      host: $host-name
---
