This repository has been archived by the owner on Jul 16, 2019. It is now read-only.

Support using Traefik as a Service Mesh with Service Fabric #43

Open
lawrencegripper opened this issue Apr 10, 2018 · 2 comments
Assignees
Labels
enhancement issues relating to enhancements provider issues relating to the Traefik provider size/medium medium tasks

Comments

@lawrencegripper
Collaborator

The aim would be to allow services to use a label like traefik.servicefabric.enable-mesh which would publish a service on an internal endpoint.

Inter-service communications can then benefit from features of traefik such as circuit-breakers, retry, rate-limiting etc.

Tasks:

  • update to add support for this label
  • add support for adaptive weighting to prefer routing to services on the local node over remote nodes
  • test using this approach in a large cluster
@lawrencegripper lawrencegripper added enhancement issues relating to enhancements size/medium medium tasks labels Apr 10, 2018
@lawrencegripper lawrencegripper self-assigned this Apr 10, 2018
@jjcollinge jjcollinge added the provider issues relating to the Traefik provider label Apr 14, 2018
@lawrencegripper
Collaborator Author

lawrencegripper commented Apr 20, 2018

Proposal

Create an additional label, traefik.servicefabric.mesh. When this label is added to a service, the service would be added to the mesh endpoint defined in your traefik.toml. The default template would be updated to include this endpoint too.

All labels set on the service would then control the behavior of the service in the mesh so existing labels for circuitbreaker etc would work internally.
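
As a rough illustration of the label check described above, here is a minimal sketch in Go. Only the label name traefik.servicefabric.mesh and the entry point names come from this thread; the `Service` type, the `entryPointsFor` function, the boolean parsing of the label value, and the choice to publish a mesh service only on the mesh entry point are assumptions, not the provider's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

// meshLabel is the label name from the proposal.
const meshLabel = "traefik.servicefabric.mesh"

// Service is a hypothetical, simplified view of a Service Fabric service
// and the labels set on it.
type Service struct {
	Name   string
	Labels map[string]string
}

// entryPointsFor returns the entry points a service would be attached to.
// Assumption: a label value that parses as true moves the service onto the
// "mesh" entry point; a missing or unparsable value keeps it on "http".
func entryPointsFor(s Service) []string {
	enabled, err := strconv.ParseBool(s.Labels[meshLabel])
	if err == nil && enabled {
		return []string{"mesh"}
	}
	return []string{"http"}
}

func main() {
	internal := Service{
		Name:   "fabric:/App/Orders",
		Labels: map[string]string{meshLabel: "true"},
	}
	external := Service{Name: "fabric:/App/Web"}

	fmt.Println(internal.Name, entryPointsFor(internal))
	fmt.Println(external.Name, entryPointsFor(external))
}
```

Whether a mesh service should also stay reachable on the external entry point is left open here; the sketch picks the stricter option so internal services are not exposed by accident.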

As a stretch goal, we'd look to add an additional label, traefik.servicefabric.preferlocal, which would prefer routing to a local service so that, when used with mesh, you wouldn't go off node unless necessary.
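
One possible reading of "wouldn't go off node unless necessary" is a simple filter: use the replicas on the local node when any exist, and fall back to every replica otherwise. The sketch below assumes exactly that; the `Instance` type, the `preferLocal` function, and the node names are hypothetical and not part of the provider.

```go
package main

import "fmt"

// Instance is a hypothetical, simplified description of one service replica.
type Instance struct {
	URL  string // address Traefik would forward to
	Node string // name of the cluster node hosting the replica
}

// preferLocal returns only the replicas hosted on localNode when at least
// one exists, and otherwise returns every replica unchanged.
func preferLocal(instances []Instance, localNode string) []Instance {
	var local []Instance
	for _, in := range instances {
		if in.Node == localNode {
			local = append(local, in)
		}
	}
	if len(local) > 0 {
		return local
	}
	return instances
}

func main() {
	replicas := []Instance{
		{URL: "http://10.0.0.4:8081", Node: "node-0"},
		{URL: "http://10.0.0.5:8081", Node: "node-1"},
	}
	fmt.Println(preferLocal(replicas, "node-1")) // only the node-1 replica
	fmt.Println(preferLocal(replicas, "node-2")) // no local replica, so both
}
```

A weighted variant (local replicas get a higher weight instead of all the traffic) would be closer to the "adaptive weighting" task in the issue description; the filter is just the simplest form to reason about.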

# Entrypoints definition
#
# Optional
# Default:
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.traefik]
address = ":8080"
[entryPoints.mesh]
address = ":7887"

@AviMualem I know we chatted about this a bit earlier - does this sound like a good plan for you?

@AviMualem

AviMualem commented Apr 25, 2018

Hey @lawrencegripper,
First of all, let me start by saying I truly believe this could be an amazing feature for all Service Fabric users.

Just to add some background: before I started to think about Traefik as a service mesh, I was thinking about using Traefik as an internal reverse proxy for my service-to-service communication. That was after I noticed that, when the cluster nodes run Windows, the integrated reverse proxy is a thin, basic application, far from the fine-grained reverse proxy I would consider suitable for production use cases.

On Linux-based node deployments, the integrated reverse proxy is Envoy (which I find highly similar to Traefik), which is very different from the Windows-based reverse proxy app.
I know from various sources that work is under way to include Envoy in Windows-based deployments as well.
As a personal note, once both Traefik and Envoy are available I will probably choose Traefik for various reasons :).

After I examined all of the features Traefik offers, which include circuit breaker, rate limit, max connections, authorization, Let's Encrypt support, access log, retry policy and more,
I realized it's way more than a simple proxy that does URL manipulation, and it has a lot of the functionality I want in my service mesh layer.

Before going to implementation and thinking about labels and entry points, I would start by checking with the Traefik team what their opinion is on using Traefik as a service mesh, because after reading their documentation and a large portion of the code, it looks like they define Traefik more as a modern reverse proxy that handles communication from the external world into the backend implementation.

I can't see any samples or blogs talking about using it for internal service-to-service communication, although it's really easy to achieve: we already have service discovery, so what's left is just to have an entry point on a port that is not exposed to the external world.

With the rules associated with the internal entry point, I can easily get rate limiting, circuit breaking, retries and more for my internal service-to-service communication, which are without a doubt a mesh layer's responsibility.

I even took the time to check it in my DEV cluster with 50+ microservices deployed to it, and it looked fine. 40 of them were connected to the internal entry point and 10 were exposed to the external world on an external entry point.

Now, when it comes to mesh solutions, there is a lot of discussion around the division between the data plane and the control plane (more info can be found here -

Traefik is often associated with data plane features... and, in my opinion, with some control plane features too.

Projects like Istio (https://istio.io/) have a real separation between the control plane and the data plane.
Essentially, Istio uses Envoy as its data plane and has its own implementation of the control plane.

As far as I can tell, it looks legitimate to use it as a mesh because, as I mentioned, it includes a lot of out-of-the-box functionality besides routing and classic load balancing, and on top of that, at the end of the day, Service Fabric Linux-based deployments are using Envoy in the same manner.

In my opinion it might not have the flexibility of something like Istio, but it can still be used as a mesh layer.

Now, regarding the option you are considering that would prefer staying on the same node for service-to-service communication (if both services are deployed on the same host): I'm not sure it's the exact behavior we want to achieve. If the node is under a lot of load, maybe it's better to route the traffic to another node that hosts the desired service.
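
To make the concern concrete, here is a minimal sketch of a load-aware variant: stay on the local node only while it is below a load threshold, otherwise send traffic to replicas on other nodes. Everything in it is an assumption for illustration: the `Instance` type, the `routeTargets` function, and the idea that a load figure between 0 and 1 is available for the local node at all.

```go
package main

import "fmt"

// Instance is a hypothetical, simplified description of one service replica.
type Instance struct {
	URL  string
	Node string
}

// routeTargets picks the replicas to route to.
// Assumption: load is a utilisation figure between 0 and 1 for the local
// node, and maxLoad is the threshold above which we stop preferring it.
func routeTargets(instances []Instance, localNode string, load, maxLoad float64) []Instance {
	var local, remote []Instance
	for _, in := range instances {
		if in.Node == localNode {
			local = append(local, in)
		} else {
			remote = append(remote, in)
		}
	}
	switch {
	case len(local) > 0 && load < maxLoad:
		return local // local node is healthy, stay on it
	case len(remote) > 0:
		return remote // local node is overloaded or has no replica
	default:
		return local // overloaded, but there is nowhere else to go
	}
}

func main() {
	replicas := []Instance{
		{URL: "http://10.0.0.4:8081", Node: "node-0"},
		{URL: "http://10.0.0.5:8081", Node: "node-1"},
	}
	fmt.Println(routeTargets(replicas, "node-0", 0.30, 0.80)) // stays local
	fmt.Println(routeTargets(replicas, "node-0", 0.95, 0.80)) // goes remote
}
```

A hard threshold like this can flap when load hovers around the limit, so a real implementation would probably want gradual weighting or some hysteresis.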

I would be happy to hear your opinion, and @jjcollinge's opinion as well :) This development should be really precise, because if customers use it as a mesh, changing the design will be a hard task :)

Avi.


3 participants