-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enabled DNS as the service discovery mechanism for agent => collector communication #333
Enabled DNS as the service discovery mechanism for agent => collector communication #333
Conversation
Codecov Report
@@ Coverage Diff @@
## master #333 +/- ##
==========================================
+ Coverage 89.73% 89.75% +0.02%
==========================================
Files 64 64
Lines 2991 2997 +6
==========================================
+ Hits 2684 2690 +6
Misses 207 207
Partials 100 100
Continue to review full report at Codecov.
|
dc65359
to
ff3856e
Compare
Tested this manually by using the # setup an elasticsearch with `make es`
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: simple-prod
spec:
strategy: production
collector:
replicas: 2
storage:
type: elasticsearch
options:
es:
server-urls: http://elasticsearch:9200 Then, hit the Jaeger UI a few times and queried each collector individually: $ kubectl get pods
NAME READY STATUS RESTARTS AGE
elasticsearch-0 1/1 Running 0 29m
simple-prod-collector-bf94c5d9f-6qhr8 1/1 Running 0 28m
simple-prod-collector-bf94c5d9f-pzkkp 1/1 Running 0 27m
simple-prod-query-695f65d7f-77st7 2/2 Running 0 13m
$ kubectl port-forward simple-prod-collector-bf94c5d9f-pzkkp 14268:14268
$ curl localhost:14268/metrics | grep jaeger_collector_spans_received_total | grep jaeger-query
jaeger_collector_spans_received_total{debug="false",format="jaeger",svc="jaeger-query"} 30
$ kubectl port-forward simple-prod-collector-bf94c5d9f-6qhr8 14268:14268
$ curl localhost:14268/metrics | grep jaeger_collector_spans_received_total | grep jaeger-query
jaeger_collector_spans_received_total{debug="false",format="jaeger",svc="jaeger-query"} 41 |
E2E tests are passing: $ make test
Running unit tests...
? github.com/jaegertracing/jaeger-operator/cmd [no test files]
? github.com/jaegertracing/jaeger-operator/cmd/manager [no test files]
ok github.com/jaegertracing/jaeger-operator/pkg/account 0.023s coverage: 100.0% of statements
? github.com/jaegertracing/jaeger-operator/pkg/apis [no test files]
ok github.com/jaegertracing/jaeger-operator/pkg/apis/io/v1alpha1 0.020s coverage: 15.7% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/apis/jaegertracing/v1 0.025s coverage: 16.6% of statements
? github.com/jaegertracing/jaeger-operator/pkg/cmd/start [no test files]
? github.com/jaegertracing/jaeger-operator/pkg/cmd/version [no test files]
ok github.com/jaegertracing/jaeger-operator/pkg/config/sampling 0.025s coverage: 94.1% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/config/ui 0.014s coverage: 94.1% of statements
? github.com/jaegertracing/jaeger-operator/pkg/controller [no test files]
? github.com/jaegertracing/jaeger-operator/pkg/controller/deployment [no test files]
ok github.com/jaegertracing/jaeger-operator/pkg/controller/jaeger 2.237s coverage: 70.5% of statements
? github.com/jaegertracing/jaeger-operator/pkg/controller/legacy [no test files]
ok github.com/jaegertracing/jaeger-operator/pkg/cronjob 0.022s coverage: 97.0% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/deployment 0.027s coverage: 100.0% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/ingress 0.020s coverage: 100.0% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/inject 0.015s coverage: 100.0% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/inventory 0.017s coverage: 91.8% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/route 0.052s coverage: 100.0% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/service 0.013s coverage: 100.0% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/storage 0.620s coverage: 94.0% of statements
? github.com/jaegertracing/jaeger-operator/pkg/storage/elasticsearch/v1alpha1 [no test files]
ok github.com/jaegertracing/jaeger-operator/pkg/strategy 0.083s coverage: 89.5% of statements
ok github.com/jaegertracing/jaeger-operator/pkg/util 0.005s coverage: 100.0% of statements
? github.com/jaegertracing/jaeger-operator/pkg/version [no test files]
Formatting code...
Building...
Sending build context to Docker daemon 159MB
Step 1/6 : FROM centos
---> 1e1148e4cc2c
Step 2/6 : RUN INSTALL_PKGS=" openssl " && yum install -y $INSTALL_PKGS && rpm -V $INSTALL_PKGS && yum clean all && mkdir /tmp/_working_dir && chmod og+w /tmp/_working_dir
---> Using cache
---> 82728e7bfc7b
Step 3/6 : COPY scripts/* /scripts/
---> Using cache
---> 80a00c2ba02b
Step 4/6 : USER nobody
---> Using cache
---> f576eac46724
Step 5/6 : ADD build/_output/bin/jaeger-operator /usr/local/bin/jaeger-operator
---> 3e0d485fc23c
Step 6/6 : ENTRYPOINT ["/usr/local/bin/jaeger-operator"]
---> Running in be69a78e5860
Removing intermediate container be69a78e5860
---> 91ae6d3d07a6
Successfully built 91ae6d3d07a6
Successfully tagged jpkroehling/jaeger-operator:latest
Pushing image jpkroehling/jaeger-operator:latest...
Running Smoke end-to-end tests...
ok github.com/jaegertracing/jaeger-operator/test/e2e 196.789s
Creating namespace default
service/cassandra created
statefulset.apps/cassandra created
Running Cassandra end-to-end tests...
ok github.com/jaegertracing/jaeger-operator/test/e2e 136.221s
Running Elasticsearch end-to-end tests...
ok github.com/jaegertracing/jaeger-operator/test/e2e 206.634s
|
@jpkrohling Before merging, do we want to make this change conditional on a property in the CR? So if headless services, then |
@jpkrohling does this change work, If I change the reporter type to |
@objectiser there is never It would be worth to test the behavior with non-headless collector service. |
No. |
I think we can merge this and keep discussing about the possible load balancing alternatives + headless services. Instead of changing the headless service, we could perhaps provide an extra service. |
@jpkrohling sure |
I mean, if user supplies |
@jpkrohling saw your message on IRC - just wondering if, before considering multiple collector services (headless and non-headless), whether it would first be worth testing just having non-headless collector and revert the |
Just tried it out and it does not load balance: I got 138 spans reported to one collector and 0 to the second. The code is basically master + this diff: diff --git a/pkg/inventory/service.go b/pkg/inventory/service.go
index 6b45e62..d257d5d 100644
--- a/pkg/inventory/service.go
+++ b/pkg/inventory/service.go
@@ -21,6 +21,11 @@ func ForServices(existing []v1.Service, desired []v1.Service) Service {
if t, ok := mdelete[k]; ok {
tp := t.DeepCopy()
+ // we keep the ClusterIP that got assigned by the cluster, if it's empty in the "desired" and not empty on the "current"
+ if v.Spec.ClusterIP == "" && len(tp.Spec.ClusterIP) > 0 {
+ v.Spec.ClusterIP = tp.Spec.ClusterIP
+ }
+
// we can't blindly DeepCopyInto, so, we select what we bring from the new to the old object
tp.Spec = v.Spec
tp.ObjectMeta.OwnerReferences = v.ObjectMeta.OwnerReferences
diff --git a/pkg/service/collector.go b/pkg/service/collector.go
index c8ccbd3..c386b47 100644
--- a/pkg/service/collector.go
+++ b/pkg/service/collector.go
@@ -41,7 +41,7 @@ func NewCollectorService(jaeger *v1.Jaeger, selector map[string]string) *corev1.
},
Spec: corev1.ServiceSpec{
Selector: selector,
- ClusterIP: "None",
+ ClusterIP: "",
Ports: []corev1.ServicePort{
{
Name: "zipkin",
|
In summary: it looks like the only combination where we can get load balancing by default, without efforts by the user, is the current code in this PR. @jkandasa had a great question about what happens when the user attempts to override the reporter type. I'm currently testing that and this will likely result in changes to this PR. |
@jpkrohling pity - so will you be adding the second collector service (so have headless and non-headless) as part of this PR? We also need to make the query service non-headless - separate PR or this? |
As per #264, I think we might not actually need a non-headless service for the collector, only for query, which might not need to be a headless service. In any case, I'll handle the query in another PR. |
@jpkrohling We may still need a non-headless service for collector for the tracers when using HTTPSender? We would want the tracers to be load balanced across collectors. |
… communication Signed-off-by: Juraci Paixão Kröhling <[email protected]>
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
ff3856e
to
bd1518f
Compare
I just tried changing the PR to include a new service for the collector, so that we'd have two services. It has no effect, possibly because OkHttp will keep the TCP connections open, making it always land (preferably) at the same server. Here's a gist with more details: https://gist.github.com/jpkrohling/406617419fa8de47910c832c44e5e2be The application used for the test is the Here's the branch I used for the test: I understand the scope of this PR here is about the agent => collector communication. I just fixed the issue that @jkandasa reported, so, I believe this is now ready for a final review + merge. |
@jpkrohling Yes a persistent connection would be used, so we need to test that solution with multiple app instances. But you are right, it is not necessary for this PR. |
The example application has actually two tracers, and it does show that spans from one tracer instance (order) lands at one collector and the other instance (inventory) in the second collector. The third collector received zero spans as a result. |
Ok, so the two service approach look like it would work - so we just need to decide on service names (but possibly still better in separate PR). |
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
E2E tests are passing:
|
Closes #332
Signed-off-by: Juraci Paixão Kröhling [email protected]