Skip to content

Commit

Permalink
[agent] Add metrics to show connections status between agent and coll…
Browse files Browse the repository at this point in the history
…ectors (jaegertracing#2657)

* add metrics that show agent connection collector status

Signed-off-by: WalkerWang731 <[email protected]>

* update comment

Signed-off-by: WalkerWang731 <[email protected]>

* exec make fmt

Signed-off-by: WalkerWang731 <[email protected]>

* simplify function and add testing relevant code in the builder_test.go

Signed-off-by: WalkerWang731 <[email protected]>

* add comment in connect_metrics.go

Signed-off-by: WalkerWang731 <[email protected]>

* simplify code and changed use expvar to show target

Signed-off-by: WalkerWang731 <[email protected]>

* simplify code and changed use expvar to show target

Signed-off-by: WalkerWang731 <[email protected]>

* exec make fmt

Signed-off-by: WalkerWang731 <[email protected]>

* Fix collector panic due to sarama sdk returning nil error (jaegertracing#2654)

Signed-off-by: luhualin <[email protected]>

Co-authored-by: luhualin <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Fix flaky tbuffered server test (jaegertracing#2635)

* Fix flaky tbuffered server test

Signed-off-by: Pavel Kositsyn <[email protected]>

* Apply suggestions from code review - more readable comments

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Pavel Kositsyn <[email protected]>

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Add github actions for integration tests (jaegertracing#2649)

* Add github action for jaeger integration tests

Signed-off-by: Ashmita Bohara <[email protected]>

* Create separate workflow for each integration test

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Clean-up GH action names (jaegertracing#2661)

Signed-off-by: WalkerWang731 <[email protected]>

* Fix for failures in badger integration tests (jaegertracing#2660)

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Add protogen validation test (jaegertracing#2662)

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Add github action for jaeger all-in-one image (jaegertracing#2663)

* Add github action for jaeger all-in-one image

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Make steps self-explantory

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix git tags issue

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix ES integration test

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Update comment that looks confusing during builds

Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Use GitHub actions based build badges

Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Fix and minor improvements to all-in-one github action (jaegertracing#2667)

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Fix docker login issue with all-in-one build (jaegertracing#2668)

* Fix docker login issue with all-in-one build

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix docker login issue with all-in-one build

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Fix issue with all-in-one build (jaegertracing#2669)

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Update cmd/agent/app/reporter/connect_metrics.go

accept suggestions

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Update cmd/agent/app/reporter/connect_metrics.go

accept suggestions

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* simplify the code that remove ConnectMetricsParams{} and integrate ConnectMetrics{}

Signed-off-by: WalkerWang731 <[email protected]>

* simplify the code that remove ConnectMetricsParams{} and integrate ConnectMetrics{}

Signed-off-by: WalkerWang731 <[email protected]>

* merage from the lastest master branch and exec make fmt

Signed-off-by: walker.wangxy <[email protected]>

* add comment on ConnectMetrics

Signed-off-by: WalkerWang731 <[email protected]>

* clear up redundant codes

Signed-off-by: WalkerWang731 <[email protected]>

Co-authored-by: WalkerWang731 <[email protected]>
Co-authored-by: Betula-L <[email protected]>
Co-authored-by: luhualin <[email protected]>
Co-authored-by: Pavel Kositsyn <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Co-authored-by: Ashmita <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Co-authored-by: walker.wangxy <[email protected]>
  • Loading branch information
9 people authored and bhiravabhatla committed Jan 25, 2021
1 parent 602b943 commit 00b7daa
Show file tree
Hide file tree
Showing 5 changed files with 173 additions and 7 deletions.
54 changes: 54 additions & 0 deletions cmd/agent/app/reporter/connect_metrics.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
// Copyright (c) 2020 The Jaeger Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package reporter

import (
"github.com/uber/jaeger-lib/metrics"
"go.uber.org/zap"
)

type connectMetrics struct {
// used for reflect current connection stability
Reconnects metrics.Counter `metric:"collector_reconnects" help:"Number of successful connections (including reconnects) to the collector."`

// Connection status that jaeger-agent to jaeger-collector, 1 is connected, 0 is disconnected
Status metrics.Gauge `metric:"collector_connected" help:"Status of connection between the agent and the collector; 1 is connected, 0 is disconnected"`
}

// ConnectMetrics include connectMetrics necessary params if want to modify metrics of connectMetrics, must via ConnectMetrics API
type ConnectMetrics struct {
Logger *zap.Logger // required
MetricsFactory metrics.Factory // required
connectMetrics *connectMetrics
}

// NewConnectMetrics will be initialize ConnectMetrics
func (r *ConnectMetrics) NewConnectMetrics() {
r.connectMetrics = new(connectMetrics)
r.MetricsFactory = r.MetricsFactory.Namespace(metrics.NSOptions{Name: "connection_status"})
metrics.MustInit(r.connectMetrics, r.MetricsFactory, nil)
}

// OnConnectionStatusChange used for pass the status parameter when connection is changed
// 0 is disconnected, 1 is connected
// For quick view status via use `sum(jaeger_agent_connection_status_collector_connected{}) by (instance) > bool 0`
func (r *ConnectMetrics) OnConnectionStatusChange(connected bool) {
if connected {
r.connectMetrics.Status.Update(1)
r.connectMetrics.Reconnects.Inc(1)
} else {
r.connectMetrics.Status.Update(0)
}
}
81 changes: 81 additions & 0 deletions cmd/agent/app/reporter/connect_metrics_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
// Copyright (c) 2020 The Jaeger Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package reporter

import (
"testing"
"time"

"github.com/stretchr/testify/assert"
"github.com/uber/jaeger-lib/metrics/metricstest"
)

type connectMetricsTest struct {
mf *metricstest.Factory
}

func testConnectMetrics(fn func(tr *connectMetricsTest, r *ConnectMetrics)) {
testConnectMetricsWithParams(&ConnectMetrics{}, fn)
}

func testConnectMetricsWithParams(cm *ConnectMetrics, fn func(tr *connectMetricsTest, r *ConnectMetrics)) {
mf := metricstest.NewFactory(time.Hour)
cm.MetricsFactory = mf
cm.NewConnectMetrics()

tr := &connectMetricsTest{
mf: mf,
}

fn(tr, cm)
}

func testCollectorConnected(r *ConnectMetrics) {
r.OnConnectionStatusChange(true)
}

func testCollectorAborted(r *ConnectMetrics) {
r.OnConnectionStatusChange(false)
}

func TestConnectMetrics(t *testing.T) {

testConnectMetrics(func(tr *connectMetricsTest, r *ConnectMetrics) {
getGauge := func() map[string]int64 {
_, gauges := tr.mf.Snapshot()
return gauges
}

getCount := func() map[string]int64 {
counts, _ := tr.mf.Snapshot()
return counts
}

// testing connect aborted
testCollectorAborted(r)
assert.EqualValues(t, 0, getGauge()["connection_status.collector_connected"])

// testing connect connected
testCollectorConnected(r)
assert.EqualValues(t, 1, getGauge()["connection_status.collector_connected"])
assert.EqualValues(t, 1, getCount()["connection_status.collector_reconnects"])

// testing reconnect counts
testCollectorAborted(r)
testCollectorConnected(r)
assert.EqualValues(t, 2, getCount()["connection_status.collector_reconnects"])

})
}
39 changes: 35 additions & 4 deletions cmd/agent/app/reporter/grpc/builder.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,20 @@ package grpc
import (
"context"
"errors"
"expvar"
"fmt"
"strings"

grpc_retry "github.com/grpc-ecosystem/go-grpc-middleware/retry"
"github.com/grpc-ecosystem/go-grpc-middleware/retry"
"github.com/uber/jaeger-lib/metrics"
"go.uber.org/zap"
"google.golang.org/grpc"
"google.golang.org/grpc/connectivity"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/resolver"
"google.golang.org/grpc/resolver/manual"

"github.com/jaegertracing/jaeger/cmd/agent/app/reporter"
"github.com/jaegertracing/jaeger/pkg/config/tlscfg"
"github.com/jaegertracing/jaeger/pkg/discovery"
"github.com/jaegertracing/jaeger/pkg/discovery/grpcresolver"
Expand All @@ -43,6 +47,9 @@ type ConnBuilder struct {
DiscoveryMinPeers int
Notifier discovery.Notifier
Discoverer discovery.Discoverer

// for unit test and provide ConnectMetrics and outside call
ConnectMetrics *reporter.ConnectMetrics
}

// NewConnBuilder creates a new grpc connection builder.
Expand All @@ -51,7 +58,7 @@ func NewConnBuilder() *ConnBuilder {
}

// CreateConnection creates the gRPC connection
func (b *ConnBuilder) CreateConnection(logger *zap.Logger) (*grpc.ClientConn, error) {
func (b *ConnBuilder) CreateConnection(logger *zap.Logger, mFactory metrics.Factory) (*grpc.ClientConn, error) {
var dialOptions []grpc.DialOption
var dialTarget string
if b.TLS.Enabled { // user requested a secure connection
Expand Down Expand Up @@ -97,14 +104,38 @@ func (b *ConnBuilder) CreateConnection(logger *zap.Logger) (*grpc.ClientConn, er
return nil, err
}

go func(cc *grpc.ClientConn) {
if b.ConnectMetrics == nil {
cm := reporter.ConnectMetrics{
Logger: logger,
MetricsFactory: mFactory.Namespace(metrics.NSOptions{Name: "", Tags: map[string]string{"protocol": "grpc"}}),
}
cm.NewConnectMetrics()
b.ConnectMetrics = &cm
}

go func(cc *grpc.ClientConn, cm *reporter.ConnectMetrics) {
logger.Info("Checking connection to collector")
var egt *expvar.String
r := expvar.Get("gRPCTarget")
if r == nil {
egt = expvar.NewString("gRPCTarget")
} else {
egt = r.(*expvar.String)
}

for {
s := cc.GetState()
if s == connectivity.Ready {
cm.OnConnectionStatusChange(true)
egt.Set(cc.Target())
} else {
cm.OnConnectionStatusChange(false)
}

logger.Info("Agent collector connection state change", zap.String("dialTarget", dialTarget), zap.Stringer("status", s))
cc.WaitForStateChange(context.Background(), s)
}
}(conn)
}(conn, b.ConnectMetrics)

return conn, nil
}
4 changes: 2 additions & 2 deletions cmd/agent/app/reporter/grpc/builder_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ func TestBuilderFromConfig(t *testing.T) {
t,
[]string{"127.0.0.1:14268", "127.0.0.1:14269"},
cfg.CollectorHostPorts)
r, err := cfg.CreateConnection(zap.NewNop())
r, err := cfg.CreateConnection(zap.NewNop(), metrics.NullFactory)
require.NoError(t, err)
assert.NotNil(t, r)
}
Expand Down Expand Up @@ -149,7 +149,7 @@ func TestBuilderWithCollectors(t *testing.T) {
cfg.Notifier = test.notifier
cfg.Discoverer = test.discoverer

conn, err := cfg.CreateConnection(zap.NewNop())
conn, err := cfg.CreateConnection(zap.NewNop(), metrics.NullFactory)
if test.expectedError == "" {
require.NoError(t, err)
require.NotNil(t, conn)
Expand Down
2 changes: 1 addition & 1 deletion cmd/agent/app/reporter/grpc/collector_proxy.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ type ProxyBuilder struct {

// NewCollectorProxy creates ProxyBuilder
func NewCollectorProxy(builder *ConnBuilder, agentTags map[string]string, mFactory metrics.Factory, logger *zap.Logger) (*ProxyBuilder, error) {
conn, err := builder.CreateConnection(logger)
conn, err := builder.CreateConnection(logger, mFactory)
if err != nil {
return nil, err
}
Expand Down

0 comments on commit 00b7daa

Please sign in to comment.