Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jaeger-v2] Add support for Elasticsearch #5152

Merged
merged 34 commits into from
Feb 27, 2024
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
0afdfd5
add elasticsearch for jeager-v2
akagami-harsh Jan 29, 2024
c60bf5a
fix
akagami-harsh Jan 29, 2024
67707ed
Merge branch 'main' into elasticsearch
akagami-harsh Jan 29, 2024
a47bb35
add tests
akagami-harsh Jan 29, 2024
5ea1df1
Merge branch 'main' into elasticsearch
akagami-harsh Jan 31, 2024
70edeeb
Merge branch 'main' into elasticsearch
akagami-harsh Feb 5, 2024
430a0ce
use separate index_prefix in archive storage
akagami-harsh Feb 5, 2024
d8d4939
Update cmd/jaeger/internal/extension/jaegerstorage/extension.go
akagami-harsh Feb 6, 2024
da38c6a
Update cmd/jaeger/internal/extension/jaegerstorage/extension.go
akagami-harsh Feb 6, 2024
001e4a0
fix
akagami-harsh Feb 6, 2024
f93b1c6
solve merge conflicts
akagami-harsh Feb 7, 2024
9aa2c26
refactor tests
akagami-harsh Feb 7, 2024
c5c20ce
fix
akagami-harsh Feb 7, 2024
bcdf8a0
Merge branch 'main' into elasticsearch
akagami-harsh Feb 7, 2024
236067c
add annotations
akagami-harsh Feb 16, 2024
72c5c42
fix
akagami-harsh Feb 16, 2024
79130b4
remove left over yaml annotations
akagami-harsh Feb 18, 2024
e373c3d
add docker compose for debug
akagami-harsh Feb 18, 2024
e646a3f
fix
akagami-harsh Feb 18, 2024
e61cf78
revert changes
akagami-harsh Feb 19, 2024
6824ec4
fix
akagami-harsh Feb 19, 2024
999d336
Merge branch 'jaegertracing:main' into elasticsearch
akagami-harsh Feb 19, 2024
66f63b5
rename es_config.yaml to config-elasticsearch.yaml
akagami-harsh Feb 19, 2024
a86760c
set UseReadWriteAliases to true in config file
akagami-harsh Feb 20, 2024
7ad3b0d
remove debug docker compose and dockerfile
akagami-harsh Feb 20, 2024
8dc35cb
add validator to validate config
akagami-harsh Feb 20, 2024
0744562
fix
akagami-harsh Feb 20, 2024
275786c
fix
akagami-harsh Feb 20, 2024
3763aa9
fix
akagami-harsh Feb 25, 2024
495daaf
fix lint
akagami-harsh Feb 25, 2024
0f7d2f1
add test for cfg validation
akagami-harsh Feb 26, 2024
a6cec26
fix lint
akagami-harsh Feb 26, 2024
915e83f
fix
akagami-harsh Feb 26, 2024
278d49a
Merge branch 'main' into elasticsearch
yurishkuro Feb 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions cmd/jaeger/Dockerfile.debug
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
FROM alpine:latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you need this one - how is it different from the main Dockerfile?

Copy link
Member Author

@akagami-harsh akagami-harsh Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having issues with the main Dockerfile so i created a simpler one as a solution. now we don't need the Dockerfile and Docker-compose file anymore, so I'm removing them.


WORKDIR /app

COPY jaeger-linux-amd64 /app/

RUN chmod +x /app/jaeger-linux-amd64

EXPOSE 5775/udp 6831/udp 6832/udp 5778 16686 14268 14250 9200

ENTRYPOINT ["/app/jaeger-linux-amd64"]
45 changes: 45 additions & 0 deletions cmd/jaeger/docker-compose.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
version: '3.7'

services:
elasticsearch:
image: elasticsearch:8.12.0
container_name: elasticsearch
environment:
- discovery.type=single-node
- xpack.security.enabled=false
ports:
- "9200:9200"
healthcheck:
test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health | grep -vq '\"status\":\"red\"'"]
interval: 10s
timeout: 5s
retries: 5
networks:
- esnet

jaeger:
build:
context: ./
dockerfile: Dockerfile.debug
container_name: jaeger
ports:
- "16686:16686"
- "6831:6831/udp"
- "6832:6832/udp"
- "5778:5778"
- "14268:14268"
- "14250:14250"
depends_on:
elasticsearch:
condition: service_healthy
networks:
- esnet
volumes:
- ./es_config.yaml:/app/es_config.yaml
- ./config-ui.json:/app/cmd/jaeger/config-ui.json
command: ["--config", "/app/es_config.yaml"]
restart: always

networks:
esnet:
driver: bridge
37 changes: 37 additions & 0 deletions cmd/jaeger/es_config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
service:
akagami-harsh marked this conversation as resolved.
Show resolved Hide resolved
extensions: [jaeger_storage, jaeger_query]
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [jaeger_storage_exporter]

extensions:
jaeger_query:
trace_storage: es_main
trace_storage_archive: es_archive
ui_config: ./cmd/jaeger/config-ui.json

jaeger_storage:
elasticsearch:
es_main:
server_urls: http://elasticsearch:9200
log_level: "error"
num_shards: 5
es_archive:
yurishkuro marked this conversation as resolved.
Show resolved Hide resolved
server_urls: http://elasticsearch:9200
log_level: "error"
index_prefix: "jaeger-archive"
num_shards: 5
receivers:
otlp:
protocols:
grpc:
http:

processors:
batch:

exporters:
jaeger_storage_exporter:
trace_storage: es_main
6 changes: 4 additions & 2 deletions cmd/jaeger/internal/extension/jaegerstorage/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,14 +7,16 @@ import (
"fmt"
"reflect"

esCfg "github.com/jaegertracing/jaeger/pkg/es/config"
memoryCfg "github.com/jaegertracing/jaeger/pkg/memory/config"
badgerCfg "github.com/jaegertracing/jaeger/plugin/storage/badger"
)

// Config has the configuration for jaeger-query,
type Config struct {
Memory map[string]memoryCfg.Configuration `mapstructure:"memory"`
Badger map[string]badgerCfg.NamespaceConfig `mapstructure:"badger"`
Memory map[string]memoryCfg.Configuration `mapstructure:"memory"`
Badger map[string]badgerCfg.NamespaceConfig `mapstructure:"badger"`
Elasticsearch map[string]esCfg.Configuration `mapstructure:"elasticsearch"`
// TODO add other storage types here
// TODO how will this work with 3rd party storage implementations?
// Option: instead of looking for specific name, check interface.
Expand Down
9 changes: 9 additions & 0 deletions cmd/jaeger/internal/extension/jaegerstorage/extension.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,12 @@ import (
"go.opentelemetry.io/collector/extension"
"go.uber.org/zap"

esCfg "github.com/jaegertracing/jaeger/pkg/es/config"
memoryCfg "github.com/jaegertracing/jaeger/pkg/memory/config"
"github.com/jaegertracing/jaeger/pkg/metrics"
"github.com/jaegertracing/jaeger/plugin/storage/badger"
badgerCfg "github.com/jaegertracing/jaeger/plugin/storage/badger"
"github.com/jaegertracing/jaeger/plugin/storage/es"
"github.com/jaegertracing/jaeger/plugin/storage/memory"
"github.com/jaegertracing/jaeger/storage"
)
Expand Down Expand Up @@ -107,10 +109,17 @@ func (s *storageExt) Start(ctx context.Context, host component.Host) error {
cfg: s.config.Badger,
builder: badger.NewFactoryWithConfig,
}
esStarter := &starter[esCfg.Configuration, *es.Factory]{
ext: s,
storageKind: "elasticsearch",
cfg: s.config.Elasticsearch,
builder: es.NewFactoryWithConfig,
}

builders := []func(ctx context.Context, host component.Host) error{
memStarter.build,
badgerStarter.build,
esStarter.build,
// TODO add support for other backends
}
for _, builder := range builders {
Expand Down
43 changes: 43 additions & 0 deletions cmd/jaeger/internal/extension/jaegerstorage/extension_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ package jaegerstorage
import (
"context"
"fmt"
"net/http"
"net/http/httptest"
"testing"

"github.com/stretchr/testify/require"
Expand All @@ -16,6 +18,7 @@ import (
nooptrace "go.opentelemetry.io/otel/trace/noop"
"go.uber.org/zap"

esCfg "github.com/jaegertracing/jaeger/pkg/es/config"
memoryCfg "github.com/jaegertracing/jaeger/pkg/memory/config"
"github.com/jaegertracing/jaeger/pkg/metrics"
badgerCfg "github.com/jaegertracing/jaeger/plugin/storage/badger"
Expand Down Expand Up @@ -151,6 +154,46 @@ func TestBadgerStorageExtensionError(t *testing.T) {
require.ErrorContains(t, err, "/bad/path")
}

func TestESStorageExtension(t *testing.T) {
mockEsServerResponse := []byte(`
{
"Version": {
"Number": "6"
}
}
`)
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Write(mockEsServerResponse)
}))
defer server.Close()
storageExtension := makeStorageExtenion(t, &Config{
Elasticsearch: map[string]esCfg.Configuration{
"foo": {
Servers: []string{server.URL},
LogLevel: "error",
},
},
})
ctx := context.Background()
err := storageExtension.Start(ctx, componenttest.NewNopHost())
require.NoError(t, err)
require.NoError(t, storageExtension.Shutdown(ctx))
}

func TestESStorageExtensionError(t *testing.T) {
ext := makeStorageExtenion(t, &Config{
Elasticsearch: map[string]esCfg.Configuration{
"foo": {
Servers: []string{"badurl"},
LogLevel: "error",
},
},
})
err := ext.Start(context.Background(), componenttest.NewNopHost())
require.ErrorContains(t, err, "failed to initialize elasticsearch storage")
require.ErrorContains(t, err, "no Elasticsearch node available")
}

func noopTelemetrySettings() component.TelemetrySettings {
return component.TelemetrySettings{
Logger: zap.L(),
Expand Down
16 changes: 8 additions & 8 deletions pkg/es/config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ import (

// Configuration describes the configuration properties needed to connect to an ElasticSearch cluster
type Configuration struct {
Servers []string `mapstructure:"server_urls"`
Servers []string `mapstructure:"server_urls" validate:"required,min=1,dive,url"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a test we could write for ensuring that these annotations are correct (btw, what is "dive"?)

We could do it brute-force and just write individual tests for individual fields, but it seems redundant as that would be already done in the validator package (hopefully). But at the same time, having no tests to even ensure that the syntax of the annotations is correct is troublesome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment for L58 and below - can you try removing yaml:... annotations? I think they are left-overs from earlier attempts to build on OTEL Coll, I don't think we're using them anywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment for L58 and below - can you try removing yaml:... annotations? I think they are left-overs from earlier attempts to build on OTEL Coll, I don't think we're using them anywhere.

removed it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(btw, what is "dive"?)

dive instructs the validation library to examine each element within the slice individually

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a test we could write for ensuring that these annotations are correct

We could do it brute-force and just write individual tests for individual fields, but it seems redundant as that would be already done in the validator package (hopefully). But at the same time, having no tests to even ensure that the syntax of the annotations is correct is troublesome.

we can write unit tests that check whether the validation behaves as expected. Instead of testing individual fields, we can create a set of test cases that cover various scenarios, including both valid and invalid configurations.
should i create one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that wouldn't be what I am after, such test would simply verify that the Validator is working as expected, on select fields. But if I declare a field with annotations that don't make sense, I want to catch that in unit tests. E.g. misspell dive as diev (btw weird option, "recursive" would've been much more self explanatory).

Copy link
Member Author

@akagami-harsh akagami-harsh Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the "github.com/go-playground/validator" library in our unit tests to detect misspelled annotations and fields with invalid annotation combinations.

like this

package main

import (
	"github.com/go-playground/validator"
)

type Webhook struct {
	CallbackURL string `validate:"url,badAnnotation"` // badAnnotation is not a valid tag
}

func main() {
	webhook := Webhook{
		CallbackURL: "https://www.example.com/",
	}
	validate := validator.New()
	validate.Struct(webhook)

}
>output 
panic: Undefined validation function 'badAnnotation' on field 'CallbackURL'

goroutine 1 [running]:
github.com/go-playground/validator.(*Validate).parseFieldTagsRecursive(0xc00002e240, {0x526987?, 0x55812d?}, {0x526971, 0xb}, {0x0, 0x0}, 0x0)
        /home/harsh/go/pkg/mod/github.com/go-playground/[email protected]+incompatible/cache.go:289 +0x9d9
github.com/go-playground/validator.(*Validate).extractStructCache(0xc00002e240, {0x538520?, 0xc00020a950?, 0x7fda1aac3198?}, {0x52160c, 0x7})
        /home/harsh/go/pkg/mod/github.com/go-playground/[email protected]+incompatible/cache.go:150 +0x4ec
github.com/go-playground/validator.(*validate).validateStruct(0xc000118480, {0x58b0c0, 0x68faa0}, {0x538520?, 0xc00020a950?, 0xc000242000?}, {0x538520?, 0xc00020a950?, 0x40e3a5?}, {0x58cd60, ...}, ...)
        /home/harsh/go/pkg/mod/github.com/go-playground/[email protected]+incompatible/validator.go:37 +0x196
github.com/go-playground/validator.(*Validate).StructCtx(0xc00002e240, {0x58b0c0, 0x68faa0}, {0x538520?, 0xc00020a950?})
        /home/harsh/go/pkg/mod/github.com/go-playground/[email protected]+incompatible/validator_instance.go:304 +0x491
github.com/go-playground/validator.(*Validate).Struct(...)
        /home/harsh/go/pkg/mod/github.com/go-playground/[email protected]+incompatible/validator_instance.go:277
main.main()
        /home/harsh/Testing/main.go:16 +0x65
exit status 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about "github.com/asaskevich/govalidator", will it do the same? We do not use "github.com/go-playground/validator"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it do the same?

Nope, it doesn't do the same. I actually tried using the existing validator at first.

RemoteReadClusters []string `mapstructure:"remote_read_clusters"`
Username string `mapstructure:"username"`
Password string `mapstructure:"password" json:"-"`
Expand All @@ -54,13 +54,13 @@ type Configuration struct {
AllowTokenFromContext bool `mapstructure:"-"`
Sniffer bool `mapstructure:"sniffer"` // https://github.com/olivere/elastic/wiki/Sniffing
SnifferTLSEnabled bool `mapstructure:"sniffer_tls_enabled"`
MaxDocCount int `mapstructure:"-"` // Defines maximum number of results to fetch from storage per query
MaxSpanAge time.Duration `yaml:"max_span_age" mapstructure:"-"` // configures the maximum lookback on span reads
akagami-harsh marked this conversation as resolved.
Show resolved Hide resolved
NumShards int64 `yaml:"shards" mapstructure:"num_shards"`
NumReplicas int64 `yaml:"replicas" mapstructure:"num_replicas"`
PrioritySpanTemplate int64 `yaml:"priority_span_template" mapstructure:"priority_span_template"`
PriorityServiceTemplate int64 `yaml:"priority_service_template" mapstructure:"priority_service_template"`
PriorityDependenciesTemplate int64 `yaml:"priority_dependencies_template" mapstructure:"priority_dependencies_template"`
MaxDocCount int `mapstructure:"-"`
MaxSpanAge time.Duration `mapstructure:"-"`
NumShards int64 `mapstructure:"num_shards"`
NumReplicas int64 `mapstructure:"num_replicas"`
PrioritySpanTemplate int64 `mapstructure:"priority_span_template"`
PriorityServiceTemplate int64 `mapstructure:"priority_service_template"`
PriorityDependenciesTemplate int64 `mapstructure:"priority_dependencies_template"`
Timeout time.Duration `validate:"min=500" mapstructure:"-"`
BulkSize int `mapstructure:"-"`
BulkWorkers int `mapstructure:"-"`
Expand Down
18 changes: 18 additions & 0 deletions plugin/storage/es/factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,24 @@ func NewFactory() *Factory {
}
}

func NewFactoryWithConfig(
cfg config.Configuration,
metricsFactory metrics.Factory,
logger *zap.Logger,
) (*Factory, error) {
f := NewFactory()
cfg.MaxDocCount = defaultMaxDocCount
f.InitFromOptions(Options{
Primary: namespaceConfig{Configuration: cfg},
others: make(map[string]*namespaceConfig),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, if only Primary and not Archive is ever initialized, would it actually provide an archive storage to the UI? Did you try it out?

Copy link
Member Author

@akagami-harsh akagami-harsh Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right it dose not work with ui. it throw's this error

docker container used to test it

docker run --name es01 -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.12.0

image
do you know what causes it?🤔 , i tried fixing it with different configuration of Options but couldn't fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to look in the server logs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put together a docker-compose that illustrates this issue?

This "all shards failed" error has always annoyed me, I wonder if there are more details in the error returned from ES that we're not logging that could be more descriptive. Like, it's ridiculous for a database to respond with "your query failed" instead of telling why (the fact that it failed on "all shards" is not an explanation of why).

Copy link
Member Author

@akagami-harsh akagami-harsh Feb 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put together a docker-compose that illustrates this issue?

sure, i'll make one.

Copy link
Member Author

@akagami-harsh akagami-harsh Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check this one previously because there were no errors in the UI. However, I've just ran it with elasticsearch and found this issue.

harsh@MSI:~$ curl localhost:16686

  function getJaegerStorageCapabilities() {
        const DEFAULT_STORAGE_CAPABILITIES = { "archiveStorage": false };
        const JAEGER_STORAGE_CAPABILITIES = {"archiveStorage":false};
        return JAEGER_STORAGE_CAPABILITIES;
      }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when i checked with badger and memstore strangely {"archiveStorage":false} in both of them

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for memstore you need to add archive to the config explicitly:

diff --git a/cmd/jaeger/internal/all-in-one.yaml b/cmd/jaeger/internal/all-in-one.yaml
index d17c52fc..a6189c8e 100644
--- a/cmd/jaeger/internal/all-in-one.yaml
+++ b/cmd/jaeger/internal/all-in-one.yaml
@@ -9,11 +9,14 @@ service:
 extensions:
   jaeger_query:
     trace_storage: memstore
+    trace_storage_archive: memstore2

   jaeger_storage:
     memory:
       memstore:
         max_traces: 100000
+      memstore2:
+        max_traces: 100000

But maybe this is ok - the idea in v2 was to decouple primary and archive storage altogether, so that one could use different types of backends for those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in theory the config you have should've worked, because you did define archive storage, but something isn't working.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a small fix for above problem, which sets {"archiveEnabled":true}

})
err := f.Initialize(metricsFactory, logger)
if err != nil {
return nil, err
}
return f, nil
}

// AddFlags implements plugin.Configurable
func (f *Factory) AddFlags(flagSet *flag.FlagSet) {
f.Options.AddFlags(flagSet)
Expand Down
20 changes: 20 additions & 0 deletions plugin/storage/es/factory_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,26 @@ func TestInitFromOptions(t *testing.T) {
assert.Equal(t, o.Get(archiveNamespace), f.archiveConfig)
}

func TestESStorageFactoryWithConfig(t *testing.T) {
cfg := escfg.Configuration{}
_, err := NewFactoryWithConfig(cfg, metrics.NullFactory, zap.NewNop())
require.Error(t, err)
require.ErrorContains(t, err, "failed to create primary Elasticsearch client")

server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Write(mockEsServerResponse)
}))
defer server.Close()

cfg = escfg.Configuration{
Servers: []string{server.URL},
LogLevel: "error",
}
factory, err := NewFactoryWithConfig(cfg, metrics.NullFactory, zap.NewNop())
require.NoError(t, err)
defer factory.Close()
}

func TestPasswordFromFile(t *testing.T) {
defer testutils.VerifyGoLeaksOnce(t)
t.Run("primary client", func(t *testing.T) {
Expand Down
Loading