-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[jaeger-v2] Define an internal interface of storage v2 spanstore #5399
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5399 +/- ##
===========================================
+ Coverage 55.80% 95.20% +39.40%
===========================================
Files 171 346 +175
Lines 9048 16908 +7860
===========================================
+ Hits 5049 16098 +11049
+ Misses 3527 610 -2917
+ Partials 472 200 -272
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
b38903e
to
36c235a
Compare
storage_v2/spanstore/writer.go
Outdated
// WriteTrace writes batches of spans at once and | ||
// compatible with OTLP Exporter API. | ||
WriteTraces(ctx context.Context, td ptrace.Traces) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// WriteTrace writes batches of spans at once and | |
// compatible with OTLP Exporter API. | |
WriteTraces(ctx context.Context, td ptrace.Traces) error | |
// WriteTrace writes a batch of spans to storage. Idempotent. | |
// Implementations are not required to support atomic transactions, | |
// so if any of the spans fail to be written an error is returned. | |
// Compatible with OTLP Exporter API. | |
WriteTraces(ctx context.Context, td ptrace.Traces) error |
storage_v2/spanstore/reader.go
Outdated
// GetTrace retrieves the trace with a given id. | ||
// | ||
// If no spans are stored for this trace, it returns ErrTraceNotFound. | ||
GetTrace(ctx context.Context, traceID model.TraceID) (*model.Trace, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the return value should be in ptrace model, similar to Writer
GetTrace(ctx context.Context, traceID model.TraceID) (*model.Trace, error)
Isn't there a TraceID type in ptrace
? I would use that - we want to do a clean break from legacy /model/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a TraceID type in ptrace?
Yes, they have TraceID type defined in the pcommon pkg.
we want to do a clean break from legacy
/model/
I see, I'll keep that in mind!
storage_v2/spanstore/reader.go
Outdated
// by different spans. | ||
// | ||
// If no matching traces are found, the function returns (nil, nil). | ||
FindTraces(ctx context.Context, query *TraceQueryParameters) ([]*model.Trace, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, this one is tricky. We want the result to be ptrace
model, but in ptrace
there is not clear separation of spans from different traces, instead the separation is by resource/scope, which are not very relevant on retrieval (usually by then they are de-normalized onto individual spans in storage, although some storage backends can theoretically store fully normalized model). Plus, in order to reproduce the grouping of spans into resource/scope hierarchy we need to do relatively expensive comparison (fwiw the query service does this today for Process deduping, even though the UI does not make use of such normalization).
I believe in my RFC I proposed returning []*ptrace.TraceData
from these. The deduping could be an optional step/enrichment in the query-service, rather than forcing each storage backend do that.
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
storage_v2/spanstore/factory.go
Outdated
type Factory interface { | ||
// Initialize performs internal initialization of the factory, such as opening connections to the backend store. | ||
// It is called after all configuration of the factory itself has been done. | ||
Initialize() error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend comparing this with OTEL collector component lifecycle methods. Eg would you need to accept telemetry objects here, or would it be passed to the constructor function? Do we need Close if we're adding lifecycle methods?
Also, different storage factories (eg dependencies storage) would need the same lifestyle interface so it should be pulled to the higher pkg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will pass the telemetry objects to the constructor and only receive the logger and config, which is similar to the existing factory, e.g. grpc.NewFactory(*zap.Logger, grpc.Config)
. I've looked into other OTEL collector extension implementations and they (mostly) only accept logger from the telemetry object.
I've added the flowchart of OTEL collector component's lifecycle below. Inside the Component.Start(context.Context, component.Host)
, we will call spanstore.NewFactory(*zap.Logger, component.Config)
and then Factory.Initialize(context.Context)
. Once the component is shutting down through Component.Shutdown(context.Context)
, we will call Factory.Close(context.Context)
.
flowchart TD
extension.Factory --> Factory.CreateExtension
Factory.CreateExtension -->|returns| component.Component
component.Component --> Component.Start
Component.Start --> spanstore.NewFactory
spanstore.NewFactory -->|returns| spanstore.Factory
spanstore.Factory --> Factory.Initialize
Factory.Initialize --> Running
Running --> Factory.Close
Factory.Close -->|called by| Component.Shutdown
subgraph "OTEL Collector"
extension.Factory
Factory.CreateExtension
component.Component
Component.Start
Component.Shutdown
subgraph "Jaeger Storage Extension"
spanstore.NewFactory
spanstore.Factory
Factory.Initialize
Running
Factory.Close
end
end
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
storage_v2/factory.go
Outdated
// It lives within the OTEL collector extension component's lifecycle. | ||
// The Initialize and Close functions supposed to be called from the | ||
// OTEL component's Start and Shutdown functions. | ||
type Factory interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type Factory interface { | |
type FactoryBase interface { |
Even though it is namespaced separately from the actual factories, this interface itself is not a "Factory"
storage_v2/spanstore/writer.go
Outdated
// Implementations are not required to support atomic transactions, | ||
// so if any of the spans fail to be written an error is returned. | ||
// Compatible with OTLP Exporter API. | ||
WriteTraces(ctx context.Context, td *ptrace.Traces) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTEL Exporter interface passes ptrace by Value, let's do the same:
func (exp *storageExporter) pushTraces(ctx context.Context, td ptrace.Traces) error {
storage_v2/spanstore/reader.go
Outdated
// by different spans. | ||
// | ||
// If no matching traces are found, the function returns (nil, nil). | ||
FindTraces(ctx context.Context, query *TraceQueryParameters) ([]*ptrace.Traces, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FindTraces(ctx context.Context, query *TraceQueryParameters) ([]*ptrace.Traces, error) | |
FindTraces(ctx context.Context, query *TraceQueryParameters) ([]ptrace.Traces, error) |
if you return 30 traces, []*...
requires 30 heap allocations for no practical gains.
storage_v2/spanstore/reader.go
Outdated
// GetTrace retrieves the trace with a given id. | ||
// | ||
// If no spans are stored for this trace, it returns ErrTraceNotFound. | ||
GetTrace(ctx context.Context, traceID pcommon.TraceID) (*ptrace.Traces, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetTrace(ctx context.Context, traceID pcommon.TraceID) (*ptrace.Traces, error) | |
GetTrace(ctx context.Context, traceID pcommon.TraceID) (ptrace.Traces, error) |
type Factory interface { | ||
// Initialize performs internal initialization of the factory, | ||
// such as opening connections to the backend store. | ||
Initialize(ctx context.Context) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through the discussions and the latest approach suggested that the Start
function shouldn't throw error anymore and uses the TelemetrySettings.ReportStatus
to report StatusFatalError
to halt the startup asynchronously, cmiiw.
I'm thinking that any changes to the component lifecycles should be handled by the Jaeger storage extension, we can capture the errors from the storage factory in the extension and report them instead, so the storage factory interface should not be affected by it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, this will be similar to what jaeger-v1 does today. However, jaeger-v1 has a failure mode today where if the db is not ready for any reason when the binary starts (even though the Jaeger configuration pointing to the db is correct), Jaeger fails to start and the binary exists. That's not how a component of a distributed system technically should be operating, i.e. a db temporarily not available at start-up vs. becoming temporarily not available during normal operation should not be treated as different conditions. We don't exit the binary if we lose a connection later, so why do we kill it if we cannot establish the connection during Init()?
If we were to fix this, we need to establish a different expectation from storage implementations, and this is where async runtime status reporting instead of returning an error from Init is coming from. This open-telemetry/opentelemetry-collector#9324 (comment) has a good explanation.
Signed-off-by: James Ryans <[email protected]>
Signed-off-by: James Ryans <[email protected]>
…gertracing#5399) ## Which problem is this PR solving? - Part of jaegertracing#5334 ## Description of the changes - An API design of Storage V2 spanstore and its proto file. - For the detailed discussion and how we arrived to this design, please take a look at https://docs.google.com/document/d/1IvUcDsdRxMPK-xTUE32w3NSAGTkUcmnDQttN6ijUnWs/edit?usp=sharing ## How was this change tested? - This PR hasn't been tested yet since it only contains interfaces and no actual implementation to be tested. ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [ ] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: James Ryans <[email protected]> Signed-off-by: Vamshi Maskuri <[email protected]>
…gertracing#5399) ## Which problem is this PR solving? - Part of jaegertracing#5334 ## Description of the changes - An API design of Storage V2 spanstore and its proto file. - For the detailed discussion and how we arrived to this design, please take a look at https://docs.google.com/document/d/1IvUcDsdRxMPK-xTUE32w3NSAGTkUcmnDQttN6ijUnWs/edit?usp=sharing ## How was this change tested? - This PR hasn't been tested yet since it only contains interfaces and no actual implementation to be tested. ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [ ] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: James Ryans <[email protected]>
Which problem is this PR solving?
Description of the changes
How was this change tested?
Checklist
jaeger
:make lint test
jaeger-ui
:yarn lint
andyarn test