Agent (otel) can't be stopped when traces are in flight #2576

Closed
jpkrohling opened this issue Oct 19, 2020 · 2 comments

Comments

@jpkrohling
Contributor

Describe the bug
When the agent can't send a trace to the collector, the shutdown procedure gets blocked indefinitely.
Probably related to open-telemetry/opentelemetry-collector#1192.

To Reproduce
Steps to reproduce the behavior:

  1. start a collector
  2. start the agent as in Add auth support for collector #2570, with an invalid token
  3. send a trace
  4. ensure that the collector didn't receive the data
  5. shutdown the agent

Actual behavior

The following is logged, and the process then hangs indefinitely after the last line:

2020-10-19T11:32:07.845+0200	INFO	service/service.go:432	Starting shutdown...
2020-10-19T11:32:07.845+0200	INFO	healthcheck/handler.go:128	Health Check state change	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check", "status": "unavailable"}
2020-10-19T11:32:07.845+0200	INFO	service/service.go:365	Stopping receivers...
2020-10-19T11:32:07.845+0200	ERROR	jaegerreceiver/trace_receiver.go:373	http server failure	{"component_kind": "receiver", "component_type": "jaeger", "component_name": "jaeger", "error": "http: Server closed"}
go.opentelemetry.io/collector/receiver/jaegerreceiver.(*jReceiver).startAgent.func1
	/home/jpkroehling/go/pkg/mod/go.opentelemetry.io/collector@…/receiver/jaegerreceiver/trace_receiver.go:373
2020-10-19T11:32:07.845+0200	INFO	service/service.go:371	Stopping processors...
2020-10-19T11:32:07.845+0200	INFO	builder/pipelines_builder.go:69	Pipeline is shutting down...	{"pipeline_name": "", "pipeline_datatype": "traces"}

The only way to stop it is to send SIGKILL to the process.

jpkrohling changed the title from "Agent can't be stopped when traces are in flight" to "Agent (otel) can't be stopped when traces are in flight" on Oct 19, 2020
@jpkrohling
Contributor Author

It does eventually shut down, after about 5 minutes:

2020-10-19T11:32:07.845+0200	INFO	service/service.go:432	Starting shutdown...
2020-10-19T11:32:07.845+0200	INFO	healthcheck/handler.go:128	Health Check state change	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check", "status": "unavailable"}
2020-10-19T11:32:07.845+0200	INFO	service/service.go:365	Stopping receivers...
2020-10-19T11:32:07.845+0200	ERROR	jaegerreceiver/trace_receiver.go:373	http server failure	{"component_kind": "receiver", "component_type": "jaeger", "component_name": "jaeger", "error": "http: Server closed"}
go.opentelemetry.io/collector/receiver/jaegerreceiver.(*jReceiver).startAgent.func1
	/home/jpkroehling/go/pkg/mod/go.opentelemetry.io/collector@…/receiver/jaegerreceiver/trace_receiver.go:373
2020-10-19T11:32:07.845+0200	INFO	service/service.go:371	Stopping processors...
2020-10-19T11:32:07.845+0200	INFO	builder/pipelines_builder.go:69	Pipeline is shutting down...	{"pipeline_name": "", "pipeline_datatype": "traces"}
^C^C
2020-10-19T11:37:24.148+0200	WARN	batchprocessor/batch_processor.go:166	Sender failed	{"component_kind": "processor", "component_type": "batch", "component_name": "batch", "error": "max elapsed time expired failed to push trace data via Jaeger exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp [::1]:14251: connect: connection refused\""}
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).sendItems
	/home/jpkroehling/go/pkg/mod/go.opentelemetry.io/collector@…/processor/batchprocessor/batch_processor.go:166
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).startProcessingCycle
	/home/jpkroehling/go/pkg/mod/go.opentelemetry.io/collector@…/processor/batchprocessor/batch_processor.go:145
2020-10-19T11:37:24.149+0200	INFO	builder/pipelines_builder.go:75	Pipeline is shutdown.	{"pipeline_name": "", "pipeline_datatype": "traces"}
2020-10-19T11:37:24.149+0200	INFO	service/service.go:377	Stopping exporters...
2020-10-19T11:37:24.149+0200	INFO	service/service.go:387	Stopping extensions...
2020-10-19T11:37:24.149+0200	INFO	service/service.go:454	Shutdown complete.

@jpkrohling
Contributor Author

Example of a shutdown with an empty in-flight queue:

2020-10-19T11:40:21.460+0200	INFO	service/service.go:432	Starting shutdown...
2020-10-19T11:40:21.460+0200	INFO	healthcheck/handler.go:128	Health Check state change	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check", "status": "unavailable"}
2020-10-19T11:40:21.460+0200	INFO	service/service.go:365	Stopping receivers...
2020-10-19T11:40:21.460+0200	ERROR	jaegerreceiver/trace_receiver.go:373	http server failure	{"component_kind": "receiver", "component_type": "jaeger", "component_name": "jaeger", "error": "http: Server closed"}
go.opentelemetry.io/collector/receiver/jaegerreceiver.(*jReceiver).startAgent.func1
	/home/jpkroehling/go/pkg/mod/go.opentelemetry.io/collector@…/receiver/jaegerreceiver/trace_receiver.go:373
2020-10-19T11:40:21.460+0200	INFO	service/service.go:371	Stopping processors...
2020-10-19T11:40:21.461+0200	INFO	builder/pipelines_builder.go:69	Pipeline is shutting down...	{"pipeline_name": "", "pipeline_datatype": "traces"}
2020-10-19T11:40:21.461+0200	INFO	builder/pipelines_builder.go:75	Pipeline is shutdown.	{"pipeline_name": "", "pipeline_datatype": "traces"}
2020-10-19T11:40:21.461+0200	INFO	service/service.go:377	Stopping exporters...
2020-10-19T11:40:21.461+0200	INFO	service/service.go:387	Stopping extensions...
2020-10-19T11:40:21.461+0200	INFO	service/service.go:454	Shutdown complete.
