fix: Shut down zombie goroutine in chronicleexporter #2029

Merged
merged 3 commits into from
Dec 9, 2024

mrsillydog
Contributor

Proposed Change

While testing another fix by spinning up a chronicle exporter, sending a log to Chronicle, and then shutting down the exporter, I encountered this error:

image

I was understandably confused, so I asked Dan Jaglowski for help. He figured it out after hours and came up with this solution, along with some other refactors. The explanation and fix are all his; I'm just opening the PR.

To fix this, the Shutdown function actually needs to be:

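func (ce *chronicleExporter) Shutdown(context.Context) error {
	if ce.cfg.Protocol == protocolHTTPS {
		// oauth2.NewClient returns a client whose Transport is an *oauth2.Transport.
		t := ce.httpClient.Transport.(*oauth2.Transport)
		if t.Base != nil {
			t.Base.(*http.Transport).CloseIdleConnections()
		} else {
			// A nil Base means requests go through http.DefaultTransport.
			http.DefaultTransport.(*http.Transport).CloseIdleConnections()
		}
		return nil
	}

	// Otherwise (gRPC): cancel the context, wait on the WaitGroup, then close the connection.
	ce.cancel()
	ce.wg.Wait()
	if ce.grpcConn != nil {
		if err := ce.grpcConn.Close(); err != nil {
			// %w keeps the underlying error available to errors.Is / errors.As.
			return fmt.Errorf("connection close: %w", err)
		}
	}
	return nil
}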
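
For anyone unfamiliar with the cancel-then-wait part of that function, here is a minimal standalone sketch of the pattern. It is not the exporter's code: the worker type, its fields, and the ticker loop are all hypothetical, and it only assumes that the exporter's cancel and wg play the usual context.CancelFunc and sync.WaitGroup roles.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// worker is a hypothetical background component, not the exporter's real type.
type worker struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
	ticks  int
}

// start launches a goroutine that runs until its context is cancelled.
func (w *worker) start() {
	ctx, cancel := context.WithCancel(context.Background())
	w.cancel = cancel
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		ticker := time.NewTicker(5 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				w.ticks++
			}
		}
	}()
}

// shutdown signals the goroutine to stop and blocks until it has exited.
func (w *worker) shutdown() {
	w.cancel()
	w.wg.Wait()
}

// stoppedCleanly reports whether the goroutine stops counting after shutdown.
func stoppedCleanly() bool {
	w := &worker{}
	w.start()
	time.Sleep(30 * time.Millisecond)
	w.shutdown()
	before := w.ticks // safe to read: wg.Wait() ordered this after the goroutine's exit
	time.Sleep(30 * time.Millisecond)
	return w.ticks == before
}

func main() {
	fmt.Println("stopped cleanly:", stoppedCleanly())
}
```

Cancelling alone only asks the goroutine to stop; the Wait is what guarantees it has actually exited before Shutdown returns.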

Larger explanation:

In Start, when we instantiate the httpClient with ce.httpClient = oauth2.NewClient(context.Background(), creds.TokenSource), it doesn't actually matter what context we pass in. It isn't used for cancellation.

What we get back is always an *http.Client whose Transport field is of type *oauth2.Transport. With the call above, that transport always has a nil Base, which means it will use http.DefaultTransport. The thing with http.DefaultTransport (as well as many other transports) is that it reuses connections by keeping them in a "keep alive" state. The only way to clean these up is to call CloseIdleConnections() on the Transport. However, because we're getting an *oauth2.Transport, which doesn't itself have a CloseIdleConnections() method, we have to access http.DefaultTransport directly and call CloseIdleConnections() on it.
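
To make the keep-alive behavior concrete, here is a rough, stdlib-only sketch. It does not use the oauth2 package: wrappingTransport is a hypothetical stand-in for *oauth2.Transport (a Base field, a fallback to http.DefaultTransport, and no CloseIdleConnections method), and closeIdle and openConnsAfter are illustrative helpers, not anything from the exporter. The demo uses its own *http.Transport as Base so it doesn't touch shared state.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// wrappingTransport is a hypothetical stand-in for *oauth2.Transport: it
// delegates to Base, or to http.DefaultTransport when Base is nil.
type wrappingTransport struct {
	Base http.RoundTripper
}

func (t *wrappingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if t.Base != nil {
		return t.Base.RoundTrip(req)
	}
	return http.DefaultTransport.RoundTrip(req)
}

// closeIdle closes keep-alive connections held by the underlying transport.
func closeIdle(t *wrappingTransport) {
	base := t.Base
	if base == nil {
		base = http.DefaultTransport
	}
	if c, ok := base.(interface{ CloseIdleConnections() }); ok {
		c.CloseIdleConnections()
	}
}

// openConnsAfter makes one request and reports how many connections the
// server still sees as open, optionally calling closeIdle first.
func openConnsAfter(shutdown bool) int64 {
	var open atomic.Int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, _ *http.Request) { fmt.Fprint(w, "ok") }))
	srv.Config.ConnState = func(_ net.Conn, s http.ConnState) {
		switch s {
		case http.StateNew:
			open.Add(1)
		case http.StateClosed, http.StateHijacked:
			open.Add(-1)
		}
	}
	srv.Start()
	defer srv.Close()

	tr := &wrappingTransport{Base: &http.Transport{}}
	client := &http.Client{Transport: tr}
	resp, err := client.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	io.Copy(io.Discard, resp.Body) // read to EOF so the connection can be reused
	resp.Body.Close()

	if shutdown {
		closeIdle(tr)
	}
	// After closeIdle, give the server up to two seconds to observe the close.
	deadline := time.Now().Add(2 * time.Second)
	for shutdown && open.Load() > 0 && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
	return open.Load()
}

func main() {
	fmt.Println("open without CloseIdleConnections:", openConnsAfter(false))
	fmt.Println("open with CloseIdleConnections:", openConnsAfter(true))
}
```

The interface assertion in closeIdle is a design choice for the sketch: it avoids a panic if Base is some other http.RoundTripper, at the cost of silently doing nothing in that case.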

If you would like a test case to reproduce this, please contact me for one - not sure we have one that doesn't involve actually sending data to Chronicle.

Checklist
  • Changes are tested
  • CI has passed

@mrsillydog mrsillydog requested review from dpaasman00 and a team as code owners December 5, 2024 14:40
tbm48813 commented Dec 5, 2024

Tested both gRPC and HTTPS. Installed the adapter fresh and collected successfully on both. Restarted several times. No errors were logged on the collector; everything looks great.

Contributor Author

mrsillydog commented Dec 5, 2024

While testing gRPC, I discovered that it had the same issue. We need an integration test or two around this to ensure it doesn't crop up again, but it should be fixed now. Much credit to Dan again.

@mrsillydog mrsillydog merged commit ffe69f0 into release/v1.67.0 Dec 9, 2024
15 checks passed
@mrsillydog mrsillydog deleted the fix/chronicle-zombie-goroutine branch December 9, 2024 15:01
dpaasman00 pushed a commit that referenced this pull request Dec 16, 2024
* Properly shut down chronicleexporter zombie goroutine

* Fix lint

* Fix the same problem for the GRPC workflow
3 participants