Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/fileexporter] add periodic flushing #20675

Merged

Conversation

peterzandbergen
Copy link
Contributor

@peterzandbergen peterzandbergen commented Apr 4, 2023

Description:

Added flushing to the fileExporter to ease live debugging.
Added flush_interval parameter to the Config struct.
The value of flush_interval should be 1 or higher.
If not flushing is disabled.
Flushing occures every flush_interval seconds.

Link to tracking Issue:
Resolves #18251

Testing:
Tests were added to config_test.go, file_exporter_test.go. These passed.
Test were also performed by running the otelcontrib program.

Documentation:
The README.md file has been updated with the new flush_interval setting.

@peterzandbergen peterzandbergen requested review from a team and codeboten April 4, 2023 20:30
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Apr 4, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@djaglowski
Copy link
Member

@peterzandbergen, thanks for contributing this. Please sign the CLA.

We can use #18251 as the tracking issue since this need was noted there as well.

@runforesight
Copy link

runforesight bot commented Apr 4, 2023

Foresight Summary

    
Major Impacts
Foresight hasn't detected any major impact on your workflows and tests.

View More Details

⭕  build-and-test-windows workflow has finished in 9 seconds (30 minutes 23 seconds less than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
windows-unittest-matrix -     🔗  N/A See Details
windows-unittest -     🔗  N/A See Details

✅  telemetrygen workflow has finished in 6 minutes 21 seconds (⚠️ 5 minutes 1 second more than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
build-dev -     🔗  N/A See Details
publish-latest -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details

✅  prometheus-compliance-tests workflow has finished in 8 minutes 44 seconds (⚠️ 2 minutes 19 seconds more than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
prometheus-compliance-tests -     🔗  N/A See Details

✅  changelog workflow has finished in 8 minutes 44 seconds (⚠️ 6 minutes 28 seconds more than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
changelog -     🔗  N/A See Details

✅  check-links workflow has finished in 10 minutes 51 seconds (⚠️ 9 minutes 55 seconds more than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
changed files -     🔗  N/A See Details
check-links -     🔗  N/A See Details

✅  load-tests workflow has finished in 16 minutes 3 seconds (⚠️ 5 minutes 27 seconds more than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
setup-environment -     🔗  N/A See Details
loadtest (TestIdleMode) -     🔗  N/A See Details
loadtest (TestBallastMemory|TestLog10kDPS) -     🔗  N/A See Details
loadtest (TestMetric10kDPS|TestMetricsFromFile) -     🔗  N/A See Details
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) -     🔗  N/A See Details
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) -     🔗  N/A See Details
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) -     🔗  N/A See Details
loadtest (TestTraceAttributesProcessor) -     🔗  N/A See Details

✅  e2e-tests workflow has finished in 17 minutes 37 seconds (⚠️ 3 minutes 33 seconds more than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
kubernetes-test (v1.26.0) -     🔗  N/A See Details
kubernetes-test (v1.25.3) -     🔗  N/A See Details
kubernetes-test (v1.24.7) -     🔗  N/A See Details
kubernetes-test (v1.23.13) -     🔗  N/A See Details

✅  build-and-test workflow has finished in 48 minutes 54 seconds and finished at 13th Apr, 2023.


Job Failed Steps Tests
govulncheck -     🔗  N/A See Details
setup-environment -     🔗  N/A See Details
check-collector-module-version -     🔗  N/A See Details
check-codeowners -     🔗  N/A See Details
checks -     🔗  N/A See Details
integration-tests -     🔗  N/A See Details
lint-matrix (receiver-0) -     🔗  N/A See Details
lint-matrix (receiver-1) -     🔗  N/A See Details
lint-matrix (processor) -     🔗  N/A See Details
lint-matrix (exporter) -     🔗  N/A See Details
lint-matrix (extension) -     🔗  N/A See Details
lint-matrix (connector) -     🔗  N/A See Details
lint-matrix (internal) -     🔗  N/A See Details
lint-matrix (other) -     🔗  N/A See Details
unittest-matrix (1.20, receiver-0) -     🔗  N/A See Details
unittest-matrix (1.20, receiver-1) -     🔗  N/A See Details
unittest-matrix (1.20, processor) -     🔗  N/A See Details
unittest-matrix (1.20, exporter) -     🔗  N/A See Details
unittest-matrix (1.20, extension) -     🔗  N/A See Details
unittest-matrix (1.20, connector) -     🔗  N/A See Details
unittest-matrix (1.20, internal) -     🔗  N/A See Details
unittest-matrix (1.20, other) -     🔗  N/A See Details
unittest-matrix (1.19, receiver-0) -     🔗  N/A See Details
unittest-matrix (1.19, receiver-1) -     🔗  N/A See Details
unittest-matrix (1.19, processor) -     🔗  N/A See Details
unittest-matrix (1.19, exporter) -     🔗  N/A See Details
unittest-matrix (1.19, extension) -     🔗  N/A See Details
unittest-matrix (1.19, connector) -     🔗  N/A See Details
unittest-matrix (1.19, internal) -     🔗  N/A See Details
unittest-matrix (1.19, other) -     🔗  N/A See Details
build-examples -     🔗  N/A See Details
correctness-metrics -     🔗  N/A See Details
correctness-traces -     🔗  N/A See Details
unittest (1.20) -     🔗  N/A See Details
unittest (1.19) -     🔗  N/A See Details
lint -     🔗  N/A See Details
cross-compile (darwin, amd64) -     🔗  N/A See Details
cross-compile (darwin, arm64) -     🔗  N/A See Details
cross-compile (linux, 386) -     🔗  N/A See Details
cross-compile (linux, amd64) -     🔗  N/A See Details
cross-compile (linux, arm) -     🔗  N/A See Details
cross-compile (linux, arm64) -     🔗  N/A See Details
cross-compile (linux, ppc64le) -     🔗  N/A See Details
cross-compile (windows, 386) -     🔗  N/A See Details
cross-compile (windows, amd64) -     🔗  N/A See Details
build-package (deb) -     🔗  N/A See Details
build-package (rpm) -     🔗  N/A See Details
windows-msi -     🔗  N/A See Details
publish-check -     🔗  N/A See Details
publish-dev -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details
rotate-milestone -     🔗  N/A See Details

🔎 See details on Foresight

*You can configure Foresight comments in your organization settings page.

@peterzandbergen
Copy link
Contributor Author

CLA is signed

exporter/fileexporter/file_exporter.go Outdated Show resolved Hide resolved
exporter/fileexporter/file_exporter.go Outdated Show resolved Hide resolved
exporter/fileexporter/file_exporter.go Outdated Show resolved Hide resolved
exporter/fileexporter/file_exporter.go Outdated Show resolved Hide resolved
exporter/fileexporter/testdata/config.yaml Outdated Show resolved Hide resolved
exporter/fileexporter/config.go Outdated Show resolved Hide resolved
@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Apr 8, 2023

Hi Daniel and Iook (I hope this is your first name),
Thanks for your comments and suggestions for improvement. I have incorporate them all.
I have also improved the way the goroutine stops.

Changes

  • Change flush_interval to time.Duration
  • Change startFlusher to a receiver method
  • Moved stopFlusher logic to Stop method
  • Move test on flush_interval to Start method
  • Remove superfluous tests
  • Add stopFlusher channel to ensure that the flushing goroutine
    stops correctly

@peterzandbergen
Copy link
Contributor Author

Hi, can you let me know if I need to perform more changes? Cheers

@djaglowski
Copy link
Member

Thanks for making this contribution @peterzandbergen. The code looks good to me and I've verified manually that it works as expected.

There are two things I think we need to consider yet.

  1. Should we enable this functionality by default? I think probably yes, as this component is commonly used for manual testing and the lack of automatic flushing often causes confusion. I don't see any notable downsides to having an unaggressive flush interval. I think 1s would be reasonable. WDYT? Perhaps @atingchen, do you have an opinion on this?
  2. It's not clear to me that any of the tests are actually validating this functionality. I think we should have a test that ensures the flush occurs. Apologies if I missed this but I couldn't find it.

@peterzandbergen
Copy link
Contributor Author

Hi @djaglowski,
thanks for your comment.

ad 1) I agree that a default flush interval makes sense. When I started using the fileexporter I thought I was doing something wrong when I saw no output. A default of 1 second sounds reasonable.

ad 2) I will think of a test to validate if flushing works.

@atingchen
Copy link
Contributor

  1. Should we enable this functionality by default? I think probably yes, as this component is commonly used for manual testing and the lack of automatic flushing often causes confusion. I don't see any notable downsides to having an unaggressive flush interval. I think 1s would be reasonable. WDYT? Perhaps @atingchen, do you have an opinion on this?

I think this sounds like a good idea.

@peterzandbergen
Copy link
Contributor Author

I have added the default setting of 1 second for the flushing_interval and added it to the config_test.
I have added a test to check if flushing works.
I have also added buffering to the lumberjack logging. Let me know if this is a a good idea.

All tests pass.

@peterzandbergen peterzandbergen force-pushed the feature/fileexporter-flush branch from 9e1fe41 to 84fbf38 Compare April 12, 2023 13:18
@peterzandbergen
Copy link
Contributor Author

Hi Daniel and Luke, thanks for your comments on the PR.
I will be in Amsterdam on KubeCon next week. Will you be there too? Would be a good opportunity to meet outside cyberspace.
Cheers

@djaglowski
Copy link
Member

@peterzandbergen, I'll be there and look forward to connecting.

@atingchen
Copy link
Contributor

@peterzandbergen Thank you very much for your invitation. Unfortunately, I won't be able to make it to the event.Hope you have a happy memory.

@peterzandbergen
Copy link
Contributor Author

@atingchen Hi Luke, it was great working together. Let's do it again sometime.

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Apr 12, 2023

I am adding commits to fix race errors in testing.
make -C exporter/fileexporter/ test now passes.

@peterzandbergen
Copy link
Contributor Author

@djaglowski Hi, I see that you have approved it. I is now waiting for another reviewer. Is there something I need to provide for it to be approved?
I am asking because I am not familiar with the whole process.
Cheers

BTW, are you attending KubeCon as a speaker or does your company have a booth?

@djaglowski
Copy link
Member

@peterzandbergen, thanks for resolving the race condition. There was one more lint issue but I believe I've resolved it. I'll reach out on CNCF Slack re KubeCon.

@atingchen, do you want to give this a final review?

@peterzandbergen
Copy link
Contributor Author

peterzandbergen commented Apr 13, 2023

@djaglowski @atingchen Dan and Luke,
I want to thank you for your cooperation and patience in finishing this PR.
I was not familiar with the way the project was set up, in particular I needed to learn about the different options for make for lint and test and to localize this to the exporter/fileexporter directory. This would have saved Github actions costs and time and effort on your side.
All in all it was a good experience and it encourages me to contribute more.
Again, thanks.

@peterzandbergen
Copy link
Contributor Author

One last thing, I hope. There is no check on a negative duration for flush_interval, it will result in panic when the ticker is created. I suggest to return an error in this case and document it.

@atingchen
Copy link
Contributor

@peterzandbergen, thanks for resolving the race condition. There was one more lint issue but I believe I've resolved it. I'll reach out on CNCF Slack re KubeCon.

@atingchen, do you want to give this a final review?

It looks good to me, but I'm not authorized to approved it.

@EdmundMartin
Copy link

Is anyone actively looking at this to provide the second review? Found the fact that the file exporter doesn't periodically flush to disk quite surprising and this PR would be a good fit for my use case. Would love to see this get merged in!

Is there anything that can be done to get this change over the line?

@djaglowski
Copy link
Member

@EdmundMartin, this will be merged as soon as CI passes. We seem to have an unrelated flaky integration test standing in the way.

peterzandbergen and others added 16 commits April 25, 2023 11:05
Add flush_interval in seconds to the fileexporter configuration
and use this parameter to periodically flush the buffer to the file.

Flushing eases debugging because the output is shown at regularly
intervals.

Not setting the flush_interval or setting it to a value less than 1
results in the old behavior.
- Change flush_interval to time.Duration
- Change startFlusher to a receiver method
- Moved stopFlusher logic to Stop method
- Move test on flush_interval to Start method
- Remove superfluous tests
- Add stopFlusher channel to ensure that the flushing goroutine
  stops correctly
Also
- set flush_interval to 1s if not set
- use buffered writer for rotation logger
- add flush interval test
- Set expected default value in config_test
- Add unwrapping to the rotation settings test
- Reuse the NopWriterCloser instead of adding my version
- lint complained about bad formatting
- add mutex to startFlushing to avoid race condition
- add mutex lock/unlock to shutdown
- add safeFileExporterWrite with mutex lock/unlock
@djaglowski djaglowski force-pushed the feature/fileexporter-flush branch from b11103a to b64b024 Compare April 25, 2023 15:05
@djaglowski djaglowski merged commit 4161191 into open-telemetry:main Apr 25, 2023
@github-actions github-actions bot added this to the next release milestone Apr 25, 2023
@djaglowski
Copy link
Member

Thanks for the contribution @peterzandbergen!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[exporter/file] Add buffer flush interval
4 participants