Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traces not flushing on process exit #708

Closed
aheuermann opened this issue Oct 8, 2019 · 9 comments
Closed

Traces not flushing on process exit #708

aheuermann opened this issue Oct 8, 2019 · 9 comments
Labels
community core feature-request question Further information is requested

Comments

@aheuermann
Copy link

Describe the bug
We are not seeing traces for our short-lived cron jobs we have added tracing to. These are kubernetes crons. A container is spun up specifically to run the job and then exits once complete or fails. I enabled debug and noticed that traces are being encoded but don't seem to be sent to the agent as I do not see any Request to the agent log messages. See logs below:

2019-10-08T21:30:12.582Z: Initializing tracer for service: jobs
2019-10-08T21:30:13.213Z: Starting Job
2019-10-08T21:30:13.364Z: Publishing messages ...
2019-10-08T21:30:13.844Z: Done publishing messages ...
2019-10-08T21:30:13.845Z: Job Complete
2019-10-08T21:30:15.950Z: "Encoding trace: [{""trace_id"":""4477388297506770617"",""span_id"":""972637913971971208"",""parent_id"":""4477388297506770617"", ...
2019-10-08T21:30:15.950Z: Adding encoded trace to buffer: ...
2019-10-08T21:30:15.950Z: Encoding trace: [{""trace_id"":""4477388297506770617"",""span_id"":""5820669086732705198"",""parent_id"":""4477388297506770617"", ...
2019-10-08T21:30:15.950Z: Adding encoded trace to buffer: ...
2019-10-08T21:30:15.950Z: Encoding trace: [{""trace_id"":""4477388297506770617"",""span_id"":""389139455014874781"",""parent_id"":""4477388297506770617"", ...
2019-10-08T21:30:15.950Z: Adding encoded trace to buffer: ...
2019-10-08T21:30:15.950Z: Encoding trace: [{""trace_id"":""4477388297506770617"",""span_id"":""1598632403633540690"",""parent_id"":""4477388297506770617"", ...
2019-10-08T21:30:15.950Z: Adding encoded trace to buffer: ...
2019-10-08T21:30:15.950Z: Encoding trace: [{""trace_id"":""4477388297506770617"",""span_id"":""6502698786863096"",""parent_id"":""4477388297506770617"", ...
2019-10-08T21:30:15.950Z: Adding encoded trace to buffer: ...
2019-10-08T21:30:15.951Z: Encoding trace: [{""trace_id"":""7913298096152371072"",""span_id"":""1427519210819598087"",""parent_id"":""5603003085468264444"", ...
<END OF LOGS>

Should the tracer be automatically flushing traces on process exit or is there a way that I can manually flush the traces to the agent upon exit?

Environment

  • Operation system:
    Debian
  • Node version:
    12.10.0
  • Tracer version:
    0.15.3
  • Agent version:
@aheuermann aheuermann added the bug Something isn't working label Oct 8, 2019
@aheuermann
Copy link
Author

Note: we are seeing the same thing on our longer-running crons as well so I don't think that the fact that this ran < 1s has anything to do with it.

@aheuermann
Copy link
Author

Ah I think I may have figured it out. It looks like the beforeExit event is what triggers the flush, but I am calling process.exit() to indicate pass/fail of the job which explicitly skips beforeExit hooks.

Would it be possible to expose a way to manually flush the traces to the agent? Or maybe there is another solution?

@rochdev rochdev added community core feature-request question Further information is requested and removed bug Something isn't working labels Oct 10, 2019
@rochdev
Copy link
Member

rochdev commented Oct 10, 2019

The problem with process.exit() is that it forcefully exits the process, effectively skipping any asynchronous work such as HTTP requests. This means that even if you were to manually trigger the flush, process.exit() may still exit before the trace is actually flushed. I'm open to suggestions, but I'm not sure it's possible to prevent that without altering the behavior of process.exit().

Is there a way you could change the exit code at runtime and let the process exit gracefully?

@aheuermann
Copy link
Author

We can update to exit the process gracefully, but I think it is fairly common to exit with process.exit so might be worth allowing the manual flush option as well. Not sure how feasible it is to expose a flush method that returns a promise that is resolved once all traces are sent to the agent? Let me know what you think.

@rochdev
Copy link
Member

rochdev commented Oct 11, 2019

Exposing the method would not be difficult, and a similar behavior can be achieved by setting the flushInterval option to 0 which results in the trace being flushed immediately after it finishes. However, this will still cause a race condition since the actual work happens asynchronously and will be forcefully aborted by the call to process.exit(). Basically, process.exit() means "abort everything and stop the process now" so it doesn't wait for any operation to have completed before exiting. While it may appear to work in some cases depending on how Node schedules the work internally, the behavior cannot be relied on since the construct is explicitly meant to not wait before stopping.

In short, process.exit() is designed to immediately stop the process and doesn't allow asynchronous finalization code, so this is not something we can really handle.

From the Node documentation: Listener functions must only perform synchronous operations. The Node.js process will exit immediately after calling the 'exit' event listeners causing any additional work still queued in the event loop to be abandoned.

@rochdev
Copy link
Member

rochdev commented Oct 7, 2020

Closing since this is a limitation of how Node handles exit events.

@rochdev rochdev closed this as completed Oct 7, 2020
@vecerek
Copy link
Contributor

vecerek commented Oct 9, 2020

I'm experiencing the same issue.

Is there a way you could change the exit code at runtime and let the process exit gracefully?

@rochdev I tried to just return and set an exit code instead of actually exiting forcefully, i.e.

process.exitCode = 1; // instead of process.exit(1)

This fixed the issue of dd-trace not being able to flush the traces but it also resulted in the process not exiting at all.

@aheuermann have you found a workaround? What I ended up doing was calling a delayed process.exit once all processing was done by the app. I found that 5 seconds is enough time for dd-trace to flush the traces but I can't be sure if that holds up every time 🤷 I'm open to suggestions on how to actually shutdown gracefully.

setTimeout(process.exit, 5000);

@rochdev
Copy link
Member

rochdev commented Oct 9, 2020

This fixed the issue of dd-trace not being able to flush the traces but it also resulted in the process not exiting at all.

@vecerek Sounds like a bug, can you open a new issue that the tracer is preventing the process from exiting?

@fredefox
Copy link

From what I understand, what @aheuermann is requesting is a way to access the flush method allowing code to do something along the lines of:

const tracer = DD.init();
const res = await App.run(tracer)
await tracer.flush();
process.exit(res);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community core feature-request question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants