Commit
shell: fix delay in output flush of single shell rank job
Problem: The rank 0 job shell normally waits for an "eof" message to the shell output.write service to know when it can flush pending events in the eventlogger and release the shell completion reference for the output plugin. However, this reference counting is skipped when the job has only one shell. As a result, after all task output is closed, the flush can be delayed by up to output.batch-timeout seconds (0.5s by default).

Instead of ignoring the rank 0 shell's contribution to the output completion reference count, have each local task contribute to the reference count on this shell, and decrement the count for these tasks in the task.exit callback. This allows the output plugin to flush output immediately when the last task completes. Since tasks are not considered complete until they have exited *and* all streams have reached EOF, this should be safe.
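The reference counting described above can be illustrated with a small toy model. This is only a sketch of the idea, not the shell's actual code: the class `OutputCompletion` and its methods `write`, `remote_eof`, `task_exit`, and `flush` are hypothetical names invented for illustration, and the model assumes one reference per remote shell plus one per local task.

```python
class OutputCompletion:
    """Toy model of an output completion reference count (hypothetical names)."""

    def __init__(self, remote_shells, local_tasks):
        # Assumption: one reference per remote shell (released by its "eof")
        # plus one reference per local task (released when the task exits).
        self.refcount = remote_shells + local_tasks
        self.pending = []   # output buffered but not yet flushed
        self.flushed = []   # output that has been flushed

    def write(self, data):
        # Buffer output until the final reference is released.
        self.pending.append(data)

    def _release(self):
        assert self.refcount > 0, "reference released too many times"
        self.refcount -= 1
        if self.refcount == 0:
            # Last contributor is done: flush now rather than waiting
            # for a batch timer to expire.
            self.flush()

    def remote_eof(self):
        # Models a remote shell signalling the end of its output.
        self._release()

    def task_exit(self):
        # Models the exit callback of a local task whose streams hit EOF.
        self._release()

    def flush(self):
        self.flushed.extend(self.pending)
        self.pending.clear()
```

In this model a single-shell job with two tasks is created as `OutputCompletion(remote_shells=0, local_tasks=2)`. Buffered output stays pending after the first `task_exit()` call and is flushed by the second, with no timer involved.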