Fix PrototypeRS issue related to termination (pytorch#837)
Summary:
Pull Request resolved: pytorch#837

With this PR, PrototypeRS's child processes become daemon processes, so they terminate when the main process does. Because the order in which objects are destroyed is non-deterministic, `finalize` needs additional exception handling.

Fixes pytorch#806

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D40485829

Pulled By: NivekT

fbshipit-source-id: bd1fcf32956ac5a2abef560dd9a44f1f76e24f45
NivekT authored and ejguan committed Oct 21, 2022
1 parent fa3ec96 commit 9fc699e
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions torchdata/dataloader2/reading_service.py
@@ -216,6 +216,7 @@ def initialize(self, datapipe: DataPipe) -> DataPipe:
                 call_inside_process,
                 call_on_epoch_reset,
             )
+            process.daemon = True
             process.start()
             self.processes.append((process, req_queue, res_queue))  # These queues are independent
             local_datapipe = communication.iter.QueueWrapper(
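
The added `process.daemon = True` line relies on standard `multiprocessing` behaviour: on a normal interpreter exit, the parent attempts to terminate its daemonic children. A minimal sketch of that pattern outside torchdata follows; `start_daemon` is a hypothetical helper introduced only for illustration, and `time.sleep` stands in for a worker loop.

```python
import multiprocessing as mp
import time


def start_daemon(target, *args):
    """Hypothetical helper: run `target(*args)` in a daemonic child process."""
    process = mp.Process(target=target, args=args)
    # The flag must be set before start(); changing it on a started process raises.
    process.daemon = True
    process.start()
    return process


if __name__ == "__main__":
    # time.sleep stands in for a long-running worker loop.
    worker = start_daemon(time.sleep, 5)
    print("daemon:", worker.daemon, "alive:", worker.is_alive())
    # No join() here: when the parent exits normally, multiprocessing
    # terminates any daemonic children that are still running.
```

Note that daemonic processes are not allowed to create child processes of their own, which is worth keeping in mind if a worker needs to spawn helpers.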
@@ -252,10 +253,17 @@ def clean_me(process, req_queue, res_queue):
             # TODO(620): Make termination a function of QueueWrapperDataPipe (similar to reset)
             req_queue.put(communication.messages.TerminateRequest())
             _ = res_queue.get()
-            process.join()
+            process.join(20)

         for process, req_queue, res_queue in self.processes:
-            clean_me(process, req_queue, res_queue)
+            try:
+                clean_me(process, req_queue, res_queue)
+            except AttributeError:
+                # Due to non-deterministic order of destruction, by the time `finalize` is called,
+                # some objects may already be `None`.
+                pass
+            except TimeoutError:
+                pass

         self.processes = []
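
One caveat about the timeout: in the standard library, `multiprocessing.Process.join(timeout)` returns `None` both when the child has exited and when the timeout elapses, and it does not raise `TimeoutError`. A caller that needs to know the outcome has to check `is_alive()` or `exitcode` afterwards. The sketch below shows one way to do that; `join_or_terminate` is a hypothetical helper, not part of torchdata, and `time.sleep` again stands in for a worker that fails to shut down.

```python
import multiprocessing as mp
import time


def join_or_terminate(process, timeout):
    """Hypothetical helper: wait up to `timeout` seconds, then force termination.

    Returns True if the process exited on its own, False if it was terminated.
    """
    process.join(timeout)  # returns None on both exit and timeout
    if not process.is_alive():
        return True
    process.terminate()
    process.join()  # reap the terminated child
    return False


if __name__ == "__main__":
    # A child that would outlive the timeout if left alone.
    stuck = mp.Process(target=time.sleep, args=(60,), daemon=True)
    stuck.start()
    exited_cleanly = join_or_terminate(stuck, timeout=0.5)
    print("exited cleanly:", exited_cleanly, "exitcode:", stuck.exitcode)
```

Returning a boolean rather than raising keeps the cleanup loop simple: the caller can log or count forced terminations without wrapping every call in `try`/`except`.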
