log2timeline: multiple issues with multi processing and IPC #169

Closed
joachimmetz opened this issue Apr 17, 2015 · 16 comments

@joachimmetz
Member

log2timeline occasionally gets stuck at "Waiting for storage writer".

[INFO] (MainProcess) PID:21754 <multi_process> Waiting for storage writer.

The current challenge is to find a reproducible test case.

@joachimmetz joachimmetz self-assigned this Apr 17, 2015
@joachimmetz joachimmetz changed the title log2timeline: occasionally gets stuck with "Waiting for storage writer". log2timeline: occasionally gets stuck with "Waiting for storage writer" Apr 18, 2015
@joachimmetz
Member Author

The foreman triggers on the collector having completed, but it should trigger on the storage writer having completed.
Some changes to correct this: https://codereview.appspot.com/230960043/

@joachimmetz
Member Author

Improve the solution:

  • have the foreman monitor the collector and storage writer processes as well
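A minimal sketch of the foreman check described above, assuming each monitored object exposes `is_alive()` as `multiprocessing.Process` does. The function name `foreman_status` and its state strings are hypothetical, not plaso's API; the point is that completion is keyed on the storage writer, not on the collector.

```python
def foreman_status(collector, storage_writer):
    """Derive a (hypothetical) foreman state from the collector and storage writer.

    Both arguments only need an is_alive() method, e.g. multiprocessing.Process.
    """
    if storage_writer.is_alive():
        # The collector finishing does not mean processing is done: the storage
        # writer may still be writing queued event objects.
        return "collecting" if collector.is_alive() else "storing"
    if collector.is_alive():
        # The storage writer exited while input is still being produced:
        # report a failure instead of waiting forever.
        return "storage_writer_died"
    return "completed"
```

Monitoring both processes lets the foreman tell "still storing" apart from "storage writer died", which a collector-only check cannot do.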

@joachimmetz
Member Author

The process still appears to get stuck:

[WARNING] Terminating worker process: Worker_5 (PID: 26016)
[INFO] Waiting for storage writer.
[WARNING] Forcing termination of storage writer (PID: 26006)
[INFO] Storage writer stopped.
[INFO] Processing completed.

@joachimmetz
Member Author

It appears multiprocessing.Queue is not strictly FIFO (items put by different processes can be received out of order), and its get() method blocks if the queue is empty. Some changes to handle this behavior: https://codereview.appspot.com/225680044/
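A minimal sketch of a read that cannot hang forever, not the code from the CL: `multiprocessing.Queue.get()` accepts `block` and `timeout` arguments and raises `queue.Empty` when nothing arrives in time, so the caller can re-check its state instead of blocking. The helper name `get_with_timeout` and the default timeout are assumptions.

```python
import queue


def get_with_timeout(q, timeout=1.0, default=None):
    """Return the next item from q, or `default` if none arrives within `timeout`.

    Works for multiprocessing.Queue and queue.Queue: both raise queue.Empty
    when get(block=True, timeout=...) times out.
    """
    try:
        return q.get(block=True, timeout=timeout)
    except queue.Empty:
        # Nothing arrived in time; let the caller decide whether to retry,
        # check process liveness, or abort.
        return default
```

Pass a dedicated `default` object if `None` can be a legitimate queue item.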

@kiddinn
Member

kiddinn commented Apr 26, 2015

This still seems to be an issue in some cases. I sent in https://codereview.appspot.com/224610044/ as well; it at least seems to fix the issue on the test images on which I could reproduce it.

@joachimmetz
Member Author

Observed behavior:

[INFO] Worker_3 (PID: 16947) - events extracted: 53970 - file: /Windows/System32/config/COMPONENTS - running: True <sleeping>
[INFO] Worker_4 (PID: 16949) - events extracted: 58423 - file: /Windows/System32/drivers/b57nd60a.sys - running: True <sleeping>
[INFO] Worker_5 (PID: 16951) - events extracted: 69976 - file: /Windows/System32/config/SOFTWARE - running: True <sleeping>

The workers remain sleeping for a long time. Is this a side effect of the queue changes?

The queue seems to be considered empty before it actually is. More queue changes are in: https://codereview.appspot.com/229410043/. I opt to do these first, before CL 224610044, since this could be the root cause of the behavior you are trying to fix in that CL.
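The Python documentation notes that `Queue.empty()` is not reliable under multiprocessing semantics, and an item put by another process may not be visible immediately because a background feeder thread writes it to the pipe. A common alternative, sketched here under assumptions (the `END_OF_INPUT` marker and the `consume` helper are hypothetical, not plaso's), is to have the producer send an explicit end-of-input marker and to treat a timeout as "nothing yet" rather than "done".

```python
import queue

# Hypothetical end-of-input marker. A string is compared by value, which still
# works after it has been pickled and sent to another process.
END_OF_INPUT = "__END_OF_INPUT__"


def consume(q, handle, is_producer_alive=lambda: True, timeout=1.0):
    """Process items from q until END_OF_INPUT arrives; return the item count.

    An empty queue is not treated as the end of the input, because with
    multiple processes it may only look empty for a moment.
    """
    processed = 0
    while True:
        try:
            item = q.get(timeout=timeout)
        except queue.Empty:
            if not is_producer_alive():
                # The producer is gone without sending the marker: stop
                # instead of waiting forever.
                raise RuntimeError("producer exited without END_OF_INPUT")
            continue
        if item == END_OF_INPUT:
            return processed
        handle(item)
        processed += 1
```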

@joachimmetz
Member Author

The abort paths need to be checked.

@joachimmetz
Member Author

Still an issue: if a worker is misbehaving, the process does not terminate.

Detect the state in which all workers are sleeping and no progress is being made, then start an abort timer.
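A minimal sketch of that abort timer, assuming the foreman can sample each worker's state (e.g. the "sleeping" status shown in the log above) and a single progress counter such as the total number of events extracted. `IdleAbortTimer` and its parameters are hypothetical; the injectable `clock` only exists to make the logic testable.

```python
import time


class IdleAbortTimer:
    """Hypothetical helper: signal an abort once all workers have been sleeping,
    with no progress, for at least `abort_after` seconds."""

    def __init__(self, abort_after, clock=time.monotonic):
        self.abort_after = abort_after
        self.clock = clock
        self._idle_since = None
        self._last_progress = None

    def should_abort(self, worker_states, progress):
        """worker_states: e.g. ["sleeping", "running"]; progress: a counter such
        as the total number of events extracted so far."""
        now = self.clock()
        all_sleeping = all(state == "sleeping" for state in worker_states)
        made_progress = progress != self._last_progress
        self._last_progress = progress
        if not all_sleeping or made_progress:
            # Any activity resets the timer.
            self._idle_since = None
            return False
        if self._idle_since is None:
            # First idle observation: start the abort timer.
            self._idle_since = now
        return now - self._idle_since >= self.abort_after
```

The timer only starts once idleness is observed, so a single slow sample does not trigger an abort on its own.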

@joachimmetz
Member Author

CL to abort when all workers are idle: https://codereview.appspot.com/230620043/. Still to do: improve the handling of individual workers that are not responding.

@joachimmetz
Member Author

The main process still fails to terminate. The likely cause is locking in Python's multiprocessing queue: http://stackoverflow.com/questions/21349850/multiprocessing-queue-deadlocks-after-reader-process-death

Possible solution to clean up the IPC: https://codereview.appspot.com/238090043/
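The Python `multiprocessing` programming guidelines warn that a process which has put items on a queue will not terminate until its feeder thread has flushed all buffered items to the pipe, so joining it before the queue is drained can deadlock. One common mitigation, sketched below as an assumption rather than what the CL does, is to keep reading from the queue while waiting for the producer to exit. `drain_and_join` is a hypothetical name; `process` can be anything with `is_alive()` and `join()`, such as `multiprocessing.Process`.

```python
import queue
import time


def drain_and_join(q, process, timeout=30.0, poll=0.1):
    """Read items from q until `process` has exited, then join it.

    Returns the drained items. If `process` is still alive after `timeout`
    seconds, it is left running so the caller can escalate (e.g. terminate()).
    """
    drained = []
    deadline = time.monotonic() + timeout
    while process.is_alive() and time.monotonic() < deadline:
        try:
            # Reading lets the producer's feeder thread finish flushing.
            drained.append(q.get(timeout=poll))
        except queue.Empty:
            pass
    if not process.is_alive():
        # The producer flushed everything before exiting; collect the rest.
        while True:
            try:
                drained.append(q.get_nowait())
            except queue.Empty:
                break
        process.join()
    return drained
```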

@joachimmetz
Member Author

Improve the "worker idle abort" logic: it is currently possible for workers to be processing for 5 minutes without extracting any event objects, e.g. when only very specific parsers are enabled. Make sure not to abort prematurely.

Adding a path specs delta should solve this issue: https://codereview.appspot.com/238090043/
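One reading of "path specs delta", assumed here rather than taken from the CL, is to count a worker as making progress when it has processed more path specifications since the last check, even if it extracted no new event objects. A minimal sketch with a hypothetical `WorkerSnapshot` type and `made_progress` helper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSnapshot:
    """Hypothetical per-worker counters sampled by the foreman."""
    events_extracted: int
    path_specs_processed: int


def made_progress(previous, current):
    """True if any worker extracted events OR processed path specs since `previous`.

    `previous` and `current` are equally long lists of WorkerSnapshot, one per worker.
    """
    return any(
        cur.events_extracted > prev.events_extracted
        or cur.path_specs_processed > prev.path_specs_processed
        for prev, cur in zip(previous, current)
    )
```

With this check, a run with only very specific parsers enabled still counts as active while the workers are walking through files.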

@joachimmetz joachimmetz changed the title log2timeline: occasionally gets stuck with "Waiting for storage writer" log2timeline: multiple issues with multi processing and IPC May 9, 2015
@joachimmetz
Member Author

The parser error queue is disabled for now per: 7853aaf

@joachimmetz
Member Author

Testing indicates this is still an issue on Windows XP; however, I do not see the same behavior on Windows 7. Although see: https://ci.appveyor.com/project/joachimmetz/plaso/build/72

@joachimmetz
Member Author

The abort path is still not always handled cleanly; also see: #153
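A common escalation pattern for a cleaner abort path, sketched here as an assumption (not plaso's code): wait for a grace period, then `terminate()`, then `kill()` as a last resort. The Python documentation warns that terminating a process while it is using a pipe or queue can leave that pipe or queue corrupted and unusable by other processes, so the graceful wait comes first. `stop_process` is a hypothetical name; `Process.kill()` requires Python 3.7 or later.

```python
def stop_process(process, grace=5.0):
    """Stop a multiprocessing.Process-like object; return how it ended.

    Returns "exited", "terminated" or "killed".
    """
    # Give the process a chance to finish on its own (e.g. after an abort signal).
    process.join(timeout=grace)
    if not process.is_alive():
        return "exited"
    # SIGTERM on POSIX, TerminateProcess() on Windows.
    process.terminate()
    process.join(timeout=grace)
    if not process.is_alive():
        return "terminated"
    # SIGKILL on POSIX; on Windows kill() behaves like terminate().
    process.kill()
    process.join()
    return "killed"
```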

@joachimmetz
Member Author

Check the handling of the case where there is no collector input.

@joachimmetz
Member Author

Closing this one in favor of #153

2 participants