Perl multithreading #371

Closed
keyser75000 opened this issue Sep 4, 2017 · 9 comments

@keyser75000
Contributor

Hi,

Sorry to reopen previous issues about multithreading, but the last one was closed without fixing the problem.
Previous issues: #345, #173

A quick summary:
After a few inventory cycles, the FusionInventory agent in service mode generates errors such as:
unexpected error in FusionInventory::Agent::Task::Inventory::Win32::[MODULE]: Thread already detached at C:\Program Files\FusionInventory-Agent/perl/agent/FusionInventory/Agent/Tools/Win32.pm line 756.

  • Only in service mode (as the agent stays in memory), on some servers (with nothing in common).
  • With all agents after 2.3.17 (multithreaded), even 2.3.21.
  • The error only shows up in the logs in debug mode.
  • The first module impacted can differ between servers (often Softwares), but it is always the same one on a given server.
  • This "error" spreads to other modules on the next cycles.
  • Very annoying for Softwares, since it inventories only a few entries and the others then show up as uninstalled.

Looking at the Perl threads documentation, I noticed that:

calling ->join() or ->detach() on an already joined thread will cause an error to be thrown.

I searched the code for calls to these and found that, in ./Tools/Win32.pm, "_call_win32_ole_dependent_api" tries to detach the worker without testing the thread's state.
I added a test:

# Worker is failing: get back to mono-thread and pray
+ if ($worker->is_joinable()) {
      $worker->detach();
+ }
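
For illustration, here is a minimal standalone sketch of a guarded detach. It is not the agent's code: `safe_detach` is a hypothetical helper name. Note that, per the threads documentation, `is_joinable()` is only true once a thread has finished running and has been neither joined nor detached, so this sketch tests `is_detached()` instead and wraps the call in `eval` to cover the already-joined case.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper (not part of FusionInventory): detach a worker only
# when that is still allowed, so a repeated call cannot throw.
sub safe_detach {
    my ($worker) = @_;

    return 0 unless defined $worker;

    # detach() dies on a thread that was already detached or joined.
    # is_detached() covers the first case, the eval covers the second.
    return 0 if $worker->is_detached();

    return eval { $worker->detach(); 1 } ? 1 : 0;
}

my $worker = threads->create(sub { return 42 });

# The first call detaches the thread.
print "first call:  ", safe_detach($worker), "\n";

# The second call is skipped instead of raising "Thread already detached".
print "second call: ", safe_detach($worker), "\n";
```

This requires a perl built with ithreads support, as the Windows agent's bundled perl must be since it already uses threads.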

After a few days of testing, it seems to work as expected on servers that had the problem (sometimes after only 2 or 3 inventory cycles, i.e. 4 to 6 hours).
I don't know whether it could have side effects.
Could you check whether I'm right and, if so, push the patch?

Thanks

@g-bougard
Contributor

Hi @keyser75000
I'm okay with your test to avoid the error. That said, reaching this point in the code is still a problem: it means something has crashed in a thread, and it would be great to know why. Do you have any other log messages that could help to better understand the underlying problem?

One more question: are the affected computers often under heavy load? I would like to know whether computer load could be related.
Thank you

@keyser75000
Contributor Author

Hi,

The only other messages I can find, on a few servers, concern timeouts on some modules.
Mostly Controllers, sometimes Printers or USB:
FusionInventory::Agent::Task::Inventory::Win32::Controllers killed by a timeout
(after more than 3 minutes).

I watched the processes while launching an FI inventory. I sometimes noticed that the WMIPrvSE.exe process (WMI Provider Host) is under high CPU load, but only on the servers whose modules time out.
The servers with the error on the Software module (the majority) have low CPU load.
No relation, in my opinion.

@g-bougard
Contributor

Hi @keyser75000
in the end, you may be affected by WMI issues like those discussed in #398. I think PR #406 may help you better handle the timeout in that context. Alternatively, you can raise the backend-collect-timeout parameter to something like 5 or 10 minutes, just so that it is greater than the time the Controllers, Printers & USB modules may take.
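
As an illustration of the second option, a configuration fragment could look like the sketch below. It assumes the value is expressed in seconds (so 600 means 10 minutes) and uses the agent.cfg file syntax. On Windows the installer typically stores the agent settings in the registry rather than in agent.cfg, so there the same parameter name would be set as a registry value instead.

```
# agent.cfg (sketch): allow each inventory module up to 10 minutes
backend-collect-timeout = 600
```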

g-bougard added a commit to TECLIB/fusioninventory-agent that referenced this issue Nov 24, 2017
@g-bougard g-bougard added this to the 2.4 milestone Nov 24, 2017
@g-bougard
Contributor

Hi @keyser75000
can you give the just-released FusionInventory agent win32 installer 2.4-rc2 a try and tell us whether your issue is now fixed?
Thank you

@keyser75000
Contributor Author

keyser75000 commented Nov 29, 2017

Hi,
thx, I'll try to test it asap.
It's hard to do, as we have patched all our computers (more than 5500). I must find an old server that shows this issue within a few inventory cycles.
Concerning the other issues you mentioned above (#398, #406), I'm not really affected. They only slow down the inventory on a few computers. The high CPU load is in fact an issue between WMI Provider Host and our AV. It also appears with SCOM agents (which use WMI).

I'll keep you posted on the rc2 tests.

g-bougard added a commit that referenced this issue Dec 7, 2017
* Make getWMIObjects API do async queries
  So it can now really time out (timeout is set to 180s by default)
* Little refactoring of Win32::Users inventory to make it compatible with async queries
* Use expirationTime defined from runFunction() API timeout to expire getWMIObjects() calls.
  During Inventory task, this aligns the timeout to backend-collect-timeout defined in configuration.
  Set delay between EventSink checks to 200 ms.
* Include fix for #371 if worker thread has crashed
* Don't update timeout to decide worker is failing
  but prefer check current time is older than timeout
  setting cond_timedwait later by a second
* Handle task expiration time API in a dedicated Tools module
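
The expiration idea described in this commit message (poll at a fixed delay, and compare the current time with an absolute expiration time) can be sketched roughly as follows. This is only an illustration under assumptions, not the agent's implementation: `wait_until` and its parameters are hypothetical names, and the real code polls a WMI event sink rather than a plain callback.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: call $params{is_done} every $params{delay} seconds
# until it returns true or the absolute expiration time (epoch seconds)
# has passed. Returns 1 on completion, 0 on expiration.
sub wait_until {
    my (%params) = @_;

    my $delay = $params{delay} // 0.2;    # 200 ms between checks

    while (time() < $params{expiration}) {
        return 1 if $params{is_done}->();
        sleep($delay);
    }

    return 0;
}

# Stand-in for an async query that completes on the third check.
my $polls = 0;
my $ok = wait_until(
    expiration => time() + 180,           # same order as the 180s default
    is_done    => sub { return ++$polls >= 3 },
);

print $ok ? "completed after $polls polls\n" : "timed out\n";
```

Working with an absolute expiration time rather than a remaining duration means nested calls can share one deadline without each of them restarting the countdown.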
@g-bougard
Contributor

Hi @keyser75000
#406 is now merged and, in case of failure, includes a check on whether the thread is already detached, as you proposed in your first post here.
I therefore consider this issue closed.
Feel free to open a new issue, based on 2.4-rc2 and referencing this one, if you find any other trouble.
Thank you for your support.

@g-bougard
Contributor

Hi @keyser75000

as FusionInventory Agent 2.4 has just been released, can you test with it?

Thank you

@keyser75000
Contributor Author

Hi,
Happy new year :)

OK, I'll test it and keep you posted by next week (the time needed to potentially trigger the error on a few servers).
In fact, I tested rc2 two weeks ago and it seemed OK to me, but the issue was already closed.

@keyser75000
Contributor Author

Hi,

It's OK on the new 2.4 release.
