Fix cmdmod misleading error message #49490

rares-pop · 2018-09-03T13:37:54Z

The cmdmod.py _run function is leaking memory (I don't know exactly why or where), but here is a scenario that reproduces it:

I tested on a NILinuxRT target, which does not overcommit memory and does not use swap, in order to ensure the real-time behavior of the system.
Install smem on the target.

Update: this doesn't reproduce on Ubuntu with the master and minion on the same machine and the minion registered to 'localhost'. Maybe it's a configuration issue?

Repeat these steps several times:

Run: salt '*' cmd.run "opkg list"
Take a snapshot of memory usage with smem.

The salt-minion's memory usage increases from cycle to cycle until it occupies more than 200 MB (on a 512 MB target), and on a subsequent clone of the process while running cmd.run, it raises OSError: [Errno 12] Cannot allocate memory.
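If smem is not available, a rough per-cycle snapshot can be taken straight from /proc. The sketch below is an assumption-laden alternative, not part of the PR: it assumes a Linux /proc filesystem, matches processes whose command line contains "salt-minion", and reports RSS only. RSS counts shared pages, unlike smem's PSS/USS columns, so the numbers are coarser. The function names rss_kib and minion_rss_kib are hypothetical.

```python
import os
import re


def rss_kib(status_text):
    """Return the VmRSS value (in kB) parsed from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        # Processes without a user address space (e.g. kernel threads) have no VmRSS line.
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def minion_rss_kib(proc="/proc"):
    """Map pid -> resident memory (kB) for processes whose command line mentions salt-minion."""
    usage = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as fh:
                cmdline = fh.read()  # arguments are NUL-separated
            if b"salt-minion" not in cmdline:
                continue
            with open(os.path.join(proc, entry, "status")) as fh:
                usage[int(entry)] = rss_kib(fh.read())
        except (OSError, ValueError):
            # The process may have exited or be unreadable; skip it.
            continue
    return usage


if __name__ == "__main__":
    # Run on the target after each `salt '*' cmd.run "opkg list"` cycle and compare totals.
    snapshot = minion_rss_kib()
    print(snapshot, "total:", sum(snapshot.values()), "kB")
```

A steadily growing total across cycles would point to the same growth smem shows, though the absolute values will differ from smem's PSS figures.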

The leak itself will be investigated separately; here we should surface the actual root cause of the failure instead of 'command not found', which makes no sense in this case.

Previous Behavior

Previously, if the exception had no 'filename', the code assumed it was a 'command not found' error, which is not always true. For example, I'm getting “OSError: [Errno 12] Cannot allocate memory”, which has no filename attribute.
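To see why that check misfires, here is a minimal sketch. It is a simplification for illustration, not the actual cmdmod._run code, and old_reason is a hypothetical name. An OSError built from just an errno and a message, as in the ENOMEM case, has filename set to None, so a check that treats a missing filename as "command not found" reports the allocation failure as a missing command.

```python
import errno


def old_reason(exc):
    """Hypothetical simplification of the pre-PR heuristic: no filename => 'command not found'."""
    if getattr(exc, "filename", None) is None:
        return "command not found"
    return "{0}: {1}".format(exc.strerror, exc.filename)


# An allocation failure carries an errno and a strerror, but no filename.
enomem = OSError(errno.ENOMEM, "Cannot allocate memory")
print(enomem.filename)      # None
print(old_reason(enomem))   # command not found   <- misleading

# An error constructed with a filename is reported together with it.
missing = OSError(errno.ENOENT, "No such file or directory", "opkg")
print(old_reason(missing))  # No such file or directory: opkg
```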

New Behavior

Include the OSError/IOError strerror message (for the error above, 'Cannot allocate memory') so that users are not misled about the root cause of the failure.
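A sketch of what the new behavior amounts to, assuming the message is built with str.format(). Only the three format arguments (the command or 'REDACTED', new_kwargs, and exc.strerror) mirror the diff under review in salt/modules/cmdmod.py; the format_error function and its template wording are hypothetical.

```python
import errno


def format_error(cmd, new_kwargs, exc, output_loglevel="debug"):
    """Hypothetical sketch of the post-PR message: use exc.strerror as the reason.

    The template wording is illustrative; only the three format
    arguments follow the diff under review.
    """
    return "Unable to run command '{0}' with the context '{1}', reason: {2}".format(
        cmd if output_loglevel is not None else "REDACTED",  # hide the command when logging is off
        new_kwargs,
        exc.strerror,  # e.g. 'Cannot allocate memory' instead of a guessed 'command not found'
    )


exc = OSError(errno.ENOMEM, "Cannot allocate memory")
print(format_error("opkg list", {"cwd": "/root"}, exc))
# Unable to run command 'opkg list' with the context '{'cwd': '/root'}', reason: Cannot allocate memory
```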

Tests written?

No

Commits signed with GPG?

Yes

cachedout · 2018-09-04T15:08:49Z

Very interesting. Do you have a GH issue filed on the memory leak that we can associate this PR with?

rares-pop · 2018-09-04T15:13:01Z

I haven't created a GH issue yet because I assumed I would look into it myself, but if you want me to, I'll create one tomorrow (with memory snapshots and everything).

rares-pop · 2018-09-05T15:03:22Z

I tried vanilla Salt on NILinuxRT (using both zmq and the tcp transport, with the tcp_keepalive options we use) and I cannot reproduce this.

However, with our own beacons, modules and grains, and a Windows master, we see and reproduce the leak very easily.

I believe we should still fix the message, because it is not accurate as it stands.

terminalmage · 2018-09-05T15:47:30Z

salt/modules/cmdmod.py

                    cmd if output_loglevel is not None else 'REDACTED',
-                    new_kwargs
+                    new_kwargs,
+                    exc.strerror


I would personally replace this with exc. That forces .format() to stringify it, which gives us the error code in addition to the error message. It also insulates this except block against potential future issues, since not all exception classes have a strerror attribute.

@terminalmage - okay, I will do that tomorrow morning. I had the same thing in mind, but exc will print 'OSError: [Errno 12] Cannot allocate memory' as the 'reason', and I saw in the official Python docs that both OSError and IOError do have the 'strerror' attribute (though it can obviously be None for certain exceptions). Basically, both OSError and IOError are constructed from an int and a string: the int is the errno, and the string is its explanation.
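The attributes under discussion can be checked directly in Python 3. This sketch shows the two-argument (errno, strerror) form and the single-argument form, where strerror is None; in that case, formatting exc.strerror would print "None" as the reason.

```python
import errno

# Two-argument form: (errno, strerror).
exc = OSError(errno.ENOMEM, "Cannot allocate memory")
print(exc.errno == errno.ENOMEM, exc.strerror)  # True Cannot allocate memory
print(exc.filename)                              # None

# Single-argument form: errno and strerror are both left as None.
bare = OSError("something went wrong")
print(bare.errno, bare.strerror)                 # None None

# In Python 3, IOError is just another name for OSError.
print(IOError is OSError)                        # True
```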

@rares-pop I think it will just show [Errno 12] Cannot allocate memory.

But my main concern here is that someone later adds some other exception class to that except and then causes an AttributeError.

I'm not saying you did anything terribly wrong, just trying to make it a bit more future-proof.
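A small sketch of the trade-off in the review (both function names are hypothetical). Formatting exc directly yields "[Errno 12] Cannot allocate memory"; the "OSError:" prefix comes from traceback formatting, not from str(exc). It also keeps working if a class without strerror, such as the ValueError used here purely as a stand-in, is ever added to the except clause.

```python
def reason_from_strerror(exc):
    # Works only for exception classes that define a strerror attribute.
    return "reason: {0}".format(exc.strerror)


def reason_from_exc(exc):
    # str.format() calls str(exc), which every exception supports; for an
    # OSError that includes the errno as well as the message.
    return "reason: {0}".format(exc)


enomem = OSError(12, "Cannot allocate memory")  # 12 is ENOMEM on Linux
print(reason_from_strerror(enomem))  # reason: Cannot allocate memory
print(reason_from_exc(enomem))       # reason: [Errno 12] Cannot allocate memory

# Stand-in for a hypothetical future addition to the except clause.
other = ValueError("bad input")
print(reason_from_exc(other))        # reason: bad input
try:
    reason_from_strerror(other)
except AttributeError:
    print("AttributeError")          # the failure mode raised in the review
```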

cachedout · 2018-09-05T18:00:50Z

@rares-pop Yes, I'm all for fixing the leak of course. :) Let's just make sure we get that issue filed so that we can get more eyes on the problem. Thanks!

rares-pop · 2018-09-05T18:30:24Z

@cachedout - as I mentioned in a previous comment today, I cannot reproduce the memory leak with pure Salt (without our custom beacons, grains and modules, and without a Windows salt-master). Let me try some more to reproduce it; maybe the problem is in our own code (not Salt-related).

Signed-off-by: Rares POP <[email protected]>

terminalmage · 2018-09-06T14:53:00Z

It looks like the lint check failed to run last time through; we need to get that sorted before we can merge.

rares-pop · 2018-09-06T15:02:43Z

Is it related to this change, or is it a system fault?

I did run pylint --rcfile=.pylintrc salt/modules/cmdmod.py, and the warnings I saw did not come from the area I touched.

rallytime · 2018-09-06T15:06:59Z

It looks like something went wrong last night with some connections to GitHub. I'm going to update the branch so this runs on the most recent code. Some of the test builds are missing as well, so some hooks must have gotten dropped somewhere.


rallytime · 2018-09-06T15:08:20Z

Lint is happy :)

rallytime requested a review from terminalmage September 4, 2018 13:44

cachedout approved these changes Sep 4, 2018


terminalmage reviewed Sep 5, 2018


Fix cmdmod misleading error message

c43e09c

Signed-off-by: Rares POP <[email protected]>

terminalmage approved these changes Sep 6, 2018


Merge branch 'develop' into dev/iepopr/fix_cmdmod_misleading_error_message

1aaaac7

rallytime merged commit 774ab94 into saltstack:develop Sep 7, 2018

garethgreenaway added a commit to garethgreenaway/salt that referenced this pull request Sep 18, 2019

Porting PR saltstack#49490 to 2019.2.1

25e40e1

garethgreenaway mentioned this pull request Sep 18, 2019

[master] Porting #49490 to master #54543

Closed

waynew added the "has master-port" label (port to master has been created) Feb 3, 2020

DmitryKuzmenko mentioned this pull request Jun 25, 2020

Minor logging fix for cmdmod._run #53455

Closed
