
Fix cmdmod misleading error message #49490

Merged
merged 2 commits into saltstack:develop
Sep 7, 2018
Conversation

@rares-pop (Contributor) commented Sep 3, 2018

The cmdmod.py _run function is leaking memory (I don't know exactly why, or where), but here is a scenario that reproduces it:

  • I tested on a NILinuxRT target which is not overcommitting memory and is not using swap, in order to preserve the real-time behavior of the system
  • install smem on the target

Update: this doesn't reproduce on Ubuntu with master/minion on the same machine, with the minion registered to 'localhost'. Maybe it's a configuration thing?

Cycle through these steps several times:

  • run salt '*' cmd.run "opkg list"
  • take a snapshot of memory usage using smem

The memory used by salt-minion increases from cycle to cycle until it occupies more than 200 MB (on a 512 MB target), and at a subsequent clone of the process when running cmd.run, it raises an OSError: [Errno 12] Cannot allocate memory.

That investigation will be done in a separate thread; the point here is that the error message should surface the actual root cause of the problem, not 'command not found', which makes no sense in this case.

Previous Behavior

Previously, if there was no 'filename' attribute on the exception, the code assumed it was a 'command not found' error, but that is not always true. E.g. I'm getting "OSError: [Errno 12] Cannot allocate memory", which has no filename attribute.

New Behavior

Add the OSError/IOError strerror message, which for the case above is 'Cannot allocate memory', so that users are not misled about the root cause of the failure.
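The change can be sketched roughly as follows (a minimal illustration only — run_sketch and the message wording are hypothetical stand-ins, not the actual cmdmod.py code):

```python
import subprocess

def run_sketch(cmd, new_kwargs=None):
    # Hypothetical stand-in for cmdmod's _run; names here are illustrative.
    new_kwargs = new_kwargs or {}
    try:
        return subprocess.Popen(cmd, **new_kwargs)
    except (OSError, IOError) as exc:
        # Old behavior: any such exception without a usable 'filename' was
        # reported as 'command not found'. New behavior: include exc.strerror
        # (e.g. 'Cannot allocate memory' for ENOMEM) so the real OS-level
        # cause is visible in the error message.
        raise RuntimeError(
            'Unable to run command {0!r} with the context {1!r}, '
            'reason: {2}'.format(cmd, new_kwargs, exc.strerror))
```

With this shape, an ENOMEM during the fork/exec surfaces as "reason: Cannot allocate memory" instead of a misleading 'command not found'.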

Tests written?

No

Commits signed with GPG?

Yes

@cachedout (Contributor)

Very interesting. Do you have a GH issue filed on the memory leak that we can associate this PR with?

@rares-pop (Contributor, Author)

I haven't created a GH issue just yet, because I assumed I was going to look into it myself, but if you want me to, I'll create one tomorrow (with memory snapshots and everything).

@rares-pop (Contributor, Author) commented Sep 5, 2018

I did try a vanilla Salt on NILinuxRT (using both the zmq and the tcp transport, with the tcp_keepalive options we use) and I cannot reproduce this.

However, since we have our own beacons, modules, and grains, and a Windows master, we see and reproduce the leak very easily.

I believe we should still fix the message, because it is not accurate the way it is.

Review context (the changed lines in salt/modules/cmdmod.py):

        cmd if output_loglevel is not None else 'REDACTED',
    -       new_kwargs
    +       new_kwargs,
    +       exc.strerror
Review comment from @terminalmage (Contributor):
I would personally replace this with exc. This will force .format() to stringify it, which will give us the error code in addition to the error message. It will also insulate this except block against potential future issues, since not all exception classes have a strerror attribute.

Reply from @rares-pop (Contributor, Author):

@terminalmage - okay, I will do that tomorrow morning. I had the same thing in mind, but exc will print 'OSError: [Errno 12] Cannot allocate memory' as the 'reason'. I saw in the official Python docs that both OSError and IOError do have the 'strerror' attribute (though it can obviously be None for certain exceptions). Basically, both OSError and IOError are constructed from an int and a string, the int being the errno and the string the explanation for it.

Reply from @terminalmage (Contributor):

@rares-pop I think it will just show [Errno 12] Cannot allocate memory.

But my main concern here is that someone later adds some other exception class to that except and then causes an AttributeError.

I'm not saying you did anything terribly wrong, just trying to make it a bit more future-proof.
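The difference between the two interpolation options under discussion can be checked directly (a quick illustration, not project code):

```python
# OSError is constructed from an errno int plus an explanatory string.
exc = OSError(12, 'Cannot allocate memory')

# Interpolating exc.strerror yields only the explanation:
print('reason: {0}'.format(exc.strerror))  # reason: Cannot allocate memory

# Interpolating exc itself stringifies it, which adds the errno as well:
print('reason: {0}'.format(exc))  # reason: [Errno 12] Cannot allocate memory
```

Note the stringified form does not include the class name, matching the comment above that it shows just "[Errno 12] Cannot allocate memory".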

@cachedout (Contributor)

@rares-pop Yes, I'm all for fixing the leak of course. :) Let's just make sure we get that issue filed so that we can get more eyes on the problem. Thanks!

@rares-pop (Contributor, Author)

@cachedout - as I mentioned in a previous comment today, I cannot reproduce the memory leak with pure Salt (without our custom beacons, grains, and modules, and without running against a Windows salt-master). Let me try some more to reproduce it; maybe the problem is in our own code (not Salt related).

@terminalmage (Contributor)

It looks like the lint check failed to run last time through; we need to get that sorted before we can merge.

@rares-pop (Contributor, Author)

Is it related to this change, or is it a system fault?

I did run pylint --rcfile=.pylintrc salt/modules/cmdmod.py and the issues I saw were not coming from the area I touched.

@rallytime (Contributor)

It looks like some weirdness happened last night with some connections to GitHub. I'm going to update the branch so this runs on the most recent code. Some of the test builds are missing as well, so some hooks must have gotten dropped somewhere.

@rallytime (Contributor)

Lint is happy :)

@rallytime rallytime merged commit 774ab94 into saltstack:develop Sep 7, 2018
garethgreenaway added a commit to garethgreenaway/salt that referenced this pull request Sep 18, 2019
@waynew waynew added the has master-port port to master has been created label Feb 3, 2020