Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process path and name properties are broken #111

Closed
giampaolo opened this issue May 23, 2014 · 30 comments
Closed

Process path and name properties are broken #111

giampaolo opened this issue May 23, 2014 · 30 comments

Comments

@giampaolo
Copy link
Owner

From g.rodola on September 21, 2010 21:51:19

Process.path is supposed to be an absolute directory name but this is not always True.
The code below demonstrates the problem.

import psutil, os
for p in psutil.process_iter():
    if p.path:
        if not os.path.isdir(p.path):
            print "%s - %s" % (p.name, repr(p.path))

'udisks-daemon: polling /dev'
'hald-addon-input: Listening on /dev/input/event3 /dev/input/event0 /dev/input'
'hald-addon-storage: polling /dev'
'hald-addon-acpi: listening on acpid socket /var/run'
'/opt/google/chrome/chrome --type=extension --lang=it 
--force-fieldtest=CacheSize/CacheSizeGroup_2/ConnCountImpact/_conn_count_6/DnsImpact/_default_enabled_prefetch/GlobalSdch/_global_enable_sdch/IdleSktToImpact/_idle_timeout_10/SpdyImpact/_npn_with_spdy'

'/opt/google/chrome/chrome --type=renderer --lang=it 
--force-fieldtest=CacheSize/CacheSizeGroup_2/ConnCountImpact/_conn_count_6/DnsImpact/_default_enabled_prefetch/GlobalSdch/_global_enable_sdch/IdleSktToImpact/_idle_timeout_10/SpdyImpact/_npn_with_spdy'

'/opt/google/chrome/chrome --type=renderer --lang=it 
--force-fieldtest=CacheSize/CacheSizeGroup_2/ConnCountImpact/_conn_count_6/DnsImpact/_default_enabled_prefetch/GlobalSdch/_global_enable_sdch/IdleSktToImpact/_idle_timeout_10/SpdyImpact/_npn_with_spdy'

'/opt/google/chrome/chrome --type=renderer --lang=it 
--force-fieldtest=CacheSize/CacheSizeGroup_2/ConnCountImpact/_conn_count_6/DnsImpact/_default_enabled_prefetch/GlobalSdch/_global_enable_sdch/IdleSktToImpact/_idle_timeout_10/SpdyImpact/_npn_with_spdy'

'/opt/google/chrome/chrome --type=renderer --lang=it 
--force-fieldtest=CacheSize/CacheSizeGroup_2/ConnCountImpact/_conn_count_6/DnsImpact/_default_enabled_prefetch/GlobalSdch/_global_enable_sdch/IdleSktToImpact/_idle_timeout_10/SpdyImpact/_npn_with_spdy'

'/opt/google/chrome/chrome --type=renderer --lang=it 
--force-fieldtest=CacheSize/CacheSizeGroup_2/ConnCountImpact/_conn_count_6/DnsImpact/_default_enabled_prefetch/GlobalSdch/_global_enable_sdch/IdleSktToImpact/_idle_timeout_10/SpdyImpact/_npn_with_spdy'

'/opt/google/chrome/chrome --type=renderer --lang=it 
--force-fieldtest=CacheSize/CacheSizeGroup_2/ConnCountImpact/_conn_count_6/DnsImpact/_default_enabled_prefetch/GlobalSdch/_global_enable_sdch/IdleSktToImpact/_idle_timeout_10/SpdyImpact/_npn_with_spdy'




This comes from Linux but I suspect we might have the same issue on 
other platforms as well.
This is what happens on Linux:

- (psutil._pslinux.py) os.readlink("/proc/%s/exe" %pid) is 
attempted and fails for permission errors
- (psutil._pslinux.py) open("/proc/%s/stat" %pid) is used to 
get the process name and path is left empty
_ (psutil.__init__.py) being path empty it gets set as follows: 
        if cmdline and not path:
            self.path = os.path.dirname(cmdline[0])

I'm still not sure what's the best place to fix this problem.
I think the best way to proceed is adding a test case to figure out 
on what platforms this is broken and then decide what to do from there.

Original issue: http://code.google.com/p/psutil/issues/detail?id=111

@giampaolo giampaolo self-assigned this May 23, 2014
@giampaolo
Copy link
Owner Author

From [email protected] on September 21, 2010 13:01:56

Note that path can also be empty for some processes. For example if 
you run a command that's in the PATH from the shell, it doesn't 
populate a path in ps/top

@giampaolo
Copy link
Owner Author

From g.rodola on September 21, 2010 13:16:38

Test committed in r638 .
It is also broken on FreeBSD.
Failures look like this:


======================================================================
FAIL: test_path (__main__.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_psutil.py", line 300, in test_path
    % (repr(p.path), p.pid, p.name, p.cmdline))
AssertionError: 'sendmail: Queue runner@00:30:00 for /var/spool' is 
not a directory (pid=612, name=clientmqueue, cmdline=['sendmail: 
Queue runner@00:30:00 for /var/spool/clientmqueue'])

Removoing these lines from __init__.ProcessInfo class should solve 
the problem leaving the path just empty:

        # if we have the cmdline but not the path, figure it out from argv[0]
        if cmdline and not path:
            self.path = os.path.dirname(cmdline[0])

@giampaolo
Copy link
Owner Author

From [email protected] on September 21, 2010 13:48:06

We've discussed this before... removing those lines means we'll run 
into other problems. The "comm" variable in the proc_info struct is 
limited length so will not contain the entire comand name or path. 
Hence why we use the cmdline[0] to get the path. 

What we can do is check to see if os.path.dirname(cmdline[0]) is a 
valid path (or empty). If not, then we can leave path alone since 
dirname() is producing non-sensible results.

@giampaolo
Copy link
Owner Author

From [email protected] on September 22, 2010 06:17:50

One more thing I thought of - we may end up with processes started 
with partial paths or local ones like "./myscript.sh" or 
"dir/subdir/scripts/myscript.sh" which won't be absolute paths. We'll 
have to make sure our tests take that into account as well.

@giampaolo
Copy link
Owner Author

From g.rodola on September 22, 2010 08:18:34

I added a further test in r639 which tests the absolute executable 
path resulting from os.path.join(path, name).
It fails also if we first check that Process.path is a valid path (as 
you proposed) because Process.name is NOT the actual name of the executable.
This happens because name is erroneously assumed from the cmdline ( 
r633 ), hence also Pricess.name appears to be broken.

@giampaolo
Copy link
Owner Author

From [email protected] on September 22, 2010 08:56:52

Ok, and that brings us back around to the p_comm issue where the 
command name can be truncated. I'm not sure what to recommend in that 
case, we could check to see if basname(cmdline[0]).startswith(name) 
and if it does, we can overwrite the name with the cmdline[0] 
basename in case p_comm was truncated. Otherwise we can just use 
p_comm as is and hope it didn't get truncated. 

Thoughts?

@giampaolo
Copy link
Owner Author

From g.rodola on September 22, 2010 13:48:44

Ok, I think I got this more or less figure out, at least on Linux.

= 1 - Process.name =

First of all, the concept of getting the complete executable name by 
doing os.path.join(proc.path, proc.name) is wrong because proc.name 
is NOT the base name of the executable living somewhere in the 
*filesystem* (e.g. "python" from "/usr/bin/python") but just a 
generic name assigned from the kernel. 
Note that by using https://code.google.com/p/procname/ I verified 
that this is NOT the process title as set by prctl() system call.
The fact that this is a generic name assigned from the kernel appears 
to be clear when you read /proc/{PID}/stat.
Some process names extracted from there look like "kblockd/0", 
"kblockd/0" etc... and none of them can be found under /bin, /usr/bin, etc.

Summary: for no reason Process.name should be used to build the 
complete executable name, neither it should be determined from the 
cmdline as introduced by r633 .
I don't know if also OSX and BSD have this same concept of "name" as 
intended on Linux; in any case I expect it to be determined from the 
kernel and provided as-is on all UNIX systems. 
On Windows it's just ok for it to be the base executable name (e.g. "python.exe").

= 2 - Process.path =

I got two considerations here: one for the returned value and another 
one about how we retrieve this.
In the first place I think it's pointless to return the *directory 
name of the executable* (why would I want to know that?) and that the 
absolutized executable path should just be returned instead.
This saves us from doing os.path.join(p.path, p.name) to get the 
complete exe path which, aside from being wrong (as said above), is 
also tedious, and the complete exe name is what we want most of the times anyway.

The executable path on Linux is retrieved by reading 
"/proc/{PID}/exe". If for some reason (the most common: limited 
permissions) this can't be determined then it's ok to guess it from 
the command line (cmdline.split(' ')[0]) but we have to make sure that:

- the path exists and is a file
- the file is executable

Summary: make Process.path return "/usr/bin/exename" instead of 
"/usr/bin" and avoid blind guessing.


Comments are welcome.

@giampaolo
Copy link
Owner Author

From g.rodola on September 22, 2010 13:56:17

> Summary: make Process.path return "/usr/bin/exename" instead of 
"/usr/bin" and avoid 
> blind guessing.

Further note: if we want it to be this way then it would be better to 
rename it in "exe" and make "path" an alias for "exe" raising a deprecation warning.

@giampaolo
Copy link
Owner Author

From g.rodola on September 22, 2010 15:45:33

The patch in attachment is for the Linux implementation.
I took care of the test suite as well by renaming all "Process.path" 
references to "Process.exe" and test_path() test (now called 
test_exe()) reflects and expects all the changes I described above 
(this should ease the fix on other platforms).

Attachment: linux.patch

@giampaolo
Copy link
Owner Author

From g.rodola on September 22, 2010 16:04:44

Summary: Process path and name properties are broken

@giampaolo
Copy link
Owner Author

From [email protected] on September 22, 2010 17:18:43

I'm not 100% sure I followed your points above, but here's where I'm 
at, maybe that will help determine if we're on the same page: 

1) NAME

On UNIX-like systems other than Linux where you have /proc, this is 
represented by p_comm in the kproc_info structure. This p_comm 
variable is the "command name" and may be truncated. It does NOT 
include the path, so /usr/bin/vim becomes just "vim".

Since a truncated command name is not very useful, I propose using 
the basename of cmdline[0] - if we're concerned about this returning 
incorrect results, we can check to see if the basename(cmdline[0]) 
starts with the contents of p_comm, to verify that cmdline[0] is the 
non-truncated version of the command name. If it's not, we can simply 
use p_comm as is and live with the 16 byte limitation.

2) PATH

I think we need to consider this carefully and decide what we want 
"path" to represent. There are conflicting/different concepts at play 
here on the various platforms: 

* Linux: 'exe' in /proc links to the absolute path of the executable 
launched, 'cmdline' gives the command line or in some cases a labeled name.

* OS X/BSD: kproc_info kp_proc struct has a p_comm field 16 bytes 
long that stores the process name. The commandline/arguments contains 
as the first argument the process itself that was called -  argv[0] 
in other words. I'm not sure if setting the proctitle with prctl 
affects p_comm contents or not.

* Windows: process name is retrieved from toolhelpsnapshot, with no 
path. Cmdline is read from process memory space directly.

Given the above, my original take on this was that "name" was the 
equivalent of p_comm - the command name only, not the path to the 
executable. Conversely, "path" would represent the path portion only 
of the command that was run ("/usr/bin" etc.) That may be something 
we want to reconsider.

The trouble with this is there is such different behavior on all 
platforms, so I'd like to stick with something unambiguous. We aren't 
always going to be able to provide an absolute path to an executable 
(only reliable on Linux), and we can't always rely on the cmdline[0] 
being the same as the executable run either. In my opinion it doesn't 
make sense to just make "path" == /proc/$PID/exe on Linux, since we 
will not be able to duplicate that behavior on any other platform. I 
recommend we not make changes to the code until we can figure out 
something that will be consistent across all four platforms.

@giampaolo
Copy link
Owner Author

From g.rodola on September 23, 2010 11:05:43

= NAME =

> On UNIX-like systems other than Linux where you have /proc, this is 
> represented by p_comm in the kproc_info structure. This p_comm variable is
>  the "command name" and may be truncated. It does NOT include the path, 
> so /usr/bin/vim becomes just "vim".

This seems fine to me then, and looks exactly the same as how "name" 
is represented on Linux: a serie of names without separators.
An extract of ki_comm(s) (which I suppose to be the as as p_comm on OSX) on FreeBSD:

vmdaemon
pagedaemon
sctp_iterator
swi0: sio
irq12: psm0
irq1: atkbd0
em0 taskq
irq15: ata1
irq14: ata0

I'd be for returning this one as-is without later modifications.

> Since a truncated command name is not very useful, I propose using 
the basename of cmdline[0] 

-1, at least on BSD, Linux and Windows where something is always 
returned from the kernel and it doesn't look truncated AFAICT.

= PATH =

> I think we need to consider this carefully and decide what we want "path" to represent

Just to be clear, my proposal was to get rid of "path" in favor of a 
more useful "exe" property, independently from how we do this.

> In my opinion it doesn't make sense to just make "path" == 
/proc/$PID/exe on Linux, 
> since we will not be able to duplicate that behavior on any other platform.

Currently, we are not able to retrieve the "path" neither.
In fact, at least on OSX and BSD, what we are doing now is 
calculating the path from the cmdline as follows:

        if cmdline and not path:
            # NB: on BSD and OSX this statement is *always* 
            # executed because the underlying C call always 
            # return a path == "".
            path = os.path.dirname(cmdline[0])
            if os.path.isdir(path):
                self.path = path

My proposal for "exe" property, for now, was not to touch the C code 
for now and do something like this:

        if cmdline and not exe:  # at this point exe is always == "" on OSX and BSD
            _exe = os.path.realpath(cmdline[0])
            if hasattr(os, 'access') and hasattr(os, 'X_OK'):
                if os.path.isfile(_exe) and os.access(_exe, os.X_OK):
                    self.exe = _exe

It's not very elegant, but it's the same thing we're currently doing 
with "path", with the difference that exe should be more useful.

@giampaolo
Copy link
Owner Author

From g.rodola on September 23, 2010 11:07:11

Almost forgot: http://www.nightproductions.net/dsprocessesinfo_m.html 
/* This returns the full process name, rather than the 16 char limit
   the p_comm field of the proc struct is limited to.

   Note that this only works if the process is running under the same
   user you are, or you are running this code as root.  If not, then
   the p_comm field is used (this function returns nil).
*/
NSString *nameForProcessWithPID(pid_t pidNum)
{
...

@giampaolo
Copy link
Owner Author

From [email protected] on September 23, 2010 11:47:54

> -1, at least on BSD, Linux and Windows where something is always 
returned from the kernel and it doesn't look truncated AFAICT.

It's a limited size field in both OS X (16 bytes) and FreeBSD (19 
bytes). Linux it probably is also, but we're not using sysctl() so it 
doesn't affect Linux platforms.

I have no problem using p_comm/ki_comm as is, but I still think it 
makes sense to at least check if cmdline[0] matches it first. If so, 
use cmdline[0] instead. There's no risk in doing this, and we won't 
have to worry about truncated command names. If cmdline[0] isn't the 
same, we just fall back on p_comm/ki_comm, whatever it may be. That 
covers both scenarios.

> http://www.nightproductions.net/dsprocessesinfo_m.html This code 
does exactly what our code already does. It uses cmdline[0] or falls 
back on p_comm if the command line is not able to be read.

> It's not very elegant, but it's the same thing we're currently 
doing with "path", with the difference that exe should be more useful.

What's the goal here? To try and make an "exe" property that always 
points to the actual executable file? What would the value of exe be 
if the command has not path, e.g. just "python.exe" - do we just 
return "python.exe" in that case? If this can't be relied on to point 
to the actual executable as an absolute path then it's not very 
consistent/useful either, I think... If that's the case, we might as 
well just let users process cmdline[0] on their own. Tools like 
ps/top dont' really provide a concept of a command path/executable, 
they only use process names... really there's no precedent on how we 
should handle something like a path/exe for each process. 

If we want to provide a consistent "exe" or "path" then maybe we can 
look at some more complex logic, like looking at environment 
variables, or open files to find the executable? At a quick look it 
seems like processes have an environment variable "_" at least on 
Linux and OS X that corresponds to the actual executable. Might be 
something to look into. It'd be more work but I think it might be 
worth investigating to have something more reliable and consistent. 
Windows might be a little tougher but I bet we can figure something out there as well.

@giampaolo
Copy link
Owner Author

From g.rodola on September 23, 2010 13:05:38

> What's the goal here? To try and make an "exe" property that always points to the 
> actual executable file? 

Yes.

> What would the value of exe be if the command has not path, e.g. just "python.exe"

An empty string (which is how path already works).

> really there's no precedent on how we should handle something like 
a path/exe for each process.

Using /proc/pid/exe on Linux and (properly) guessing it from the 
cmdline on all other platforms should be consistent, I think.
What we would do in this case is providing the exe if we're sure it's 
consistent, otherwise we do not provide it at all ("").
Note that also /proc/pid/exe is empty half of the times.
The other alternative is to provide "exe" property on Linux (Windows?) only.

@giampaolo
Copy link
Owner Author

From [email protected] on September 23, 2010 13:30:27

> Using /proc/pid/exe on Linux and (properly) guessing it from the 
cmdline on all other platforms should be  consistent, I think.
> What we would do in this case is providing the exe if we're sure 
it's consistent, otherwise we do not provide it at all ("").
> Note that also /proc/pid/exe is empty half of the times.

Ok, that makes sense, let's just make sure it's documented in the 
docstring/wiki as such. 

> The other alternative is to provide "exe" property on Linux (Windows?) only

I would favor adding this for all platforms and we'll do our best to 
provide it (we'll just have to investigate Windows) or leave it empty 
if we can't retrieve the information. I also suggest we open an 
enhancement request to investigate the other options I suggested - 
maybe at a later date we can utilize something like env variables to 
find the executable file more consistently.

As far as "name" property, I suggest we stick with p_comm/ki_comm and 
comparing against cmdline[0] as I mentioned earlier, just to cover both bases.

@giampaolo
Copy link
Owner Author

From [email protected] on September 23, 2010 13:30:39

Status: Accepted

@giampaolo
Copy link
Owner Author

From g.rodola on September 23, 2010 13:46:13

> As far as "name" property, I suggest we stick with p_comm/ki_comm and comparing 
> against cmdline[0] as I mentioned earlier, just to cover both bases. 

I'm not sure how you plan to do this exactly. Please, attach a patch first.

@giampaolo
Copy link
Owner Author

From [email protected] on September 23, 2010 14:00:10

>> As far as "name" property, I suggest we stick with p_comm/ki_comm and comparing 
>> against cmdline[0] as I mentioned earlier, just to cover both bases. 
>
> I'm not sure how you plan to do this exactly. Please, attach a patch first.

It's very short:

# name is previously set to value of p_comm/ki_comm in C code
if os.path.basename(cmdline[0]).startswith(name):
    name = os.path.basename(cmdline[0])

If the "name" doesn't match cmdline[0] then we just return it to the 
user as is. This avoids any issues with truncated "comm" fields.

@giampaolo
Copy link
Owner Author

From g.rodola on September 23, 2010 14:12:09

Makes sense.

@giampaolo
Copy link
Owner Author

From g.rodola on September 23, 2010 14:32:44

Linux patch committed in r640 and r641 .

Labels: -Progress-0in4 Progress-1in4

@giampaolo
Copy link
Owner Author

From g.rodola on September 24, 2010 14:50:31

Tests now pass also on Windows and BSD.
Jay, you can close this out as long as OSX is ok as well.

Labels: -Progress-1in4 Progress-3in4

@giampaolo
Copy link
Owner Author

From [email protected] on September 25, 2010 08:59:02

test_name is failing on OS X: 

========================================================
FAIL: test_name (__main__.TestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_psutil.py", line 336, in test_name
    self.assertEqual(psutil.Process(sproc.pid).name, 
os.path.basename(PYTHON))
AssertionError: 'Python' != 'python'


Root cause is that sys.executable isn't the same as what sysctl() 
reports for p_comm or cmdline (see below). I think we need to change 
the test since this doesn't seem to be a bug in the library.

>>> os.path.realpath(sys.executable)
'/usr/bin/python'
>>> psutil.Process(os.getpid()).name
'Python'
>>> psutil.Process(os.getpid()).exe
''
>>> psutil.Process(os.getpid()).cmdline
['python']

@giampaolo
Copy link
Owner Author

From [email protected] on October 26, 2010 20:45:27

Note: test_name failure on OS X opened in separate Issue #121

@giampaolo
Copy link
Owner Author

From g.rodola on October 27, 2010 15:03:59

cmdline's basename replacement as suggested by Jay committed in r741 .
Before committing I verified how both names look like. 
Here is an extract from BSD (the first name comes from ki_comm, the 
latter from os.path.basename(cmdline[0]):

sshd
sshd: root@ttyp1

sshd
sshd: root@notty

dhclient
dhclient: em0

dhclient
dhclient: em0 [priv]

login
login [pam]

sendmail
clientmqueue

sendmail
sendmail: accepting connections

@giampaolo
Copy link
Owner Author

From g.rodola on October 27, 2010 15:06:30

Status: FixedInSVN
Labels: -Progress-3in4 OpSys-FreeBSD OpSys-OSX

@giampaolo
Copy link
Owner Author

From g.rodola on October 27, 2010 15:11:12

Please ignore "sendmail -> clientmqueue" example. It was obviously 
coming from an erroneous print since first chars don't match.

@giampaolo
Copy link
Owner Author

From g.rodola on October 27, 2010 15:31:13

It turns out name is truncated on Linux as well, as shown by this script:

for p in psutil.process_iter():
    if not p.cmdline:
        continue
    name = p.name
    ext_name = os.path.basename(p.cmdline[0])
    if ext_name.startswith(name) and (name != ext_name):
        print name
        print ext_name
        print

Fixed in r742 .

@giampaolo
Copy link
Owner Author

From g.rodola on November 12, 2010 19:14:59

Status: Fixed

@giampaolo
Copy link
Owner Author

From g.rodola on March 02, 2013 03:54:47

Updated csets after the SVN -> Mercurial migration: r633 == revision 
258bf4fe8bb7 r638 == revision 84cad53d6ff9 r639 == revision 
22691f448e87 r640 == revision 9db25eb61bba r741 == revision 
0a5626e9afd2 r742 == revision 3192e08e57a0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant