Inconsistent usage of TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR, char, etc #655

mrjefftang · 2015-07-14T21:18:36Z

The Windows section uses all sorts of different string types. It's not clear to me without delving through it thoroughly to see if these types are the correct usage.

This inconsistency is likely to lead to some unicode issues like that seen in #652 . There may be parts of the psutil API for Windows which may not function correctly when given a multibyte character string.

giampaolo · 2015-07-15T11:33:53Z

Yes, there surely are different problems with unicode handling.
I tried to do some research. This is what I determined.

Issues

net_if_stats()

fixed by 064e65e

Test: rename a NIC by using a non-ASCII character ("è") from the control panel.

Python 2 - issue: we should return unicode instead of str.

C:\>C:\python27\python -c "import psutil;  print(psutil.net_if_stats().keys())"
[..., 'Local Area Connection\xe8']

Python 3 - issue: exception.

C:\>C:\python34\python -c "import psutil;  print(psutil.net_if_stats().keys())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\python34\lib\site-packages\psutil-3.1.0-py3.4-win32.egg\psutil\__init__.py", line 1801, in net_if_stats
    return _psplatform.net_if_stats()
  File "C:\python34\lib\site-packages\psutil-3.1.0-py3.4-win32.egg\psutil\_pswindows.py", line 217, in net_if_stats
    ret = cext.net_if_stats()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 21: unexpected end of data

net_io_counters()

Test: rename a NIC by using a non-ASCII character ("è") from the control panel.

Cmdline: C:\pythonXX\python -c "import psutil; print(psutil.net_io_counters(pernic=True).keys())

Result: everything cool on python 2 and 3.

users()

Test: create a user with a non-ASCII character ("è").

Cmdline: C:\PythonXX\python -c "import psutil; print(psutil.users())"

Result: everything cool on python 2 and 3.

Process.username()

Test: create a user with a non-ASCII character ("è").

Cmdline: C:\python27\python -c "import psutil; print(psutil.Process().username())"

Result: everything cool on python 2 and 3.

Process.cwd()

Test: create a directory with a non-ASCII character ("è").

Python 2 - issue: should return unicode:

C:\Users\fooè>C:\Python27\python -c "import psutil; print(psutil.Process().cwd())"
C:\Users\foo├¿
C:\Users\fooè>C:\Python27\python -c "import psutil; print(repr(psutil.Process().
cwd()))"
'C:\\Users\\foo\xc3\xa8'

Python 3 - everything's cool.

C:\Users\fooè>C:\Python34\python -c "import psutil; print(psutil.Process().
cwd())"
C:\\Users\\fooè

Process.exe()

Test - create an executable with a non-ASCII character.

Cmdline - C:\Python27\python -c "import psutil; print(psutil.Process(3020).exe())" C:\Users\fooè.exe

Result - everything's cool on python 2 and 3.

Process.name()

Fixed in 0847640

~~Same as above, all cool on Python 2.7 and 3.4.~~
~~Fallback proc_name does not return unicode though.~~

Process.cmdline()

Python 2 - should return str instead of unicode.

Python 3 - everything's cool.

Process.open_files()

Python 2 - should return str instead of unicode.

C:\cygwin64\home\user\fooè>C:\python27\python -c "import psutil; f = open('fooè', 'w'); print(psutil.Process().open_files())"
[popenfile(path=u'C:\\cygwin64\\home\\user\\foo\xe8\\foo\xe8', fd=-1)]

Python 3 - everything's cool.

C:\cygwin64\home\user\fooè>C:\python34\python -c "import psutil; f = open('fooè', 'w'); print(psutil.Process().open_files())"
[popenfile(path='C:\\cygwin64\\home\\user\\fooè\\fooè', fd=-1)]

giampaolo · 2015-07-15T11:34:29Z

I am going to create separate tickets for each one of these issues.

mrjefftang · 2015-07-15T13:12:53Z

That sounds like there's 2 issues.

1 - Inconsistent string handling in C (the intial problem).
2 - Inconsistent return values of PyString and PyUnicode.

I think there should be a decision on the API to consistently return the same type for both Python2 and Python3. It'll probably be easiest and simplest to always return PyUnicode though there may be API breaking changes.

My main concern on this ticket was if we are using char string types internally in C (and using ascii API calls instead of unicode API calls) and returning PyUnicode since we would be losing information.

giampaolo · 2015-07-15T14:17:11Z

Yes, I agree that at least on Windows we should always return unicode. In general UNIX systems have more "discipline", meaning it's unlikely we'll bump into an exe, user or network interface with non-ASCII chars, so returning str on Python 2 might be OK. The same cannot be told for Windows.

As for UNIX we can think about returning unicode for Process.cwd() only (but this is a different issue).

2 - Inconsistent return values of PyString and PyUnicode.

Exactly. We should be consistent and try to use PyUnicode everywhere.
Also there's a side issue: there are certain Windows APIs which come with two versions: one supporting unicode, another one not supporting it. Can't remember which ones exactly though.

My main concern on this ticket was if we are using char string types internally in C

If you think the two things should be treated differently feel free to address the TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR issue first, then we'll get to using PyUnicode everywhere.
In the meantime I can write a set of tests similarl to the ones I pasted above and we can use that as a reference of what we should expect and consider the work "done" when all of them "pass".
The hard part will be figuring out how to rename NICs and create users (wmic lib should be able to do that).

mrjefftang · 2015-07-15T14:56:15Z

I think the errors / discrepancies we see are due to a combination of the two issues.

I don't know if they should be treated differently as fixing the error requires fixing both issues.

Once we have a test suite, we can go through each psutil API call and fix them up as appropriate.

giampaolo · 2015-07-15T14:59:21Z

I'm gonna create a branch later today (or tomorrow) with a couple of preliminary tests.
If you want I can give you write access so that we can work on the same branch.
Just do not push anything on master and always work on a branch.

mrjefftang · 2015-07-16T13:02:29Z

Sounds good.

Also I'm not sure if it makes sense for the behavior on Windows and Linux to be different.

Having an API which returns different types based on the OS seems wrong, especially if you are using psutil for a cross platform application.

At the same time it sucks for one OS to pay the overhead due to another OS misbehaving.

I think there just needs to be a decision made to always return PyString or PyUnicode. In my mind, it makes more sense to always return PyUnicode since it's the way forward in Python3. It should make things cleaner internally within psutil without all of the #ifdef checking between Python2 and Python3.

mrjefftang · 2015-07-31T20:48:05Z

Some more insight into the ascii/unicode battle from the Python side:
http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#why-was-python-3-made-incompatible-with-python-2

ddaeschler · 2015-08-01T03:54:56Z

If you have determined a way forward on this I'd love to help patch it up.

…g strings (except username)

giampaolo · 2015-08-01T10:33:04Z

Sorry for the delay. It's been a crazy week.
So, I created a separate branch:
https://github.com/giampaolo/psutil/compare/655-windows-unicode
...and added some preliminary unit tests for all process methods returning strings except for username():
21da6e7
As expected, they break on Python 2. On Python 3 all is good except for process name().
I would say let's start working on these first (name, exe, cmdline, cwd, open_files) and have them all return unicode on python 2.
It may be possible that despite tests are correct on python 3 they may need some adjustments on python 2, but that can be figured out after psutil returns the correct type and data.

@mrjefftang I gave you write access.

giampaolo · 2015-08-01T12:13:05Z

In Python 3 this is the only thing (process related) which is plain broken and should be fixed independently from the str/unicode unification:

======================================================================
ERROR: test_proc_name (__main__.TestUnicode)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test\_windows.py", line 483, in test_proc_name
    self.assertEqual(psutil._psplatform.cext.proc_name(subp.pid),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 7: invalid continuation byte
----------------------------------------------------------------------

mrjefftang · 2015-08-24T15:16:41Z

Havent tested it yet but #670 should take care of the net_* unicode issues identified here.

giampaolo · 2015-08-26T10:08:41Z

I fixed net_if_stats as of 064e65e

giampaolo · 2015-08-26T13:04:47Z

Process.name fallback method now returns unicode instead of str as of 0847640

giampaolo · 2015-08-26T13:29:08Z

Thinking back about the approach we decided to use here I take back the proposal of returning unicode all the time on Python 2. It's a too disruptive change and may break a lot of code out there. I think a better approach is to return str in case of ASCII strings, else unicode.

giampaolo · 2015-08-30T12:05:02Z

I changed my mind once again: unexpectedly returning a different type (str or unicode) depending on the string we're dealing with is not a consistent API and may lead to unpleasant user code like if isinstance(s, 'unicode'): .... Also, that would affect Windows users only, which is even worse.
I decided that on Python 2 we will always return a string (str type) encoded by using sys.getfilesystemencoding() codec and I will make this clear in the doc.
It must be noted that this is the behavior we're already using on Linux / Python 2 and other platforms as well (since we're not returning unicode anywhere) so what I'm I'm trying to do here is to be consistent with the established behavior in a cross-platform fashion.
It must be noted that this issue does not exist on Python 3 since Python 3 fixed unicode handling once and for all.
The relevant PR which I've just merged is: #676

@mrjefftang : I realize I've improperly been using this issue as the original subject was Inconsistent usage of TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR. Feel free to open a new one in which we will address that problem.

…ken on Python 2 so disable tests which check for exact path match

giampaolo added a commit that referenced this issue Aug 1, 2015

#655: provide preliminary unicode tests for all process APIs returnin…

21da6e7

…g strings (except username)

giampaolo added a commit that referenced this issue Aug 26, 2015

#655: fix net_if_stats unicode error

064e65e

giampaolo added a commit that referenced this issue Aug 26, 2015

#655: have Process.name() fallback method return unicode instead of str

0847640

giampaolo closed this as completed Aug 30, 2015

giampaolo added a commit that referenced this issue Apr 28, 2017

#655 / unicode tests: assume APIs handling with unicode paths are bro…

52ed10c

…ken on Python 2 so disable tests which check for exact path match

giampaolo added the windows label Nov 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent usage of TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR, char, etc #655

Inconsistent usage of TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR, char, etc #655

mrjefftang commented Jul 14, 2015

giampaolo commented Jul 15, 2015

giampaolo commented Jul 15, 2015

mrjefftang commented Jul 15, 2015

giampaolo commented Jul 15, 2015

mrjefftang commented Jul 15, 2015

giampaolo commented Jul 15, 2015

mrjefftang commented Jul 16, 2015

mrjefftang commented Jul 31, 2015

ddaeschler commented Aug 1, 2015

giampaolo commented Aug 1, 2015

giampaolo commented Aug 1, 2015

mrjefftang commented Aug 24, 2015

giampaolo commented Aug 26, 2015

giampaolo commented Aug 26, 2015

giampaolo commented Aug 26, 2015

giampaolo commented Aug 30, 2015

Inconsistent usage of TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR, char, etc #655

Inconsistent usage of TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR, char, etc #655

Comments

mrjefftang commented Jul 14, 2015

giampaolo commented Jul 15, 2015

Issues

net_if_stats()

net_io_counters()

users()

Process.username()

Process.cwd()

Process.exe()

Process.name()

Process.cmdline()

Process.open_files()

giampaolo commented Jul 15, 2015

mrjefftang commented Jul 15, 2015

giampaolo commented Jul 15, 2015

mrjefftang commented Jul 15, 2015

giampaolo commented Jul 15, 2015

mrjefftang commented Jul 16, 2015

mrjefftang commented Jul 31, 2015

ddaeschler commented Aug 1, 2015

giampaolo commented Aug 1, 2015

giampaolo commented Aug 1, 2015

mrjefftang commented Aug 24, 2015

giampaolo commented Aug 26, 2015

giampaolo commented Aug 26, 2015

giampaolo commented Aug 26, 2015

giampaolo commented Aug 30, 2015