Commit
python: use surrogateescape when encoding args in wrapper.py
Problem: When passing command-line arguments (i.e. sys.argv) from Python scripts to C API functions via the wrapper.py class, encoding errors are possible if a UTF-8 string is supplied in argv and the current environment does not have a UTF-8 LANG or LC_ALL environment variable set.

This seems to be because Python decodes sys.argv to `str` with the current encoding, and (at least as of Python 3.6) there is no way to get the actual bytes given in argv in this environment without raising a UnicodeEncodeError.

Fortunately this has been a big enough problem that PEP 383[1] was devised, introducing the 'surrogateescape' error handler for encode(), which allows the so-called "smuggling" of bytes in character strings.

Since the alternative is to possibly raise UnicodeEncodeError, add `errors='surrogateescape'` to the wrapper.py __call__() method where it encodes string arguments to pass into C. There is likely no downside here, since there is presumably no existing usage depending on the current behavior.

[1] https://www.python.org/dev/peps/pep-0383/
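The failure and the fix can be reproduced in isolation. The following is a minimal sketch, not the actual wrapper.py code: `encode_strict` and `encode_arg` are hypothetical helper names, and the explicit `decode("ascii", errors="surrogateescape")` call only simulates how an argv entry might look under a non-UTF-8 locale.

```python
# Raw bytes as they might be given on the command line: UTF-8 for "café".
raw = b"caf\xc3\xa9"

# Simulate argv decoding under a non-UTF-8 locale (assumption: ASCII).
# Bytes that cannot be decoded become lone surrogates (U+DC80..U+DCFF).
arg = raw.decode("ascii", errors="surrogateescape")


def encode_strict(s):
    """Hypothetical stand-in for the old behavior: strict UTF-8 encoding."""
    return s.encode("utf-8")


def encode_arg(s):
    """Hypothetical stand-in for the new behavior: surrogates map back to bytes."""
    return s.encode("utf-8", errors="surrogateescape")


# Strict encoding rejects the lone surrogates.
try:
    encode_strict(arg)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With surrogateescape the original bytes are recovered unchanged.
roundtrip = encode_arg(arg)
```

Strings that contain no surrogates encode identically under both handlers, which is consistent with the expectation that existing usage is unaffected.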