Call PySys_SetArgv when initializing interpreter. #2341
e6e1a1d
to
1801811
Compare
include/pybind11/embed.h
Outdated
}
#if PY_MAJOR_VERSION >= 3
// SetArgv* on python 3 takes wchar_t, so we have to convert.
wchar_t** widened_argv = new wchar_t*[static_cast<unsigned>(argc)];
Why `static_cast<unsigned>`? Is it to avoid sign conversion?
Correct. Although I couldn't reproduce it on any of my local compilers, github's clang gave an error there. argc is guaranteed to be positive by the if guard above, so the cast is safe.
include/pybind11/embed.h
Outdated
# if PY_MINOR_VERSION >= 5
// From Python 3.5 onwards, we're supposed to use Py_DecodeLocale to
// generate the wchar_t version of argv.
widened_argv[ii] = Py_DecodeLocale(safe_argv[ii], nullptr);
If you're trying to avoid sign conversions, `ii` should be unsigned as well.
Ah, but that would make the loop condition a signed/unsigned comparison error.
Do you think it would be cleaner if I did `size_t argc_unsigned = static_cast<unsigned>(argc)` at the top of the function and then just used `size_t`s throughout?
🤦 ...nope. If I do that I get narrowing conversion errors on the PySys_SetArgv calls.
This is one of those things that is impossible to get completely right without using some casts. The array indices are `size_t`, yet `argc` is an `int`.
I'm not going to nag about this, but there is some inconsistency in this patch.
Improved slightly (`ii` is now `size_t`) but still a little awkward.
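For readers following the signedness thread above, here is a minimal sketch of one way to keep the cast in a single place: validate `argc` first, convert it once, and index with `std::size_t` from then on. The helper name `copy_args` is hypothetical and is not part of pybind11 or of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): copies argv into owned strings.
inline std::vector<std::string> copy_args(int argc, const char* const* argv) {
    std::vector<std::string> out;
    if (argv == nullptr || argc <= 0) {
        return out;  // nothing to copy; this also rules out a negative argc
    }
    // argc > 0 was established above, so this cast cannot wrap around.
    const auto count = static_cast<std::size_t>(argc);
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {  // unsigned compared with unsigned
        out.emplace_back(argv[i]);
    }
    return out;
}
```

The signed `int` survives only at the function boundary, where it matches the shape of `main`'s parameters; everything after the guard works with an unsigned count.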
Apologies for the initial two comments outside the review.
There's a lot of manual memory management going on here. There are even some non-default deletes, like `PyMem_RawFree()`. Then there's checking against `nullptr` after a `new` expression, which is pointless, since `new` throws `std::bad_alloc`. This should all be fixed with liberal use of `std::unique_ptr`.
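A rough sketch of the `std::unique_ptr` approach suggested above, written so it compiles without `Python.h`. Everything here is a stand-in: `std::malloc`/`std::free` play the roles of `Py_DecodeLocale`/`PyMem_RawFree`, and the names `raw_free_deleter`, `widen_ascii`, `total_length` and `g_frees` are hypothetical, not the code that ended up in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <memory>
#include <new>
#include <vector>

static int g_frees = 0;  // counts deleter calls so the effect is observable

// Stand-in for a deleter that would call PyMem_RawFree.
struct raw_free_deleter {
    void operator()(wchar_t* p) const {
        ++g_frees;
        std::free(p);
    }
};
using wide_ptr = std::unique_ptr<wchar_t[], raw_free_deleter>;

// Stand-in for Py_DecodeLocale: widens an ASCII string into a malloc'd buffer.
inline wide_ptr widen_ascii(const char* s) {
    const std::size_t n = std::strlen(s);
    auto* buf = static_cast<wchar_t*>(std::malloc((n + 1) * sizeof(wchar_t)));
    if (buf == nullptr) {
        throw std::bad_alloc();  // malloc, unlike new, reports failure via null
    }
    for (std::size_t i = 0; i <= n; ++i) {  // <= n copies the terminator too
        buf[i] = static_cast<wchar_t>(static_cast<unsigned char>(s[i]));
    }
    return wide_ptr(buf);
}

inline std::size_t total_length(const std::vector<const char*>& args) {
    std::vector<wide_ptr> owned;  // owns every widened buffer
    std::vector<wchar_t*> raw;    // non-owning view, the shape a C API expects
    for (const char* a : args) {
        owned.push_back(widen_ascii(a));
        raw.push_back(owned.back().get());
    }
    std::size_t total = 0;
    for (const wchar_t* w : raw) {
        total += std::wcslen(w);
    }
    return total;  // `owned` is destroyed here; each buffer is freed exactly once
}
```

No explicit cleanup path is needed: if a later allocation throws, the buffers already stored in `owned` are still released by the deleter.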
include/pybind11/embed.h
Outdated
#else
// python 2.x
# if PY_MINOR_VERSION < 6 || (PY_MINOR_VERSION == 6 && PY_MICRO_VERSION < 6)
# define NEED_PYRUN_TO_SANITIZE_PATH 1
Does this need to be a macro?
I'm not sure. My worry was that if I made it a `bool`, a smart optimizer would be able to see that the `PyRun_SimpleString` call is unreachable code.
I really, really don't like using macros for flow control.
I changed how this code is gated in the rewrite; it's still gated by the `PY_*_VERSION` macros (because I'm not sure how to not do that), but I avoid the `#define`...`#undef`, which I agree was hideous.
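One possible middle ground between the two positions in this thread, sketched under assumptions: consult the version macro once to produce a typed constant, then use ordinary control flow. `DEMO_VERSION_HEX` is a hypothetical stand-in for `PY_VERSION_HEX` so the sketch builds without `Python.h`, and the enum and function names are invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for PY_VERSION_HEX (here: 2.6.6 final).
#define DEMO_VERSION_HEX 0x020606f0

// The preprocessor value is captured once as a typed compile-time constant.
constexpr bool has_set_argv_ex = DEMO_VERSION_HEX >= 0x020606f0;

enum class argv_strategy {
    set_argv_ex,            // call PySys_SetArgvEx directly
    set_argv_then_fix_path  // call PySys_SetArgv, then sanitize sys.path by hand
};

inline argv_strategy choose_argv_strategy() {
    // Plain control flow; a dead branch here is harmless to the optimizer.
    return has_set_argv_ex ? argv_strategy::set_argv_ex
                           : argv_strategy::set_argv_then_fix_path;
}
```

This only works when both branches compile on every supported version. If one branch names a function that does not exist in the headers being compiled against, an `#if` around that call is still required, which may be why the rewrite kept the `PY_*_VERSION` gating.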
include/pybind11/embed.h
Outdated
// From Python 3.5 onwards, we're supposed to use Py_DecodeLocale to
// generate the wchar_t version of argv.
widened_argv[ii] = Py_DecodeLocale(safe_argv[ii], nullptr);
# define FREE_WIDENED_ARG(X) PyMem_RawFree(X)
Should this be `PyMem_RawFree()` or `PyMem_Free()`?
`PyMem_RawFree()` was recommended specifically by the `Py_DecodeLocale` docs.
I'm not sure what this means, but I appreciate all the comments, especially this one:
The superfluous null checking was just a dumb habit on my part (because everything in the python API needs to be null checked, I was sort-of on autopilot). The manual deletion was unbelievably ugly but it was there because I didn't have a better idea. I've rewritten the function to avoid it. Even though I'm not confident I've done this correctly (having learned just now, from you that there's an array-special-case
include/pybind11/embed.h
Outdated
};
std::vector< std::unique_ptr<wchar_t[], pymem_rawfree_deleter> > widened_argv_entries;
# else
std::vector< std::unique_ptr<wchar_t[]> > widened_argv_entries;
@bstaletic is there a way to avoid the `std::vector` I've added here and still get smart-pointer deletion?
I'm not sure what's going on with the CI... the mac stage changed from "skipped" to "failed", but there aren't any errors shown when I click details. Everything seems fine on my test mac. Pushing a pointless amend to trigger a retry....
6c6598a
to
0379418
Compare
0379418
to
3ff967d
Compare
@drmoose Sorry for the late reply. Check this out: https://gist.github.com/bstaletic/ba0afff792396ea438937948b2d80582 It's still not completely clean, but I believe it is a lot more readable.
Sorry it took me a while to get back to this. I've converted the changes from your gist into commits and pushed to a
These three are unambiguously cleaner than what I had before. Although I had to do a little bit of subsequent tweaking to get something that compiles, this was great.
This one scares me a bit, since it always uses the obsolete approach. Since Python have added
Here, I suspect we just don't agree about whether |
@rwgk It's meant to be compatible with the signature of So, most of the time, the person trying to use pybind embed starts with a |
Sounds good, thanks! Would it still work for your purposes if we added
include/pybind11/embed.h
Outdated
char** safe_argv = argv;
std::unique_ptr<char*[]> argv_guard;
std::unique_ptr<char[]> argv_inner_guard;
if (nullptr == argv || argc <= 0) {
-if (nullptr == argv || argc <= 0) {
+if (argv == nullptr || argc <= 0) {
I prefer the variable first (such as in the second half of this line, in fact).
include/pybind11/embed.h
Outdated
if (nullptr == argv || argc <= 0) {
argv_guard.reset(safe_argv = new char*[1]);
argv_inner_guard.reset(safe_argv[0] = new char[1]);
safe_argv[0][0] = '\0';
Does this function really modify the `char**` it is passed? Here it's modifying a new one, for example. It doesn't seem directly obvious why this is not `const`. I found `const char* const*` preferable in some cases in CLI11, IIRC.
You can always be more strict; if you have a `char**`, you can pass it to a function that takes a `const char* const*`. But you can't (without feeling really bad about yourself for casting, anyway) go the other way.
This patch adds const, okay to commit?
diff --git a/include/pybind11/embed.h b/include/pybind11/embed.h
index 432185e4..246122c3 100644
--- a/include/pybind11/embed.h
+++ b/include/pybind11/embed.h
@@ -96,7 +96,7 @@ struct wide_char_arg_deleter {
}
};
-inline wchar_t* widen_chars(char* safe_arg) {
+inline wchar_t* widen_chars(const char* safe_arg) {
#if PY_VERSION_HEX >= 0x030500f0
wchar_t* widened_arg = Py_DecodeLocale(safe_arg, nullptr);
#else
@@ -115,18 +115,16 @@ inline wchar_t* widen_chars(char* safe_arg) {
}
/// Python 2.x/3.x-compatible version of `PySys_SetArgv`
-inline void set_interpreter_argv(int argc, char** argv, bool add_program_dir_to_path) {
+inline void set_interpreter_argv(int argc, const char* const* argv, bool add_program_dir_to_path) {
// Before it was special-cased in python 3.8, passing an empty or null argv
// caused a segfault, so we have to reimplement the special case ourselves.
- char** safe_argv = argv;
- std::unique_ptr<char*[]> argv_guard;
- std::unique_ptr<char[]> argv_inner_guard;
- if (nullptr == argv || argc <= 0) {
- argv_guard.reset(safe_argv = new char*[1]);
- argv_inner_guard.reset(safe_argv[0] = new char[1]);
- safe_argv[0][0] = '\0';
+ bool special_case = (nullptr == argv || argc <= 0);
+
+ const char* const empty_argv[] {"\0"};
+ const char* const* safe_argv = special_case ? empty_argv : argv;
+ if(special_case)
argc = 1;
- }
+
#if PY_MAJOR_VERSION >= 3
auto argv_size = static_cast<size_t>(argc);
// SetArgv* on python 3 takes wchar_t, so we have to convert.
@@ -174,7 +172,7 @@ PYBIND11_NAMESPACE_END(detail)
\endrst */
inline void initialize_interpreter(bool init_signal_handlers = true,
int argc = 0,
- char** argv = nullptr,
+ const char* const* argv = nullptr,
bool add_program_dir_to_path = true) {
if (Py_IsInitialized() != 0)
pybind11_fail("The interpreter is already running");
@@ -258,7 +256,7 @@ class scoped_interpreter {
public:
scoped_interpreter(bool init_signal_handlers = true,
int argc = 0,
- char** argv = nullptr,
+ const char* const* argv = nullptr,
bool add_program_dir_to_path = true) {
initialize_interpreter(init_signal_handlers, argc, argv, add_program_dir_to_path);
}
bump @drmoose, if no objections will commit the const patch.
I'll commit the patch, and we can roll back if there are objections. But you can always be more strict, and can't be less, so I can't see why this would do anything but help.
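The empty-argv special case from the const patch can be sketched in isolation, without `Python.h`. This is an illustration under assumptions rather than the committed code: `normalized_args` is a hypothetical name, and it returns owned strings instead of handing the pointers to `PySys_SetArgvEx`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: applies the "null or empty argv becomes a single
// empty string" rule that newer Python versions apply on their own.
inline std::vector<std::string> normalized_args(int argc, const char* const* argv) {
    const bool special_case = (argv == nullptr || argc <= 0);
    static const char* const empty_argv[] = {""};
    const char* const* safe_argv = special_case ? empty_argv : argv;
    // After the guard the count is at least 1, so the cast cannot wrap.
    const auto argv_size = static_cast<std::size_t>(special_case ? 1 : argc);
    return std::vector<std::string>(safe_argv, safe_argv + argv_size);
}
```

Because `safe_argv` is `const char* const*`, the fallback can point at a constant array instead of a freshly allocated buffer, which is what removes the two `unique_ptr` guards from the earlier version.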
67d8767
to
fc9421e
Compare
fc9421e
to
318495e
Compare
@henryiii Sorry I didn't reply earlier; I wanted to get back to a computer before commenting because I wanted to make sure that there was a way to add
/// Python 2.x/3.x-compatible version of `PySys_SetArgv`
inline void set_interpreter_argv(int argc, const char* const* argv, bool add_program_dir_to_path) {
// Before it was special-cased in python 3.8, passing an empty or null argv
// caused a segfault, so we have to reimplement the special case ourselves.
Nit: shouldn't we add a debug assert that `argc >= 0`? We are casting a signed `int` to an unsigned `size_t`.
I think it's covered by the `argc <= 0` below, combined with `argc = 1`.
include/pybind11/embed.h
Outdated
// python 2.x
std::vector<std::string> strings{safe_argv, safe_argv+argv_size};
std::vector<char*> char_strings{argv_size};
for (std::size_t i=0; i<argv_size; ++i)
Nit: this doesn't look like it's clang-formatted properly @henryiii
The only manual change is an added empty line between pybind11 and system `#include`s.
```
git diff -U0 --no-color master | python3 $HOME/clone/llvm-project/clang/tools/clang-format/clang-format-diff.py -p1 -style=file -i
```
Thanks @drmoose and all reviewers! Merging.
This PR is to suggest automatically calling `PySys_SetArgvEx` (or an equivalent function) upon initialization of a `pybind11::scoped_interpreter`.

Summary of Changes

`initialize_interpreter` and `scoped_interpreter` now have three additional arguments:
- `int argc = 0` and `char** argv = nullptr` -- argument count and values, meant to be passed in directly from the calling program's `int main()`.
- `bool add_current_dir_to_path`

If `add_current_dir_to_path` is true (the default, to try to maintain some semblance of backwards compatibility), the directory that gets added to the path will be `dirname(argv[0])` rather than just `.`. If all arguments are supplied their default values (or `argv[0] == ''`) this is equivalent to the current behavior.

Rationale

- `sys.argv` is an `AttributeError` unless `PySys_SetArgvEx` gets called. This creates problems for GUI python code (such as matplotlib) since there's code that reads `sys.argv` in the Debian/Ubuntu version of both `tkinter` and gdk. In both cases, there's no reason the value supplied needs to be `int main()`'s real `argv`; the property just needs to exist.
- The existing `initialize_interpreter` code adds the current working directory to `sys.path`, which is very similar to CVE-2008-5983. cpython itself solved this problem by introducing the `PySys_SetArgvEx` function, which takes an extra argument that specifies whether to modify the path.
- There's a significant amount of work involved in getting the arguments in the right format and figuring out exactly which overload to call. It took me a day of reading through the cpython git history to find all of these edge cases, so I'd like to save others the effort. In particular,
  - `PySys_SetArgvEx` itself only exists in python 2.6.6-2.7.x and python >= 3.1.3. For other versions, one must call the old `PySys_SetArgv` and then correct the CVE-2008-5983 behavior by hand.
  - `PySys_SetArgvEx` only supports null arguments (special-casing them to `sys.argv=[""]`) in python 3.8+
  - `PySys_SetArgv` and `PySys_SetArgvEx` take a `wchar_t**` instead of a `char**` as their parameter, and the conversion is nontrivial
    - `Py_DecodeLocale`) but only in 3.5+
    - `mbstowcs(nullptr, ...)` isn't reliable on BSD

Suggested changelog entry: