Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] bpo-39413: os.unsetenv() uses _wputenv() on Windows #18115

Closed
wants to merge 1 commit into from
Closed

[WIP] bpo-39413: os.unsetenv() uses _wputenv() on Windows #18115

wants to merge 1 commit into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Jan 22, 2020

os.unsetenv() uses _wputenv() rather than SetEnvironmentVariableW():
_wputenv() updates the CRT, whereas SetEnvironmentVariableW() does
not.

Replace also lambda functions with regular functions to get named
functions, to ease debug.

https://bugs.python.org/issue39413

@vstinner
Copy link
Member Author

This PR is a follow-up of PR #18107 which was a implementation in pure Python. @serhiy-storchaka wrote that he would prefer an implementation in C: so here you have.

@@ -10126,6 +10126,8 @@ os_putenv_impl(PyObject *module, PyObject *name, PyObject *value)
goto error;
}

/* Use _wputenv() rather than SetEnvironmentVariableW(): _wputenv()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The search for "=" (line 10102) should begin at index 0, not index 1. Windows allows variable names that begin with 0, but ucrt does not allow it. For example:

>>> os.putenv('spam=', 'eggs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: illegal environment variable name

>>> os.putenv('=spam', 'eggs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this surprising difference and I'm trying to understand what should be the correct behavior...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work with '=C'?

Copy link
Contributor

@eryksun eryksun Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python initializes nt.environ from C _wenviron, so it follows the long-standing convention of MSVC to exclude hidden variables that begin with "=" and to disallow setting them. We still set them for per-drive working directories (e.g. "=Z:"), but we set them in the process environment only, with SetEnvironmentVariableW. For example, see win32_wchdir in this module. The drive variables are supposed to be a hidden implementation detail. CMD's set doesn't show them, except due to a parsing bug if we run set "" or set ".

Copy link
Contributor

@eryksun eryksun Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to allow setting them, then Python should initialize nt.environ from WINAPI GetEnvironmentStringsW instead of _wenviron (or at least also from GetEnvironmentStringsW , just for the hidden variables). Then if a name begins with "=", call SetEnvironmentVariableW instead of _wputenv.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about os.unsetenv("key=value")?

By the way, why does os.putenv() explicitly rejects "key=" variable name?

Because in "key==value" what is a key and what is a value? key=="key" and value=="=value" or key=="key=" and value=="value"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows is so weird. The official documentation says:

The name of an environment variable cannot include an equal sign (=).

I tested: I can set a variable with name "zlong=" (the name contains "="). It is shown by the set command for example:

>>> import os
>>> os.putenv("zlong=", "1")
>>> os.putenv("zlon=g", "2")
>>> os.system("set")
(...)
zlon=g=2
zlong==1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The posix_putenv_dict_setitem call below is unnecessary with _wputenv. It copies the string before adding it to _wenviron, i.e. it does not comply with POSIX putenv.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked back to even Python 2.7 w/ msvcr90.dll, and even that old version of the CRT copies the string that's passed to putenv (via calls to _calloc_crt and strcpy_s), which I verified by noting the result from _calloc_crt and cross-checking with the address returned by getenv. The CRT didn't used to copy the string back in the 1990s, but that's ancient history.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The posix_putenv_dict_setitem call below is unnecessary with _wputenv. It copies the string before adding it to _wenviron, i.e. it does not comply with POSIX putenv.

Fixed in commit 0852c7d.

os.unsetenv() uses _wputenv() rather than SetEnvironmentVariableW():
_wputenv() updates the CRT, whereas SetEnvironmentVariableW() does
not.

Replace also lambda functions with regular functions to get named
functions, to ease debug.
os_putenv_impl(PyObject *module, PyObject *name, PyObject *value)
/*[clinic end generated code: output=d29a567d6b2327d2 input=ba586581c2e6105f]*/
static PyObject*
py_win_putenv(PyObject *name, PyObject *value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is py_win_ the naming convention from now on -- as opposed to win32_?

/* Search from index 1 because on Windows starting '=' is allowed for
defining hidden environment variables. */
if (PyUnicode_GET_LENGTH(name) == 0 ||
PyUnicode_FindChar(name, '=', 1, PyUnicode_GET_LENGTH(name), 1) != -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this search should start at index 0, so that it consistently raises ValueError, instead of OSError if a name begins with "=".

@vstinner
Copy link
Member Author

Some notes on Windows environment variables:

  • There are two APIs:

    • native API: GetEnvironmentStringsW(), SetEnvironmentVariableW(), GetEnvironmentVariableW()
    • CRT API: _wenviron, _wgetenv(),_wputenv()
    • (I ignore the bytes API here on purpose, it's already too complicated :-))
  • There was a limit 32,767 characters (_MAX_ENV constant) for "key=value" string until Windows 2003: on more recent Windows, there is no technical limit: SetEnvironmentVariableW() doesn't fail for example. But "a batch file cannot set a variable that is longer than the maximum command line length."

  • The CRT API updates the CRT API and the native API, whereas native API doesn't update the CRT API.

  • "=" in name:

    • SetEnvironmentVariableW() only accepts "=" if its is a the name start: "=c:" and "=test" are accepted, "test=" or "te=st" are rejected
    • _wputenv() accepts "key=" and "te=st", but rejects "=c:" and "=test": "=" is not rejected if it's at the start of the name
  • empty name: SetEnvironmentVariableW() and _wputenv("", "") fails with Windows error 87 (invalid argument)

  • set shell command doesn't display variable names with a name starting with "=", but names with "=" in other places. My bet is that set uses the CRT API.

  • Python and variable names containing "=": Python is wrong. It parses the variable "victor=" with value "secret" as name="victor" and value="=secret" :-( convertenviron() of posixmodule.c searchs the first equal character, not the last.

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Jan 22, 2020

There are also "secure" (or "safe"?) (but not thread-safe) _wgetenv_s() anf _wputenv_s(). _wputenv_s() is more convenient than _wputenv() because you do not need to concatenate key and value for it.

@vstinner
Copy link
Member Author

vstinner commented Jan 22, 2020

Python and variable names containing "=": Python is wrong. It parses the variable "victor=" with value "secret" as name="victor" and value="=secret" :-( convertenviron() of posixmodule.c searchs the first equal character, not the last.

I created https://bugs.python.org/issue39420 to track this issue.

@vstinner
Copy link
Member Author

I'm thinking aloud.

If we switch Python to use the native API (SetEnvironmentVariableW()) instead of the CRT API (_wputenv()), C libraries run inside the Python process and using the CRT API (getenv() or _wgetenv()) would no longer see "updated" environment variables. For example, I guess that OpenSSL can read environment variable after Python start. It can be an issue if it happens. The safe option is to stick to the CRT API.

To support "=" at the beginning of the variable name, one option would be to use the native API (SetEnvironmentVariableW()), or the CRT API (_wputenv()) otherwise.

I tested _wgetenv(). It has a weird behavior :-) If I set a variable with _wputenv("victor=", "secret"), _wgetenv("victor") returns =secret and _wgetenv("victor=") returns secret!

_wgetenv("=c:") fails with "OSError: [WinError 6] Descripteur non valide".

@eryksun
Copy link
Contributor

eryksun commented Jan 22, 2020

There was a limit 32,767 characters (_MAX_ENV constant) for "key=value" string until Windows 2003: on more recent Windows, there is no technical limit: SetEnvironmentVariableW() doesn't fail for example.

The consequence for this is that in Vista and later GetEnvironmentStringsW now acquires the PEB lock to snapshot a copy of the current environment block. Otherwise the result would be worthless and dangerous, since the environment block can be reallocated to grow without limit.

_wputenv() accepts "key=" and "te=st", but rejects "=c:" and "=test": "=" is not rejected if it's at the start of the name

"=" is rejected by _wputenv if it's at the start of the name. It's allowed with "te=st" because Windows allows "=" in the value, e.g. the latter translates to SetEnvironmentVariableW(L"te", L"=st").

set shell command doesn't display variable names with a name starting with "=", but names with "=" in other places. My bet is that set uses the CRT API.

CMD actually implements this all on its own. And it has a parsing bug that leads to displaying the 'hidden' variables if you run set "" and set ".

Python and variable names containing "=": Python is wrong. It parses the variable "victor=" with value "secret" as name="victor" and value="=secret" :-( convertenviron() of posixmodule.c searchs the first equal character, not the last.

Python is right.

@vstinner
Copy link
Member Author

CMD actually implements this all on its own. And it has a parsing bug that leads to displaying the 'hidden' variables if you run set "" and set ".

Oh right, nice :-)

vstinner@WIN C:\vstinner\python\master>set ""
=C:=C:\vstinner\python\master
=ExitCode=00000001
(...)

@eryksun
Copy link
Contributor

eryksun commented Jan 22, 2020

_wputenv("victor=", "secret"), _wgetenv("victor") returns =secret and _wgetenv("victor=") returns secret!

Do you mean _wputenv(L"victor==secret")? The call only has one argument. In that case, yes, _wgetenv has a bug here. It will naively match "=" as part of the name (e.g. "victor=") instead of splitting the (name, value) on the first "=" and only matching the name, or better yet, it could simply fail the call if the given name contains "=".

@serhiy-storchaka
Copy link
Member

_wputenv() takes a single argument.

int _wputenv(
   const wchar_t *envstring
);

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/putenv-wputenv?view=vs-2019#syntax

@vstinner
Copy link
Member Author

GetEnvironmentStringsA() and GetEnvironmentStringsW() result also contain variables with name starting with "=", like =C or =ExitCode variables. But I misunderstood the doc... which looks wrong as well... A variable name and value are separated by "=", not by a null character!

It seems like internally, Windows environment is a list of "name=value\0" strings. Variable name and value seem to be stored together as a single string.

Moreover, after _wputenv("victor=secret"):

  • _wgetenv("victor") and GetEnvironmentVariable("victor") return "=secret"
  • _wgetenv("victor=") and GetEnvironmentVariable("victor=") returns "secret"

... in short, it doesn't seem possible to distinguish if "=" character belongs to the name or the value, except for the special case of name starting with "=".

SetEnvironmentVariableW() allows a name starting with "=", but "=" is not allowed elsewhere in the same. SetEnvironmentVariableW("=a=b", "c") fails for example (WinError 87).

The problem comes from _wputenv() API which takes a single argument.

@vstinner
Copy link
Member Author

I failed to find CRT source code in my VS 2017 Community install. VC\crt\src directory is missing.

But I found ReactOS implementation which gives me an idea on how it can be implemented and are the constraints: https://doxygen.reactos.org/df/def/putenv_8c.html#adea7d1267d47562084ac102de11a11fb

  • CRT provides char **environ and char **_wenviron which are separated from the Windows envrionement. In short, technically there 3 environment! The CRT API tries to maintain the 3 consistent.
  • CRT copies the string: Python doesn't have to manage the variable memory!

@eryksun
Copy link
Contributor

eryksun commented Jan 22, 2020

It seems like internally, Windows environment is a list of "name=value\0" strings. Variable name and value seem to be stored together as a single string.

A valid environment block is also terminated by a final null character.

Moreover, after _wputenv("victor=secret"):

* _wgetenv("victor") and GetEnvironmentVariable("victor") return "=secret"

These are correct.

* _wgetenv("victor=") and GetEnvironmentVariable("victor=") returns "secret"

The _wgetenv result is a bug that I already confirmed that in a previous comment.

The GetEnvironmentVariable result is likely a similar error due to naive name matching. I think it was introduced circa Vista, and the old code from 2003 and earlier did it right. I'll have to check the ReactOS implementation of RtlQueryEnvironmentVariable_U (replaced in Vista by RtlQueryEnvironmentVariable) to see how ReactOS reverse-engineered behavior that's supposed to be compatible with Server 2003. If their code properly verifies the name match (see below), then it's a bug from Vista. They rewrote much of the environment code in Vista. It wouldn't surprise me if they made naive assumptions.

The problem comes from _wputenv() API which takes a single argument.

There is no problem with _wputenv and SetEnvironmentVariableW allowing values that contain "=". When parsing an environment block, if a name-value pair starts with "=", then "=" must be the first character of the name, because all variables must have a name. The first "=" other than the first character must be the value delimiter because all variables must have a value. Any "=" character after that must be part of the value.

@eryksun
Copy link
Contributor

eryksun commented Jan 22, 2020

I failed to find CRT source code in my VS 2017 Community install. VC\crt\src directory is missing.

Look in "%ProgramFiles(x86)%\Windows Kits\10\Source\10.0.[version]\ucrt\env".

@eryksun
Copy link
Contributor

eryksun commented Jan 22, 2020

I'll have to check the ReactOS implementation of RtlQueryEnvironmentVariable_U (replaced in Vista by RtlQueryEnvironmentVariable) to see how ReactOS reverse-engineered behavior that's supposed to be compatible with Server 2003.

The ReactOS implementation of RtlQueryEnvironmentVariable_U looks correct to me, so I think the bug is in Vista's new RtlQueryEnvironmentVariable function.

In the loop body, they first increment wcs to skip the first character of the name, and then search for "=" via wcschr(wcs, L'='). This is the delimiter, and everything after it up to the null is val. They slice out the name, not including the "=", into var and compare this to the target Name via RtlEqualUnicodeString. If it matches, they copy val to the target Value string, if it's big enough. This could be optimized to skip comparing names that are the wrong size, but it otherwise seems okay.

@vstinner
Copy link
Member Author

Handling environment variables on Windows is more complex than what I expected. I start by reverting my previous change, to move back to the previous status quo: PR #18124.

I would prefer to restart from the previous state, and try to handle all issues at once.

@vstinner vstinner changed the title bpo-39413: os.unsetenv() uses _wputenv() on Windows [WIP] bpo-39413: os.unsetenv() uses _wputenv() on Windows Jan 22, 2020
@vstinner
Copy link
Member Author

This PR has some issues. I wrote PR #18163 which is simpler and so should be easier to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants