-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Amend RFC 517: Add material on std::env #578
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,7 +49,7 @@ follow-up PRs against this RFC. | |
* [Modules] | ||
* [core::io] (stub) | ||
* [The std::io facade] (stub) | ||
* [std::env] (stub) | ||
* [std::env] | ||
* [std::fs] (stub) | ||
* [std::net] (stub) | ||
* [std::process] (stub) | ||
|
@@ -483,7 +483,72 @@ throughout IO, we can go on to explore the modules in detail. | |
### `std::env` | ||
[std::env]: #stdenv | ||
|
||
> To be added in a follow-up PR. | ||
Most of what's available in `std::os` today will move to `std::env`, | ||
and the signatures will be updated to follow this RFC's | ||
[Design principles] as follows. | ||
|
||
**Arguments**: | ||
|
||
* `args`: change to yield an iterator rather than vector if possible; in any case, it should produce an `OsStrBuf`. | ||
|
||
**Environment variables**: | ||
|
||
* `vars` (renamed from `env`): yields a vector of `(OsStrBuf, OsStrBuf)` pairs. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For symmetry with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems a little unfortunate to convert from a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It does depend on what platform you're running on, however. On Windows there's no extra allocation necessary as the OS hands you an already-allocated block of strings. On Unix we're basically doing the same thing as Windows, except we're the one implementing it (copying the The analogy is pretty much the same for |
||
* `var` (renamed from `getenv`): take a value bounded by `IntoOsStrBuf`, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe that these APIs will need to go into an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These actually all ended up being There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would make more sense for these APIs to return a enum EnvError {
Missing,
NotUnicode(OsString)
} That way you can quickly and ergonomically get at a String, which is usually what you want, but still get at the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, to be more precise, I think it makes sense to have two variants of these APIs, one of which provides an I would propose |
||
allowing Rust strings and slices to be ergonomically passed in. Yields an `Option<OsStrBuf>`. | ||
* `set_var` (renamed from `setenv`): takes two `IntoOsStrBuf`-bounded values. | ||
* `remove_var` (renamed from `unsetenv`): takes a `IntoOsStrBuf`-bounded value. | ||
|
||
* `join_paths`: take an `IntoIterator<T>` where `T: IntoOsStrBuf`, yield a `Result<OsStrBuf, JoinPathsError>`. | ||
* `split_paths` take a `IntoOsStrBuf`, yield an `Iterator<Path>`. | ||
|
||
**Working directory**: | ||
|
||
* `current_dir` (renamed from `getcwd`): yields a `PathBuf`. | ||
* `set_current_dir` (renamed from `change_dir`): takes an `AsPath` value. | ||
|
||
**Important locations**: | ||
|
||
* `home_dir` (renamed from `homedir`): returns home directory as a `PathBuf` | ||
* `temp_dir` (renamed from `tmpdir`): returns a temporary directly as a `PathBuf` | ||
* `current_exe` (renamed from `self_exe_name`): returns the full path | ||
to the current binary as a `PathBuf`. | ||
|
||
**Exit status**: | ||
|
||
* `get_exit_status` and `set_exit_status` stay as they are, but with | ||
updated docs that reflect that these only affect the return value of | ||
`std::rt::start`. | ||
|
||
**Architecture information**: | ||
|
||
* `num_cpus`, `page_size`: stay as they are | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In light of the discussions on the previous RFC, we may want to leave both of these unstable for now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I, and a few others, would prefer if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 for keeping these unstable. It will allow to both polish their behaviour, give more time to decide whether we want these in std at all and, of course, change the name to a more representative one. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tshepang Both of them sound alike to me... After all they fail to convey the important details: the API will return the number of logical CPU cores rather than that of CPUs (for example it will return 8 instead of 1 on Core i7 4790 system.) (I don't oppose to renaming) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @nodakai thought it would be 8 logical CPUs, but 4 physical CPUs. I thought the latter was the case even though all of this is in one chip. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Keep in mind there is also a difference between the number of CPU sockets and the number of CPU cores. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tshepang you cannot hot-plug cores, only chips. Otherwise the line between physical cores and chips is very thin. What’s the meaning of “CPU” is mostly irrelevant to the discussion, though, since we only have one function to rename and it returns the count of online logical cores. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @retep998 by CPU socket, do you mean chip? That is, the 4790 is:
I just want to make sure I understand properly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tshepang Something like that, yes. People often use the term CPU interchangeably for both a CPU chip/socket and a CPU core, so There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @retep998 in software terms, CPUs normally refer to logical CPUs. I think that's the definition most software cares about... It's rare for software to care about actual cores. For those special cases, we'll have to add something like |
||
|
||
**Constants**: | ||
|
||
* Stabilize `ARCH`, `DLL_PREFIX`, `DLL_EXTENSION`, `DLL_SUFFIX`, `EXE_EXTENSION`, `EXE_SUFFIX`, `FAMILY` as they are. | ||
* Rename `SYSNAME` to `OS`. | ||
* Remove `TMPBUF_SZ`. | ||
|
||
This brings the constants into line with our naming conventions elsewhere. | ||
|
||
#### Items to move to `os::platform` | ||
|
||
* `pipe` will move to `os::unix`. It is currently primarily used for | ||
hooking to the IO of a child process, which will now be done behind | ||
a trait object abstraction. | ||
|
||
#### Removed items | ||
|
||
* `errno`, `error_string` and `last_os_error` provide redundant, | ||
platform-specific functionality and will be removed for now. They | ||
may reappear later in `os::unix` and `os::windows` in a modified | ||
form. | ||
* `dll_filename`: deprecated in favor of working directly with the constants. | ||
* `_NSGetArgc`, `_NSGetArgv`: these should never have been public. | ||
* `self_exe_path`: deprecated in favor of `current_exe` plus path operations. | ||
* `make_absolute`: deprecated in favor of explicitly joining with the working directory. | ||
* all `_as_bytes` variants: deprecated in favor of yielding `OsStrBuf` values | ||
|
||
### `std::fs` | ||
[std::fs]: #stdfs | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterators are one of the idiomatic staples of the standard library, and providing them wherever possible tends to increase composability with other APIs. It also has the added bonus of avoiding a
Vec
where one may not be necessary.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean avoiding an allocation. That would make sense if the iterator didn't return allocated memory. How often does it happen that someone wants more than the first but not all arguments? Is it not easier to turn a vector into an iterator than an iterator into a vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to do that, can we also make it an iterator over Cow<OsStr, OsStrBuf>, so it's not necessary to allocate & copy on most platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mahkoh
You save an additional allocation by not producing a vector. Yielding an iterator makes it easier to collect the data into an arbitrary container, rather than forcing an allocation of a vector which is then just thrown away to produce an iterator.
@XMPPwocky
Seems worth looking at, though I believe on Windows at least we must always copy (due to encoding differences).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You keep bringing up the current implementation. This discussion is about avoiding allocations. The current implementation does allocate on all platforms. So it is clear that the point of the discussion is to improve the current implementation. Are you trying to say that the current implementation can't be improved this way?
If it that were the goal then there would be no need for Cow. @aturon suggested Cow so that it can be borrowed on the platforms that support it and owned on other platforms.
Then maybe it has to be owned on OSX. The functions used to retrieve argv/c seem to be completely undocumented. Maybe modifying the memory behind it is even UB. If this is what you're worried about then the current implementation is already broken because another thread could modify argc/v while they are being copied.
It seems to be trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the current implementation is most definitely broken. It retrieves the args anew every times args_as_bytes is called. The args_as_bytes docs say:
If another library modifies the arguments then this is no longer true and the arguments might even change between invocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm glad you asked me this, because I think I've been avoiding it based on somewhat stale ergonomic considerations. In previous incarnations (the old
MaybeOwned
) it was not as pleasant to work with, but with my new deref-based design and deref coercions, I think in most cases you won't notice the difference between this and a slice.The only other hesitation was the sense that
Cow
is a bit more complicated than returning an owned type directly, for what is here I think a marginal performance benefit. But I don't really have a strong opinion.So I withdraw what I said above. Assuming that we can safely carry out an implementation (which I'm reasonably confident we can), I'm happy to use
Cow
here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I'll try to be more specific. Today we do not store the value of
argv
passed tomain
, we make a local copy and stick it in astatic
. This is then deallocated whenstd::rt::cleanup
is called. This means that the current implementation most definitely does not have a static lifetime on unix (minus ios/osx).I believe the only lifetime which can be returned for a
CowOsString
is'static
as we otherwise do not have a lifetime to tie to. This means that the current implementation cannot ever yield a borrowed instance (none have the static lifetime). An alternative implementation would be to perhaps store the raw value ofargv
in a global location to be referenced later.The primary reason I am worried about doing this is that I do not know the answer to: "Does this pointer actually have a static lifetime?" Arbitrary code can be running while
main
is running so if the arguments are stored anywhere locally and deallocated before exit this could lead to memory unsafety (via concurrent code in other threads).Some other minor concerns (but not showstoppers) are:
argv
pointers means that it's basically impossible to support theargs::init
API. This is not necessarily a large worry, however.Does that help clear up my concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least on linux, argv points to memory that has been allocated by the OS and not the libc. Therefore it won't be deallocated by the libc either and all threads exit before this happens. If you are worried that someone might call
rt::init
with not-static arguments then this is simply an issue of documenting that this causes UB.rt::init
is already unsafe.In C at least it's not unlikely that this happens because allocating and deallocating memory is a PITA. It's unlikely that a library does that, though.