Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amend RFC 517: Add material on std::env #578

Merged
merged 1 commit into from
Jan 30, 2015
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 67 additions & 2 deletions text/0517-io-os-reform.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ follow-up PRs against this RFC.
* [Modules]
* [core::io] (stub)
* [The std::io facade] (stub)
* [std::env] (stub)
* [std::env]
* [std::fs] (stub)
* [std::net] (stub)
* [std::process] (stub)
Expand Down Expand Up @@ -483,7 +483,72 @@ throughout IO, we can go on to explore the modules in detail.
### `std::env`
[std::env]: #stdenv

> To be added in a follow-up PR.
Most of what's available in `std::os` today will move to `std::env`,
and the signatures will be updated to follow this RFC's
[Design principles] as follows.

**Arguments**:

* `args`: change to yield an iterator rather than vector if possible; in any case, it should produce an `OsStrBuf`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to yield an iterator rather than vector if possible

For what purpose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterators are one of the idiomatic staples of the standard library, and providing them wherever possible tends to increase composability with other APIs. It also has the added bonus of avoiding a Vec where one may not be necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean avoiding an allocation. That would make sense if the iterator didn't return allocated memory. How often does it happen that someone wants more than the first but not all arguments? Is it not easier to turn a vector into an iterator than an iterator into a vector?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to do that, can we also make it an iterator over Cow<OsStr, OsStrBuf>, so it's not necessary to allocate & copy on most platforms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mahkoh

I assume you mean avoiding an allocation. That would make sense if the iterator didn't return allocated memory. How often does it happen that someone wants more than the first but not all arguments? Is it not easier to turn a vector into an iterator than an iterator into a vector?

You save an additional allocation by not producing a vector. Yielding an iterator makes it easier to collect the data into an arbitrary container, rather than forcing an allocation of a vector which is then just thrown away to produce an iterator.

@XMPPwocky

If we're going to do that, can we also make it an iterator over Cow<OsStr, OsStrBuf>, so it's not necessary to allocate & copy on most platforms?

Seems worth looking at, though I believe on Windows at least we must always copy (due to encoding differences).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above the implementation does not actually have a static lifetime (to returned a borrowed Cow) on all platforms.

You keep bringing up the current implementation. This discussion is about avoiding allocations. The current implementation does allocate on all platforms. So it is clear that the point of the discussion is to improve the current implementation. Are you trying to say that the current implementation can't be improved this way?

(to returned a borrowed Cow)

If it that were the goal then there would be no need for Cow. @aturon suggested Cow so that it can be borrowed on the platforms that support it and owned on other platforms.

Additionally, on platforms like OSX there is no safeguard against other libraries in the process perhaps modifying the arguments.

Then maybe it has to be owned on OSX. The functions used to retrieve argv/c seem to be completely undocumented. Maybe modifying the memory behind it is even UB. If this is what you're worried about then the current implementation is already broken because another thread could modify argc/v while they are being copied.

but it's somewhat questionable whether the work will be worth it in the long run.

It seems to be trivial.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, on platforms like OSX there is no safeguard against other libraries in the process perhaps modifying the arguments.

In fact, the current implementation is most definitely broken. It retrieves the args anew every times args_as_bytes is called. The args_as_bytes docs say:

/// Returns the arguments which this program was started with (normally passed
/// via the command line) as byte vectors.

If another library modifies the arguments then this is no longer true and the arguments might even change between invocations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the negative consequences of using Cow?

Hm, I'm glad you asked me this, because I think I've been avoiding it based on somewhat stale ergonomic considerations. In previous incarnations (the old MaybeOwned) it was not as pleasant to work with, but with my new deref-based design and deref coercions, I think in most cases you won't notice the difference between this and a slice.

The only other hesitation was the sense that Cow is a bit more complicated than returning an owned type directly, for what is here I think a marginal performance benefit. But I don't really have a strong opinion.

So I withdraw what I said above. Assuming that we can safely carry out an implementation (which I'm reasonably confident we can), I'm happy to use Cow here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to say that the current implementation can't be improved this way?

Ah yes, I'll try to be more specific. Today we do not store the value of argv passed to main, we make a local copy and stick it in a static. This is then deallocated when std::rt::cleanup is called. This means that the current implementation most definitely does not have a static lifetime on unix (minus ios/osx).

I believe the only lifetime which can be returned for a CowOsString is 'static as we otherwise do not have a lifetime to tie to. This means that the current implementation cannot ever yield a borrowed instance (none have the static lifetime). An alternative implementation would be to perhaps store the raw value of argv in a global location to be referenced later.

The primary reason I am worried about doing this is that I do not know the answer to: "Does this pointer actually have a static lifetime?" Arbitrary code can be running while main is running so if the arguments are stored anywhere locally and deallocated before exit this could lead to memory unsafety (via concurrent code in other threads).

Some other minor concerns (but not showstoppers) are:

  • Using a raw argv pointers means that it's basically impossible to support the args::init API. This is not necessarily a large worry, however.
  • I have been worried about passing pointers into this memory as we don't necessarily have any guarantees that no one else is modifying it. You bring up a good point, however, that this also applicable to the implementation which copies out of the pointers. The upside of not passing the raw pointer out is that there is a deterministic scope in which you know that the arguments are accessed from the system, but I do not know if this would actually be relevant at all in practice. I doubt that anyone modifies arguments anyway.

Does that help clear up my concerns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arbitrary code can be running while main is running so if the arguments are stored anywhere locally and deallocated before exit this could lead to memory unsafety (via concurrent code in other threads).

At least on linux, argv points to memory that has been allocated by the OS and not the libc. Therefore it won't be deallocated by the libc either and all threads exit before this happens. If you are worried that someone might call rt::init with not-static arguments then this is simply an issue of documenting that this causes UB. rt::init is already unsafe.

I doubt that anyone modifies arguments anyway.

In C at least it's not unlikely that this happens because allocating and deallocating memory is a PITA. It's unlikely that a library does that, though.


**Environment variables**:

* `vars` (renamed from `env`): yields a vector of `(OsStrBuf, OsStrBuf)` pairs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For symmetry with args, perhaps this should return an iterator? I think that we're required to always make a snapshot, but API-wise it may make more sense to keep in line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little unfortunate to convert from a Vec to an Iterator when we started with a Vec: if someone actually needs a Vec, there'll be two allocations, the original one and the collectd one, while it's easy to go from a Vec to an Iterator. (This applies to the args case somewhat too, since many platforms start with a Vec forcing two allocations there.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does depend on what platform you're running on, however. On Windows there's no extra allocation necessary as the OS hands you an already-allocated block of strings. On Unix we're basically doing the same thing as Windows, except we're the one implementing it (copying the environ into a locally owned allocation). So for something like env::vars().collect::<Vec<_>>() you would save an allocation on Unix if you instead returned a Vec but you would force an unnecessary extra one on Windows if you didn't need to collect to a vector.

The analogy is pretty much the same for args as well I believe. Windows will hand us an already-allocated block that we locally own but we're forced to create one on Unix ourselves. All that being said though, I do also think that consistency and API design is probably more important here than runtime performance as these probably aren't the source of slowdown for too many programs :)

* `var` (renamed from `getenv`): take a value bounded by `IntoOsStrBuf`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntoOsStrBuf or IntoOsStr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that these APIs will need to go into an OsStrBuf regardless as they need to be passed to a system API which means they have to be nul-terminated (and a slice won't have the terminator).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These actually all ended up being AsStrBuf anyway, so I think the RFC just needs to be updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make more sense for these APIs to return a Result<String, EnvError> where EnvError is:

enum EnvError {
  Missing,
  NotUnicode(OsString)
}

That way you can quickly and ergonomically get at a String, which is usually what you want, but still get at the OsString if you really need it (usually to present a good error message).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, to be more precise, I think it makes sense to have two variants of these APIs, one of which provides an Option<OsString> and one of which provides a Result.

I would propose var for the version that returns a Result<String, E> and raw_var for the one that returns an Option<OsString>

allowing Rust strings and slices to be ergonomically passed in. Yields an `Option<OsStrBuf>`.
* `set_var` (renamed from `setenv`): takes two `IntoOsStrBuf`-bounded values.
* `remove_var` (renamed from `unsetenv`): takes a `IntoOsStrBuf`-bounded value.

* `join_paths`: take an `IntoIterator<T>` where `T: IntoOsStrBuf`, yield a `Result<OsStrBuf, JoinPathsError>`.
* `split_paths` take a `IntoOsStrBuf`, yield an `Iterator<Path>`.

**Working directory**:

* `current_dir` (renamed from `getcwd`): yields a `PathBuf`.
* `set_current_dir` (renamed from `change_dir`): takes an `AsPath` value.

**Important locations**:

* `home_dir` (renamed from `homedir`): returns home directory as a `PathBuf`
* `temp_dir` (renamed from `tmpdir`): returns a temporary directly as a `PathBuf`
* `current_exe` (renamed from `self_exe_name`): returns the full path
to the current binary as a `PathBuf`.

**Exit status**:

* `get_exit_status` and `set_exit_status` stay as they are, but with
updated docs that reflect that these only affect the return value of
`std::rt::start`.

**Architecture information**:

* `num_cpus`, `page_size`: stay as they are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of the discussions on the previous RFC, we may want to leave both of these unstable for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I, and a few others, would prefer if num_cpus was renamed to cpu_count.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for keeping these unstable. It will allow to both polish their behaviour, give more time to decide whether we want these in std at all and, of course, change the name to a more representative one.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tshepang Both of them sound alike to me... After all they fail to convey the important details: the API will return the number of logical CPU cores rather than that of CPUs (for example it will return 8 instead of 1 on Core i7 4790 system.) (I don't oppose to renaming)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nodakai thought it would be 8 logical CPUs, but 4 physical CPUs. I thought the latter was the case even though all of this is in one chip.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind there is also a difference between the number of CPU sockets and the number of CPU cores.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tshepang you cannot hot-plug cores, only chips. Otherwise the line between physical cores and chips is very thin. What’s the meaning of “CPU” is mostly irrelevant to the discussion, though, since we only have one function to rename and it returns the count of online logical cores.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@retep998 by CPU socket, do you mean chip? That is, the 4790 is:

  • 1 chip
  • 4 physical CPUs (aka cores)
  • 8 logical CPUs

I just want to make sure I understand properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tshepang Something like that, yes. People often use the term CPU interchangeably for both a CPU chip/socket and a CPU core, so cpu_count is a confusing name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@retep998 in software terms, CPUs normally refer to logical CPUs. I think that's the definition most software cares about... It's rare for software to care about actual cores. For those special cases, we'll have to add something like physical_cpu_count or core_count or...


**Constants**:

* Stabilize `ARCH`, `DLL_PREFIX`, `DLL_EXTENSION`, `DLL_SUFFIX`, `EXE_EXTENSION`, `EXE_SUFFIX`, `FAMILY` as they are.
* Rename `SYSNAME` to `OS`.
* Remove `TMPBUF_SZ`.

This brings the constants into line with our naming conventions elsewhere.

#### Items to move to `os::platform`

* `pipe` will move to `os::unix`. It is currently primarily used for
hooking to the IO of a child process, which will now be done behind
a trait object abstraction.

#### Removed items

* `errno`, `error_string` and `last_os_error` provide redundant,
platform-specific functionality and will be removed for now. They
may reappear later in `os::unix` and `os::windows` in a modified
form.
* `dll_filename`: deprecated in favor of working directly with the constants.
* `_NSGetArgc`, `_NSGetArgv`: these should never have been public.
* `self_exe_path`: deprecated in favor of `current_exe` plus path operations.
* `make_absolute`: deprecated in favor of explicitly joining with the working directory.
* all `_as_bytes` variants: deprecated in favor of yielding `OsStrBuf` values

### `std::fs`
[std::fs]: #stdfs
Expand Down