-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ACP: CStr::bytes
iterator
#135
Comments
This should probably be Letting you get back a CStr from the Iterator seems reasonable to me. You can imagine someone iterating until they find a path separator and then getting the remaining CStr. I kind of like the idea of using NonZeroU8 but also kind of hate it lol. Non zero types are just so dang hard to work with. Agree that this should be punted. Disagree about the NUL byte stance. I don't know how to resolve the performance implications, but the current API has two methods for returning slices: one with the NUL byte and one without. I don't see a compelling reason for the iterator API to differ. Overall seems like a good idea. |
For what it's worth, I made something a bit similar for wide strings. This were for use in parsing the command line so admittedly it was a single use implementation. I did use |
Yeah, I also don't 100% understand if there's even a difference here. The most important part is making sure the lifetime is there.
Yeah, IMHO we should be providing more APIs that have Not super in favour of mixed APIs (where whether we use it depends on when it was implemented) but I would definitely like to see nonzero versions. Could potentially offer implementations for
To me, including the zero byte in the slice makes a lot of sense because you'd otherwise have to allocate an entirely new array for it, whereas an iterator can accomplish this by just doing |
Or honestly it's probably not that bad to have to write
Did you mean excluding? Otherwise I'm confused by your sentence. And yeah, that's actually a very good point about chaining. That said, I think including the NUL byte could be a common case. For example, say I want to join a I guess there's nothing that would prevent us from adding an iterator with NUL in the future, so I think it makes sense for this ACP to exclude it for now. |
Seconded. I am on board with a However please stick to just the API surface present already on I will separately look at #134 (not necessarily right away) but I think this single iterator type would be all right to add independent of that. |
My main justification for adding the Tracking issue opened & PR updated. |
…iper Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
…iper Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
…iper Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
…iper Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
…iper Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
…iper Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
…iper Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
Rollup merge of rust-lang#104353 - clarfonthey:cstr-bytes-iter, r=cuviper Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
Add CStr::bytes iterator See rust-lang/libs-team#135 for an ACP. Since rust-lang/libs-team#134 was also accepted, this type is now `core::ffi::c_str::Bytes` instead of `core::ffi::CStrBytes`.
Proposal
Problem statement
Many methods of
CStr
mention that methods liketo_bytes
will eventually require a scan of the string, even though the string is currently represented internally as a slice and has efficient implementations now. However, no existing methods offer a way to access the underlying string in a future-compatible way.Motivation, use-cases
In my particular case, I was trying to create a method that was generic over strings, byte slices, and C strings, and used iterators to achieve this. I noticed that
CString
didn't have any of its own iterator types and wanted to rectify this.Solution sketches
The proposal is to add the following API:
Future extensions
In the future, an
IntoIterator
impl could be added to&CStr
that returns this iterator, although that's left out for now.Iterator trait impls
The proposal explicitly recommends not implementing
DoubleEndedIterator
orExactSizeIterator
since the assumption is that they will not be efficient for the long-term representation of C strings. Additionally, it recommends addingAsRef
and anas_c_str
method to mirror the API offered byslice::Iter
.Item type
The iterator returns
u8
to mirror the items returned bystr::bytes
. Additionally, yielding&u8
might encourage callers to cast this to a pointer rather than using theas_c_str
method to give a reference to the string itself, which has methods dedicated to this purpose.As an alternative,
NonZeroU8
could be used instead, since we know we're not including the nul (see below). However, this would counteract the return value ofto_bytes
, which may be less than desired. There's always room for ato_nonzero_bytes
method, though.Nul byte
The iterator will not include the nul byte at the end of the string. From the perspective of including the bytes inside the string, that nul byte is not actually part of the string, merely past the end of it.
From an implementation perspective, including the nul byte could also be less efficient. Since the iterator will only store a single pointer, it cannot actually progress past the end of the string, since it would have no way of knowing without dereferencing that potentially invalid address. So, there would have to be either additional data inside the iterator indicating that it's "done," or the pointer could be nulled out when the iterator is done. In both cases, this would incur extra checks during iteration in addition to checking the byte itself to see if it's zero.
Additionally, if we were to null out the pointer after the iterator was finished, calling
as_c_str
might have to return a reference to an empty string with an address outside the bounds of the original string. This isn't a super big deal, but it's another way it would differ from theslice::Iter
implementation.Iterator type name
The additional ACP #134 is created in an attempt to resolve this issue.
If that ACP is not accepted, naming the type
Bytes
and exporting it in thecore::ffi
module would be unacceptable, sinceOsStr
could theoretically have abytes
iterator in the future. So, the type is namedCStrBytes
to enforce the fact that it's the bytes of aCStr
, and not anOsStr
.Links and related work
ffi
submodule ACP: ACP:std::ffi::c
andstd::ffi::os
submodules #134What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
The text was updated successfully, but these errors were encountered: