Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Second stabilisation pass of *::char #20395

Merged
merged 7 commits into from
Jan 5, 2015
Merged

Conversation

huonw
Copy link
Member

@huonw huonw commented Jan 1, 2015

cc #19260

The casing transformations are left unstable (it is highly likely to be better to adopt the proper non-1-to-1 case mappings, per #20333) as are is_xid_*.

I've got a little todo list in the last commit of things I thought about/was told about that I haven't yet handled (I'd also like some feedback).

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@huonw
Copy link
Member Author

huonw commented Jan 1, 2015

r? @aturon, @alexcrichton, @SimonSapin

@rust-highfive rust-highfive assigned aturon and unassigned nikomatsakis Jan 1, 2015
@SimonSapin
Copy link
Contributor

I’ve added some minor notes, but this looks good to me overall.

@aturon
Copy link
Member

aturon commented Jan 1, 2015

@huonw Finished with initial review, left a few comments.

@huonw
Copy link
Member Author

huonw commented Jan 2, 2015

Quoting here to preserve discussion.


fn is_digit(self, radix: u8) -> bool {
    match self.to_digit(radix) {

Use is_some(), while you're at it?

Good catch.


@SimonSapin:
#[deprecated = "use the char::from_digit free function"]
fn from_digit(num: uint, radix: uint) -> Option;

Why prefer free functions to methods?

@aturon:
std::char::CharExt::from_digit vs std::char::from_digit


 fn len_utf8(self) -> uint;

I think returning uint (or usize when we have it) here is sensible even though way overlarge in most cases, because the length will almost certainly be used as part of a computation in uint space.

I'm not sure what the context of this is? (i.e. as stabilised, len_utf8 is returning uint)


#[stable]
 fn encode_utf8(self, dst: &mut [u8]) -> Option<uint> {

Can we hold off on stabilizing these just yet? It's still a little unclear whether this is the API we ultimately want (versus iterators or readers).

Ok.


-pub fn from_digit(num: uint, radix: uint) -> Option<char> {
+pub fn from_digit(num: u8, radix: u8) -> Option<char> {

So, as the int/uint debate has unfolded, there's been a bit more clarity about tradeoffs on choices like this.

Here you're not gaining much by limiting to a small static size, since you check dynamically and this is unlikely to be perf sensitive. OTOH this introduce an ergonomics problem, because if your computation produces a larger int type you'll have to manually cast.

We expect to introduce implicit widening conversions, which suggest that the most ergonomic solution would actually be u64 (since everything else can widen to it). In practice I think that would not produce any actual overhead. But it would look a bit odd for sure!

Anyway, I'm not really sure what the right answer is here, just wanted to point out some of the drawbacks.

Finally, given that we're likely to eliminate uint this will definitely have to change to something else :-) But we could leave that for the giant int cleanup pass that's going to have to happen during alpha.

@wycats @alexcrichton I'm interested in your thoughts on this relatively simple example re: starting to develop conventions.

I don't have a particular opinion, but it does sounds like u8 may not be appropriate, even though it is the smallest type that fits the bounds.

@@ -8,6 +8,9 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

// XXX todolist: len_utf8, len_utf16 (names), encode_utf8,
// encode_utf16 (names, API) are still strange.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note to take another look at this. I've been told that if you s/XXX/TODO/ it'll fail make tidy to make sure you come back to look at it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, whoops. I explicitly chose XXX because I thought make tidy failed on it, but I clearly misremembered.

@alexcrichton
Copy link
Member

I've looked this over and it all looks good to me, thanks @huonw!


@wycats @alexcrichton I'm interested in your thoughts on this relatively simple example re: starting to develop conventions.

I would probably most be in favor of having these be #[unstable] for now pending conventions for what integers are used where. In this specific case I would personally agree with @aturon that the u8 distinction may just cause unnecessary casting. While u8 is the smallest type that's big enough, there are still many invalid u8 values, so we may not be buying much by using a semi-rare type (at least today it's semi-rare)

@aturon
Copy link
Member

aturon commented Jan 3, 2015

@huonw Needs a rebase -- and it'd be good to go ahead and get this wrapped up and landed soon. I agree with @alexcrichton that places with conventions questions around integers can be left #[unstable] for the time being.

@huonw
Copy link
Member Author

huonw commented Jan 3, 2015

Re len_utf8 vs. utf8_len etc., I think

Note that I may be wrong to recommend foo_len vs len_foo due to our usage of iter and iter_mut (prefix of iter to group them together in the docs).

is a good point and reason enough to leave them as len_...; thoughts?

@huonw huonw force-pushed the char-stab-2 branch 2 times, most recently from e7c4404 to 8be3869 Compare January 3, 2015 14:22
@huonw
Copy link
Member Author

huonw commented Jan 3, 2015

Updated.

In particular, I've reverted the uint -> u8 change, and un#[stable]ised those functions, as well as the encode_* functions.

(I believe I was told that stability attributes are useless either on the trait methods or the impl methods, but I can't remember which...)

@aturon
Copy link
Member

aturon commented Jan 4, 2015

@huonw Looks good to me. I don't think we have a terribly clear convention on whether len should appear at the beginning or end. Squash and r=me.

Imports may need to be updated so this is a

[breaking-change]
This "reexports" all the functionality of `core::char::CharExt` as
methods on `unicode::u_char::UnicodeChar` (renamed to `CharExt`).

Imports may need to be updated (one now just imports
`unicode::CharExt`, or `std::char::CharExt` rather than two traits from
either), so this is a

[breaking-change]
bors added a commit that referenced this pull request Jan 5, 2015
cc #19260 

The casing transformations are left unstable (it is highly likely to be better to adopt the proper non-1-to-1 case mappings, per #20333) as are `is_xid_*`.

I've got a little todo list in the last commit of things I thought about/was told about that I haven't yet handled (I'd also like some feedback).
@bors bors merged commit 990a79f into rust-lang:master Jan 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants