WIP: Throw UnicodeError for s[i:j], where j is not at the start of a code point #14217

Closed
wants to merge 2 commits into from


@JobJob JobJob commented Dec 1, 2015

Previously, if j was in the middle of a code point, it was moved to the end of that code point to be more forgiving. After discussion with @stevengj and @nalimilan in #14158, the recommendation was that throwing an error would be more sensible.
Also added some more tests.
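
To make the intended behaviour concrete, here is a minimal sketch written for current Julia rather than the `UTF8String` code this PR touches. `strict_slice` is a hypothetical helper, `is_valid_continuation` is redefined locally from the comment in the diff, and `ArgumentError` stands in for the `UnicodeError` named in the title; none of this is the PR's actual implementation.

```julia
# Assumed definition: a UTF-8 continuation byte has the bit pattern 10xxxxxx.
is_valid_continuation(c::UInt8) = (c & 0xc0) == 0x80

# Hypothetical helper: return the bytes of the characters from index i to index j,
# throwing if either index points into the middle of a code point.
function strict_slice(data::Vector{UInt8}, i::Int, j::Int)
    j < i && return UInt8[]
    (1 <= i && j <= length(data)) || throw(BoundsError(data, i:j))
    for k in (i, j)
        # ArgumentError is a stand-in for the UnicodeError discussed in the PR.
        is_valid_continuation(data[k]) &&
            throw(ArgumentError("invalid index $k: not the start of a code point"))
    end
    # j names the first byte of the last character, so include its continuation bytes.
    stop = j
    while stop < length(data) && is_valid_continuation(data[stop + 1])
        stop += 1
    end
    return data[i:stop]
end

bytes = Vector{UInt8}("aéb")   # 'é' occupies bytes 2 and 3
strict_slice(bytes, 1, 2)      # bytes of "aé"; j = 2 is the start of 'é'
# strict_slice(bytes, 1, 3)    # would throw: byte 3 is a continuation byte
```

Under the old, forgiving behaviour described above, `j = 3` would instead have been moved to the end of `'é'` and the call would have succeeded.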

@tkelman tkelman added the unicode Related to unicode characters and encodings label Dec 1, 2015
@@ -101,26 +101,33 @@ sizeof(s::UTF8String) = sizeof(s.data)

lastidx(s::UTF8String) = length(s.data)

isfirstbyte(c::UInt8) = (c & 0xc0) != 0x80 # == !is_valid_continuation(c)
Member

@code_llvm says that !is_valid_continuation(c) compiles to code identical to isfirstbyte(c), so I don't think we need a separate isfirstbyte function.

Author

I wasn't sure if it would be optimised to the same code, thanks for the clarification. For me it also aids readability, but I'm happy to remove.

Member

Better to remove.
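
The logical equivalence behind this thread can be checked exhaustively, since there are only 256 byte values. The sketch below assumes `is_valid_continuation` is defined as the comment in the diff implies; it shows that the two predicates agree on every byte, not that they compile to identical code, which is what `@code_llvm` was used for above.

```julia
# Definition taken from the diff in this PR.
isfirstbyte(c::UInt8) = (c & 0xc0) != 0x80

# Assumed definition, inferred from the "== !is_valid_continuation(c)" comment.
is_valid_continuation(c::UInt8) = (c & 0xc0) == 0x80

# Compare the two predicates on every possible byte value.
all_agree = all(isfirstbyte(UInt8(c)) == !is_valid_continuation(UInt8(c)) for c in 0:255)
```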

@JobJob JobJob changed the title WIP: Throw UnicodeError for s[i:j], where j is not at the start or end of a code point WIP: Throw UnicodeError for s[i:j], where j is not at the start of a code point Dec 3, 2015
adds UTF16 getindex
Fixes chop and date format parsing
more tests
improves Unicode invalid index error message
3 participants