ncodeunits(c::Char): fast equivalent of ncodeunits(string(c)) #29153
base/char.jl
Outdated
@@ -134,7 +141,7 @@ function decode_overlong(c::Char)
 end

 """
-    decode_overlong(c::AbstractChar)
+    decode_overlong(c::AbstractChar) -> UInt32
Is this return type for all AbstractChar or only for Char?
The result is a code point, which there's no reason to represent as anything but a UInt32.
base/char.jl
Outdated
ncodeunits(c::Char) = max(1, 4 - (trailing_zeros(reinterpret(UInt32, c)) >> 3))

"""
    codepoint(c::AbstractChar) -> UInt32
Perhaps UInt32 -> Integer if this is for AbstractChar, or add an extra line:
    codepoint(c::AbstractChar) -> Integer
    codepoint(c::Char) -> UInt32
Likewise, there's no reason not to represent a code point as a UInt32.
Ok, I was thinking about the docs for this function, which explicitly say:
    For `Char`, this is a `UInt32` value, but
    `AbstractChar` types that represent only a subset of Unicode may
    return a different-sized integer (e.g. `UInt8`).
The signature and the docs seem at odds now.
I guess someone could do that although it seems kind of silly to me. I didn't write these docs.
Who wrote the docs is not really that important, though. What matters is that the end result is consistent: the signature should not say that codepoint(c::AbstractChar) has to return a UInt32 while the documentation just below it says that it can return a UInt8.
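To make the mismatch concrete, here is a minimal sketch of the case the docstring describes. `ASCIIChar` is a hypothetical type made up for illustration (it is not part of Base or of this PR); it only shows that a subtype can legally return something narrower than `UInt32`.

```julia
# Hypothetical AbstractChar subtype that covers only a subset of Unicode.
struct ASCIIChar <: AbstractChar
    byte::UInt8
end

# codepoint returns the stored byte, so the result is a UInt8, not a UInt32.
Base.codepoint(c::ASCIIChar) = c.byte

a = ASCIIChar(0x41)
println(typeof(codepoint(a)))    # UInt8
println(typeof(codepoint('A')))  # UInt32
```

With a type like this, a signature line of `codepoint(c::AbstractChar) -> UInt32` would be wrong, while `-> Integer` would still hold.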
Interestingly, according to
@@ -91,7 +98,7 @@ end
 # not to support malformed or overlong encodings.

 """
-    ismalformed(c::AbstractChar)
+    ismalformed(c::AbstractChar) -> Bool
Are we still using this syntax, or are we transitioning to ismalformed(c::AbstractChar)::Bool in documentation?
As far as I'm aware, we're still using this syntax in most places.
This is pretty efficient:
    julia> @code_native write(devnull, 'x')
        bswapl %edi
        xorl %eax, %eax
        nopw %cs:(%rax,%rax)
    L16:
        shrl $8, %edi
        addq $1, %rax
        testl %edi, %edi
        jne L16
        retq
Versus:
    julia> @code_native ncodeunits('x')
        tzcntl %edi, %eax
        shrl $3, %eax
        movl $4, %ecx
        subq %rax, %rcx
        testq %rcx, %rcx
        movl $1, %eax
        cmovgq %rcx, %rax
        retq
Intel predicts the second is faster, by 10%:
But that's actually only for my native CPU. If we look back in time (ivybridge, broadwell, haswell, nehalem), we see that the predicted performance of the first has been relatively unchanged over time, while the performance of the second has been steadily improving. What I think is likely happening is that the first loop is actually much cheaper for the processor to execute (much lower latency), so it has always done fairly well in a benchmarking loop, whereas the second actually requires more transistors to reach the same level of performance (the above output is truncated; the full output includes some graphs to illustrate this point). I could be wrong, since I'm just reverse-engineering the output of a static-prediction tool, but that would be my analysis.
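For readers comparing the two listings, the following is a rough Julia-level sketch of what each one computes. The names `ncodeunits_loop` and `ncodeunits_tz` are hypothetical and exist only for this illustration; the loop is my reading of the `write` assembly (byte-swap, then shift right by 8 until zero), and the second function is the one-line formula from the diff.

```julia
# Loop version: byte-swap so the first UTF-8 byte ends up in the low bits,
# then count how many 8-bit shifts it takes to reach zero (at least one).
function ncodeunits_loop(c::Char)
    u = bswap(reinterpret(UInt32, c))
    n = 1
    while true
        u >>= 8
        u == 0 && return n
        n += 1
    end
end

# Branch-free version from the diff: every trailing zero byte in the
# Char's UInt32 representation is one unused code unit.
ncodeunits_tz(c::Char) = max(1, 4 - (trailing_zeros(reinterpret(UInt32, c)) >> 3))

# Both should agree with the string-based definition from the PR title.
for c in ('\0', 'x', '\u00e9', '\u20ac', '\U1F600')
    @assert ncodeunits_loop(c) == ncodeunits_tz(c) == ncodeunits(string(c))
end
```

The `max(1, ...)` matters for `'\0'`, whose representation is all zero bits: `trailing_zeros` returns 32 there, so the formula alone would give 0.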
cb4bc8e to b931684
There was a non-public `codelen(c::Char)` method which previously did this. This also replaces internal uses of this with `ncodeunits(c)`.
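As a usage sketch, assuming a Julia version that includes this method (the compat annotations mentioned later in this thread suggest 1.1 or newer), the public `ncodeunits(c)` can be checked against the string-based definition from the PR title. The sample string is arbitrary.

```julia
# Assumes ncodeunits(::Char) is available (added by this PR).
s = "h\u00e9llo \u20ac"   # mixes 1-, 2- and 3-byte characters

# Per character: same answer as converting to a String first.
for c in s
    @assert ncodeunits(c) == ncodeunits(string(c))
end

# Summed over the string, the per-character counts give the string's size.
@assert sum(ncodeunits, s) == ncodeunits(s)
```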
b931684 to d4d577e
I replaced the internal-only
Also, went with the
Addition of NEWS and compat admonitions for important changes between Julia 1.0 and 1.1, including:
- Custom .css-style for compat admonitions.
- Information about compat annotations to CONTRIBUTING.md.
- NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782
- Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278
- Documentation for broadcasting CartesianIndices (#30230).
- Documentation for Base.julia_cmd().
- Documentation for colon constructor of CartesianIndices (#29440).
- Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix).
- Run NEWS-update.jl.

Co-authored-by: Morten Piibeleht <[email protected]>
Co-authored-by: Fredrik Ekre <[email protected]>