ncodeunits(c::Char): fast equivalent of ncodeunits(string(c)) #29153

StefanKarpinski · 2018-09-12T18:33:47Z

No description provided.

KristofferC · 2018-09-12T18:37:04Z

base/char.jl

@@ -134,7 +141,7 @@ function decode_overlong(c::Char)
 end

 """
-    decode_overlong(c::AbstractChar)
+    decode_overlong(c::AbstractChar) -> UInt32


Is this return type for all AbstractChar or only for Char?

The result is a code point which there's no reason to represent as anything but a UInt32.

KristofferC · 2018-09-12T18:37:51Z

base/char.jl

+ncodeunits(c::Char) = max(1, 4 - (trailing_zeros(reinterpret(UInt32, c)) >> 3))
+
+"""
+    codepoint(c::AbstractChar) -> UInt32


Perhaps change UInt32 to Integer if this is for AbstractChar, or add an extra line:

    codepoint(c::AbstractChar) -> Integer
    codepoint(c::Char) -> UInt32

Likewise, there's no reason not to represent a code point as a UInt32.

Ok, I was thinking about the docs for this function that explicitly say

For `Char`, this is a `UInt32` value, but `AbstractChar` types that represent only a subset of Unicode may return a different-sized integer (e.g. `UInt8`).

The signature and the docs seem at odds now.

I guess someone could do that although it seems kind of silly to me. I didn't write these docs.

Who wrote the docs is not really that important, though. What matters is that the end result is consistent: the signature should not say that codepoint(c::AbstractChar) has to return a UInt32 while the documentation just below it says that it can return a UInt8.
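To make the mismatch concrete, here is a minimal sketch of an `AbstractChar` subtype whose `codepoint` returns a `UInt8`, as the quoted documentation allows. The type `ASCIIChar` and its field name are hypothetical and exist only for illustration; nothing like this is defined in Base.

```julia
# Hypothetical type for illustration only; not part of Base.
struct ASCIIChar <: AbstractChar
    byte::UInt8
end

# Constructor from a UInt32 code point, rejecting anything outside ASCII.
function ASCIIChar(u::UInt32)
    u < 0x80 || throw(ArgumentError("not an ASCII code point: $u"))
    return ASCIIChar(u % UInt8)
end

# This subtype covers only a subset of Unicode, so a UInt8 is enough.
Base.codepoint(c::ASCIIChar) = c.byte

c = ASCIIChar(UInt32('x'))
@assert codepoint(c) isa UInt8        # narrower than the documented `-> UInt32`
@assert codepoint('x') isa UInt32     # `Char` itself returns a UInt32
@assert codepoint(c) == codepoint('x')
```

With a type like this, a signature of `codepoint(c::AbstractChar) -> UInt32` would contradict the behavior the prose explicitly permits.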

stevengj · 2018-09-13T15:10:20Z

~~You could define a fallback method ncodeunits(c::AbstractChar) = write(devnull, c)?~~ Maybe not, since write will return a count of bytes, which might not be code-units if c represents some other encoding?

stevengj · 2018-09-13T15:14:59Z

Interestingly, according to @btime, the ncodeunits implementation here takes exactly the same amount of time as write(devnull, c).

stevengj · 2018-09-13T15:23:07Z

base/char.jl

@@ -91,7 +98,7 @@ end
 #           not to support malformed or overlong encodings.

 """
-    ismalformed(c::AbstractChar)
+    ismalformed(c::AbstractChar) -> Bool


Are we still using this syntax or are we transitioning to ismalformed(c::AbstractChar)::Bool in documentation?

As far as I'm aware, we're still using this syntax in most places.

StefanKarpinski · 2018-09-13T15:25:41Z

This is pretty efficient:

julia> @code_native write(devnull, 'x')
    bswapl	%edi
    xorl	%eax, %eax
    nopw	%cs:(%rax,%rax)
L16:
    shrl	$8, %edi
    addq	$1, %rax
    testl	%edi, %edi
    jne	L16
    retq

Versus:

julia> @code_native ncodeunits('x')
	tzcntl	%edi, %eax
	shrl	$3, %eax
	movl	$4, %ecx
	subq	%rax, %rcx
	testq	%rcx, %rcx
	movl	$1, %eax
	cmovgq	%rcx, %rax
	retq

The write version might be slower if the loop branch is unpredictable, i.e. when the characters have different sizes. The ncodeunits version could be faster if I could get rid of the max in it. There may be some clever way to do that, but I'm not sure that this really warrants that much micro-optimization.
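For reference, the two approaches compared above can be sketched as standalone functions. The names `ncu_tz` and `ncu_loop` are hypothetical; the first is the one-liner from the diff, and the second is only a guess at the shape of the loop suggested by the assembly (byte-swap, then count bytes until nothing is left), not a copy of Base's `write` method.

```julia
# Hypothetical names; neither function is part of Base.

# Trailing-zeros formulation from the diff. A `Char` keeps its UTF-8 bytes
# left-aligned in a UInt32, so each unused trailing byte contributes 8 zero bits.
ncu_tz(c::Char) = max(1, 4 - (trailing_zeros(reinterpret(UInt32, c)) >> 3))

# Loop formulation: byte-swap so the first code unit is the low byte,
# then count bytes until the remaining value is zero.
function ncu_loop(c::Char)
    u = bswap(reinterpret(UInt32, c))
    n = 1
    while (u >>= 8) != 0
        n += 1
    end
    return n
end

# Both should agree with the slow definition on 1- to 4-byte characters.
for c in ('x', '\u00e9', '\u20ac', '\U1f600', '\0')
    @assert ncu_tz(c) == ncu_loop(c) == ncodeunits(string(c))
end
```

The `'\0'` case shows why the `max` is there: all 32 bits are zero, so `4 - (32 >> 3)` would give 0 rather than 1.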

vtjnash · 2018-09-13T16:21:01Z

Intel predicts the second is faster, by 10%:

$ ./usr/tools/llvm-mca
    bswapl	%edi
    xorl	%eax, %eax
    nopw	%cs:(%rax,%rax)
L16:
    shrl	$8, %edi
    addq	$1, %rax
    testl	%edi, %edi
    jne	L16
^D
Iterations:        100
Instructions:      700
Total Cycles:      205
Total uOps:        700

Dispatch Width:    6
uOps Per Cycle:    3.41
IPC:               3.41
Block RThroughput: 1.2


Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.50                        bswapl	%edi
 1      1     0.25                        xorl	%eax, %eax
 1      1     0.17                        nopw	%cs:(%rax,%rax)
 1      1     0.50                        shrl	$8, %edi
 1      1     0.25                        addq	$1, %rax
 1      1     0.25                        testl	%edi, %edi
 1      1     0.50                        jne	L16


Resources:
[0]   - SKXDivider
[1]   - SKXFPDivider
[2]   - SKXPort0
[3]   - SKXPort1
[4]   - SKXPort2
[5]   - SKXPort3
[6]   - SKXPort4
[7]   - SKXPort5
[8]   - SKXPort6
[9]   - SKXPort7


Resource pressure per iteration:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    
 -      -     1.50   1.50    -      -      -     1.50   1.50    -     

Resource pressure by instruction:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    Instructions:
 -      -      -     0.50    -      -      -     0.50    -      -     bswapl	%edi
 -      -     0.49    -      -      -      -     0.50   0.01    -     xorl	%eax, %eax
 -      -      -      -      -      -      -      -      -      -     nopw	%cs:(%rax,%rax)
 -      -     0.50    -      -      -      -      -     0.50    -     shrl	$8, %edi
 -      -      -     0.50    -      -      -     0.50    -      -     addq	$1, %rax
 -      -      -     0.50    -      -      -      -     0.50    -     testl	%edi, %edi
 -      -     0.51    -      -      -      -      -     0.49    -     jne	L16


$ ./usr/tools/llvm-mca
	tzcntl	%edi, %eax
	shrl	$3, %eax
	movl	$4, %ecx
	subq	%rax, %rcx
	testq	%rcx, %rcx
	movl	$1, %eax
	cmovgq	%rcx, %rax
^D
Iterations:        100
Instructions:      700
Total Cycles:      184
Total uOps:        700

Dispatch Width:    6
uOps Per Cycle:    3.80
IPC:               3.80
Block RThroughput: 1.2


Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      3     1.00                        tzcntl	%edi, %eax
 1      1     0.50                        shrl	$3, %eax
 1      1     0.25                        movl	$4, %ecx
 1      1     0.25                        subq	%rax, %rcx
 1      1     0.25                        testq	%rcx, %rcx
 1      1     0.25                        movl	$1, %eax
 1      1     0.50                        cmovgq	%rcx, %rax


Resources:
[0]   - SKXDivider
[1]   - SKXFPDivider
[2]   - SKXPort0
[3]   - SKXPort1
[4]   - SKXPort2
[5]   - SKXPort3
[6]   - SKXPort4
[7]   - SKXPort5
[8]   - SKXPort6
[9]   - SKXPort7


Resource pressure per iteration:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    
 -      -     1.76   1.74    -      -      -     1.73   1.77    -     

Resource pressure by instruction:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    Instructions:
 -      -      -     1.00    -      -      -      -      -      -     tzcntl	%edi, %eax
 -      -     0.17    -      -      -      -      -     0.83    -     shrl	$3, %eax
 -      -     0.02    -      -      -      -     0.96   0.02    -     movl	$4, %ecx
 -      -     0.41   0.17    -      -      -     0.03   0.39    -     subq	%rax, %rcx
 -      -     0.31   0.31    -      -      -     0.07   0.31    -     testq	%rcx, %rcx
 -      -     0.01   0.26    -      -      -     0.67   0.06    -     movl	$1, %eax
 -      -     0.84    -      -      -      -      -     0.16    -     cmovgq	%rcx, %rax

But that's only for my native CPU. If we look back in time (Ivy Bridge, Broadwell, Haswell, Nehalem), we see that the predicted performance of the first has been relatively unchanged over time, while the performance of the second has been steadily improving.

What I think is likely happening is that the first version is actually much cheaper for the processor to execute (much lower latency), so it has always done fairly well in a benchmarking loop, whereas the second version requires more transistors to reach the same level of performance (the above output is truncated; the full output includes some graphs to illustrate this point). I could be wrong, since I'm just reverse-engineering the output of a static-prediction tool, but that would be my analysis.

StefanKarpinski · 2018-09-14T15:11:15Z

I replaced the internal-only Base.codelen function with this new method of ncodeunits.

StefanKarpinski · 2018-09-14T15:12:37Z

Also, I went with the write(devnull, c) definition, since I figure that something that works well across a larger span of older and newer CPUs is somewhat better. If this changes in the future, we could always use a different definition.

Addition of NEWS and compat admonitions for important changes between Julia 1.0 and 1.1, including:

- Custom .css-style for compat admonitions.
- Information about compat annotations to CONTRIBUTING.md.
- NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782
- Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278
- Documentation for broadcasting CartesianIndices (#30230).
- Documentation for Base.julia_cmd().
- Documentation for colon constructor of CartesianIndices (#29440).
- Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix).
- Run NEWS-update.jl.

Co-authored-by: Morten Piibeleht
Co-authored-by: Fredrik Ekre

KristofferC reviewed Sep 12, 2018

StefanKarpinski referenced this pull request Sep 12, 2018

move codelen and first_utf8_byte to Char.jl (#28894)

62de472

stevengj reviewed Sep 13, 2018

base/char.jl: tweak doc strings

59dba6f

StefanKarpinski force-pushed the sk/ncodeunits-char branch from cb4bc8e to b931684 Compare September 13, 2018 20:40

define ncodeunits(c::Char) as fast equivalent of ncodeunits(string(c))

d4d577e

There was a non-public `codelen(c::Char)` method which previously did this. This also replaces internal uses of this with `ncodeunits(c)`.

StefanKarpinski force-pushed the sk/ncodeunits-char branch from b931684 to d4d577e Compare September 13, 2018 20:54

StefanKarpinski merged commit 3b02991 into master Sep 14, 2018

StefanKarpinski deleted the sk/ncodeunits-char branch September 14, 2018 15:11

fredrikekre added a commit that referenced this pull request Nov 30, 2018

News and compat annotation for #29153 (ncodeunits(::Char)).

38da879

fredrikekre added a commit that referenced this pull request Dec 1, 2018

News and compat annotation for #29153 (ncodeunits(::Char)).

4e67301

fredrikekre added a commit that referenced this pull request Dec 1, 2018

News and compat annotation for #29153 (ncodeunits(::Char)).

8dec0d8

fredrikekre added a commit that referenced this pull request Dec 3, 2018

News and compat annotation for #29153 (ncodeunits(::Char)).

fef96e4

fredrikekre added a commit that referenced this pull request Dec 4, 2018

News and compat annotation for #29153 (ncodeunits(::Char)).

3b57437

fredrikekre added a commit that referenced this pull request Dec 4, 2018

News and compat annotation for #29153 (ncodeunits(::Char)).

9faa3af
