Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving performance of done(s::AbstractString, i) #18081

Closed
omus opened this issue Aug 17, 2016 · 5 comments
Closed

Improving performance of done(s::AbstractString, i) #18081

omus opened this issue Aug 17, 2016 · 5 comments
Labels
performance Must go faster strings "Strings!"

Comments

@omus
Copy link
Member

omus commented Aug 17, 2016

I've noticed that you can improve string iteration performance by avoiding using done as the condition in a loop. Take lstrip as an example:

Current implementation:

function lstrip(s::AbstractString, chars::Chars=_default_delims)
    i = start(s)
    while !done(s,i)
        c, j = next(s,i)
        if !(c in chars)
            return s[i:end]
        end
        i = j
    end
    s[end+1:end]
end

Revised version:

function lstrip_faster(s::AbstractString, chars::Base.Chars=Base._default_delims)
    i = start(s)
    end_pos = endof(s)
    while i <= end_pos
        c, j = next(s,i)
        if !(c in chars)
            return s[i:end]
        end
        i = j
    end
    s[end+1:end]
end

The results:

julia> using BenchmarkTools

julia> str = " "^1000 * "a";

julia> original = @benchmark lstrip(str)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     3
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  112.00 bytes
  allocs estimate:  2
  minimum time:     8.13 μs (0.00% GC)
  median time:      8.18 μs (0.00% GC)
  mean time:        8.50 μs (0.00% GC)
  maximum time:     43.92 μs (0.00% GC)

julia> faster = @benchmark lstrip_faster(str)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     8
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  112.00 bytes
  allocs estimate:  2
  minimum time:     3.45 μs (0.00% GC)
  median time:      3.48 μs (0.00% GC)
  mean time:        3.76 μs (0.00% GC)
  maximum time:     29.03 μs (0.00% GC)

julia> judge(median(faster), median(original))
BenchmarkTools.TrialJudgement: 
  time:   -57.54% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Maybe it would be worthwhile caching the endof result in the String object?

@stevengj
Copy link
Member

Maybe cache it in the iterator state?

@stevengj stevengj added strings "Strings!" performance Must go faster labels Aug 17, 2016
@TotalVerb
Copy link
Contributor

TotalVerb commented Aug 20, 2016

Why does done even call endof(s)? Would endof(s.data) not suffice? Presuming next does the right thing and returns beyond the end of the string's backing data, that is.

@stevengj
Copy link
Member

Re-opening as the change was reverted in 6d179b3

@stevengj stevengj reopened this Aug 29, 2016
@KristofferC
Copy link
Member

I tried to fix this in an alternative way by just using a new iterator but a lot of methods rely on start(s::String) -> Int. Isn't that sort of using an implementation detail or should you always be able to do x[start(x)]?

@nalimilan
Copy link
Member

I think you're right that people shouldn't rely on this. x[start(x)] should always be replaced with x[1] (at least as long as #9297 isn't implemented).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster strings "Strings!"
Projects
None yet
Development

No branches or pull requests

5 participants