Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantile is inefficient in common case of CI calculation #84

Open
bkamins opened this issue Aug 30, 2021 · 4 comments · May be fixed by #86
Open

quantile is inefficient in common case of CI calculation #84

bkamins opened this issue Aug 30, 2021 · 4 comments · May be fixed by #86

Comments

@bkamins
Copy link
Contributor

bkamins commented Aug 30, 2021

Example:

julia> x=rand(10^4);

julia> @btime [quantile($x, 0.05), quantile($x, 0.95)]
  160.000 μs (5 allocations: 156.50 KiB)
2-element Vector{Float64}:
 0.04742732947951543
 0.9507927513232511

julia> @btime quantile($x, [0.05, 0.95])
  597.700 μs (4 allocations: 78.39 KiB)
2-element Vector{Float64}:
 0.04742732947951543
 0.9507927513232511

julia> x = rand(10^6);

julia> @btime quantile($x, [0.05, 0.95])
  92.449 ms (4 allocations: 7.63 MiB)
2-element Vector{Float64}:
 0.049906389147197056
 0.9498833775516955

julia> @btime [quantile($x, 0.05), quantile($x, 0.95)]
  18.751 ms (5 allocations: 15.26 MiB)
2-element Vector{Float64}:
 0.049906389147197056
 0.9498833775516955

julia> @btime quantile($x, [0.005, 0.995])
  98.948 ms (4 allocations: 7.63 MiB)
2-element Vector{Float64}:
 0.005030018622636852
 0.9949264399317231

julia> @btime [quantile($x, 0.005), quantile($x, 0.995)]
  17.697 ms (5 allocations: 15.26 MiB)
2-element Vector{Float64}:
 0.005030018622636852
 0.9949264399317231

but

julia> @btime quantile($x, [0.49, 0.51])
  12.591 ms (4 allocations: 7.63 MiB)
2-element Vector{Float64}:
 0.48882655343720904
 0.5090073220557704

julia> @btime [quantile($x, 0.49), quantile($x, 0.51)]
  22.728 ms (5 allocations: 15.26 MiB)
2-element Vector{Float64}:
 0.48882655343720904
 0.5090073220557704

The reason is that doing partial sort twice on tails if we are near the end is faster than doing one partial sort. Unfortunately this is a common case I would assume. Maybe we should introduce some optimization here?

@nalimilan
Copy link
Member

The reason is that doing partial sort twice on tails if we are near the end is faster than doing one partial sort

Why is that the case? I understand why there's no performance gain, but I wouldn't have expected it to be slower.

@bkamins
Copy link
Contributor Author

bkamins commented Sep 7, 2021

The reason is here https://github.com/JuliaLang/Statistics.jl/blob/54f9b0d999813aa9fab039f632df222ffd2a96a8/src/Statistics.jl#L972

for one value lo is close to hi and you need to sort little if you are near the boundary. But for two values where one is very low and the other is very high you have to sort almost everything.

@nalimilan
Copy link
Member

Wow, OK. We should probably just call sort! once for each quantile instead (which I thought we were doing). We should still benefit from the fact that we sorted some entries, right? Anyway the most common case is probably not to compute nearby quantiles (rather the contrary).

@bkamins
Copy link
Contributor Author

bkamins commented Sep 8, 2021

Yes, but only of 2 quantiles. Probably for more than 2 quantiles it is better to do what we do now I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants