
Speeding up copy() #121

Closed
ViralBShah opened this issue Jul 16, 2011 · 10 comments · Fixed by JuliaLang/Compat.jl#130
Labels: performance (Must go faster)

@ViralBShah
Member

Create a deep_copy() so that it can be explicitly used where necessary.

Also, copy() and copy_to() need to be optimized to use the fastest implementations. Here are some tests (Mac, Intel Core 2 Duo) that suggest:

  1. memcpy for large copy and copy_to
  2. Native julia for small copy
  3. BLAS for small copy_to

Case 2 could be omitted, since it is close enough to case 3, to keep things simple. These tests also need to be carried out on other architectures.
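The size-cutoff dispatch implied by these results can be sketched in C. This is a hypothetical illustration, not the actual Julia implementation: the function name `copy_doubles` and the cutoff value of 200 elements (taken from the discussion below) are assumptions that would need tuning per architecture.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: below the cutoff, a plain loop avoids the
 * overhead of a library call; above it, memcpy's tuned bulk-copy
 * path wins. The cutoff is an assumption to be benchmarked. */
#define COPY_CUTOFF 200

static void copy_doubles(double *dst, const double *src, size_t n) {
    if (n < COPY_CUTOFF) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        memcpy(dst, src, n * sizeof(double));
    }
}
```

The same shape works for a `copy_to`-style primitive: dispatch once on length, then run whichever copy kernel the measurements favor on the target machine.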

##### Test 1 #####

julia> a = ones(100)
[1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0]

## Julia implementation (FASTEST)
julia> @time for i=1:1e5; jcopy(a); end;
elapsed time: 0.38377809524536133 sec

## This is DCOPY from BLAS, an assembly-language implementation in OpenBLAS
julia> @time for i=1:1e5; bcopy(a); end;
elapsed time: 0.41508984565734863 sec

## This one dispatches to memcpy
julia> @time for i=1:1e5; copy(a); end;
elapsed time: 0.61258411407470703 sec

##### Test 2 #####

I now implemented copy_to() for all cases to remove allocation/GC costs:

julia> a = ones(100)
[1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0]

julia> b = ones(100)
[1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0]

## Julia implementation
julia> @time for i=1:1e5; jcopy_to(b,a); end;
elapsed time: 0.04522800445556641 sec

## BLAS (FASTEST)
julia> @time for i=1:1e5; bcopy_to(b,a); end;
elapsed time: 0.01788496971130371 sec

## memcpy
julia> @time for i=1:1e5; copy_to(b,a); end;
elapsed time: 0.27470088005065918 sec

##### Test 3 #####

And now, a larger size:

julia> a = ones(1000000)
[1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0]

julia> b = ones(1000000)
[1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0]

## Julia implementation
julia> @time for i=1:100; jcopy_to(b,a); end;
elapsed time: 0.5620429515838623 sec

## BLAS
julia> @time for i=1:100; bcopy_to(b,a); end;
elapsed time: 0.5299229621887207 sec

## memcpy (FASTEST)
julia> @time for i=1:100; copy_to(b,a); end;
elapsed time: 0.35404396057128906 sec
@ghost ghost assigned ViralBShah Jul 16, 2011
@ViralBShah
Member Author

Filed DCOPY performance issue for openblas: OpenMathLib/OpenBLAS#45

@ViralBShah
Member Author

I see similar behavior on Opteron + Linux as well.

Small size:

julia> @time for i=1:1e5; copy_to(b, a); end;
elapsed time: 0.25034022331237793 sec

julia> @time for i=1:1e5; bcopy_to(b, a); end;
elapsed time: 0.01954412460327148 sec

julia> @time for i=1:1e5; jcopy_to(b, a); end;
elapsed time: 0.04978704452514648 sec

Large size:

julia> @time for i=1:1e3; copy_to(b, a); end;
elapsed time: 3.59907007217407227 sec

julia> @time for i=1:1e3; jcopy_to(b, a); end;
elapsed time: 8.26361894607543945 sec

julia> @time for i=1:1e3; bcopy_to(b, a); end;
elapsed time: 5.8034358024597168 sec

@ViralBShah
Member Author

memcpy in FreeBSD. The first link is generic; the second is for amd64.

http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/string/bcopy.c?rev=1.7.14.1;content-type=text%2Fplain
http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/amd64/string/bcopy.S?rev=1.4;content-type=text%2Fplain

It is quite likely that Apple uses an optimized memcpy supplied by Intel. Here's the Apple libc source:

http://www.opensource.apple.com/source/Libc/Libc-594.9.5/string/

@ViralBShah
Member Author

Some notes on memcpy at Intel - a bit dated:

http://software.intel.com/en-us/articles/memcpy-performance/

@JeffBezanson
Member

I'm seeing less difference - no factors of 10 between memcpy and blas. But I also see blas faster up to 200 or 300 elements. Can we just pick a cutoff of 200 and close it?

@ViralBShah
Member Author

Working on closing this.

-viral


@ViralBShah
Member Author

Now, this is much faster.

julia> a = ones(100)
[1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0]

julia> @time for i=1:1e5; copy(a); end;
elapsed time: 0.28148198127746582 sec

@ViralBShah
Member Author

Ok, I have now completed the implementation in what I believe is a systematic way, and all of the speed gains of the BLAS copy have disappeared. I suspect I have introduced some Julia overhead that can be avoided. For testing, I am checking in mcopy_to() and bcopy_to(), which can be removed later.

julia> n = 100; a = ones(n); b = Array(Float64,n);

julia> @time for i=1:1000000; mcopy_to(pointer(b), pointer(a), n); end
elapsed time: 0.32441210746765137 sec

julia> @time for i=1:1000000; bcopy_to(pointer(b), pointer(a), n); end
elapsed time: 0.43614697456359863 sec
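For reference, the two pointer-level strategies being compared here can be sketched in C. These are illustrative stand-ins for the `mcopy_to()`/`bcopy_to()` helpers mentioned above, not their actual implementations: `mcopy_like` is a thin memcpy wrapper, and `dcopy_like` mimics the unit-stride case of BLAS DCOPY (the real DCOPY also handles arbitrary increments and is hand-tuned assembly in OpenBLAS).

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for mcopy_to: defer entirely to memcpy. */
static void mcopy_like(double *dst, const double *src, size_t n) {
    memcpy(dst, src, n * sizeof(double));
}

/* Illustrative stand-in for bcopy_to: a strided element loop in the
 * style of BLAS DCOPY(n, x, incx, y, incy). */
static void dcopy_like(size_t n, const double *x, ptrdiff_t incx,
                       double *y, ptrdiff_t incy) {
    for (size_t i = 0; i < n; i++)
        y[i * incy] = x[i * incx];
}
```

Any timing difference between the two then comes down to call overhead versus how aggressively the bulk-copy routine is vectorized, which is exactly what the numbers above probe.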

-viral


@ViralBShah
Member Author

The other explanation could be that OS X 10.7 (Lion) ships a much-improved memcpy for small copies.

-viral

