Speeding up copy() #121
Comments
Filed DCOPY performance issue for openblas: OpenMathLib/OpenBLAS#45 |
I notice similar issues on Opteron+Linux also. Small size:
Large size:
|
Good thread on memcpy: http://stackoverflow.com/questions/1715224/very-fast-memcpy-for-image-processing |
memcpy in FreeBSD. The first link is generic, whereas the second is for amd64. http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/string/bcopy.c?rev=1.7.14.1;content-type=text%2Fplain It is quite likely that Apple uses an optimized memcpy supplied by Intel. Here's the Apple libc source: http://www.opensource.apple.com/source/Libc/Libc-594.9.5/string/ |
Some notes on memcpy at Intel - a bit dated: http://software.intel.com/en-us/articles/memcpy-performance/ |
I'm seeing less difference - no factors of 10 between memcpy and blas. But I also see blas faster up to 200 or 300 elements. Can we just pick a cutoff of 200 and close it? |
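A size cutoff like the one suggested above could be sketched as follows. This is only an illustration in current Julia syntax, not the code that was checked in: the name `copy_with_cutoff!` and the constant `BLAS_COPY_CUTOFF` are hypothetical, and the value 200 is taken from the comment above rather than from any new measurement.

```julia
using LinearAlgebra: BLAS

# Hypothetical cutoff, taken from the "200 or 300 elements" observation above.
const BLAS_COPY_CUTOFF = 200

# Copy all of `src` into the front of `dest`, using BLAS dcopy for short
# vectors and a memmove-style copy for longer ones.
function copy_with_cutoff!(dest::Vector{Float64}, src::Vector{Float64})
    n = length(src)
    length(dest) >= n || throw(ArgumentError("destination is too short"))
    if n <= BLAS_COPY_CUTOFF
        # BLAS copy with unit strides on both vectors.
        BLAS.blascopy!(n, src, 1, dest, 1)
    else
        # For Float64 arrays this reduces to a plain memory copy.
        unsafe_copyto!(dest, 1, src, 1, n)
    end
    return dest
end
```

Whether 200 is the right threshold depends on the machine and the BLAS build, so the constant would need to be re-measured per platform.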
Working on closing this. -viral On 05-Aug-2011, at 12:54 AM, JeffBezanson wrote:
|
Now, this is much faster.
julia> a = ones(100)
julia> @time for i=1:1e5; copy(a); end;
|
Ok, I have now completed the implementation, in what I believe is a systematic way - and all speed gains of the blas copy have disappeared. I suspect that I have introduced some julia overhead that can be avoided. For testing, I am checking in mcopy_to() and bcopy_to(), which can be removed later.
julia> n = 100; a = ones(n); b = Array(Float64,n);
julia> @time for i=1:1000000; mcopy_to(pointer(b), pointer(a), n); end
julia> @time for i=1:1000000; bcopy_to(pointer(b), pointer(a), n); end
-viral
On 05-Aug-2011, at 12:54 AM, JeffBezanson wrote:
|
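For readers who want to repeat the memcpy versus BLAS comparison above on a current Julia, here is a rough sketch. The functions `mcopy_to!`, `bcopy_to!`, and `time_copies` are hypothetical stand-ins written for this example; they are not the `mcopy_to()` and `bcopy_to()` helpers that were checked in, whose source is not shown in this thread.

```julia
using LinearAlgebra: BLAS

# Hypothetical stand-in for mcopy_to(): copy n Float64 values with libc memcpy.
function mcopy_to!(dest::Vector{Float64}, src::Vector{Float64}, n::Integer)
    0 <= n <= min(length(dest), length(src)) || throw(ArgumentError("n is out of range"))
    ccall(:memcpy, Ptr{Cvoid}, (Ptr{Float64}, Ptr{Float64}, Csize_t),
          dest, src, n * sizeof(Float64))
    return dest
end

# Hypothetical stand-in for bcopy_to(): the same copy through BLAS dcopy.
function bcopy_to!(dest::Vector{Float64}, src::Vector{Float64}, n::Integer)
    0 <= n <= min(length(dest), length(src)) || throw(ArgumentError("n is out of range"))
    BLAS.blascopy!(n, src, 1, dest, 1)
    return dest
end

# Time `reps` calls of copy function `f` on vectors of length `n`, in seconds.
function time_copies(f, n::Integer, reps::Integer)
    a = ones(n)
    b = Vector{Float64}(undef, n)
    f(b, a, n)  # warm-up call so compilation is not included in the timing
    return @elapsed for _ in 1:reps
        f(b, a, n)
    end
end

for n in (100, 10_000)
    println("n = ", n,
            "  memcpy: ", time_copies(mcopy_to!, n, 10_000), " s",
            "  BLAS: ", time_copies(bcopy_to!, n, 10_000), " s")
end
```

Timing inside a function, after a warm-up call, keeps compilation and global-variable overhead out of the measurement, which is one possible source of the "julia overhead" suspected above.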
The other explanation could be that OS X 10.7 (Lion) ships a much-improved memcpy for small copies on the Mac.
-viral
On 05-Aug-2011, at 6:05 PM, Viral Shah wrote:
|
Create a deep_copy() so that it can be explicitly used where necessary.
Also, copy() and copy_to() need to be optimized to use the fastest implementations. Here are some tests (Mac, Intel Core 2 Duo) that suggest:
Case 2 may be omitted since it is close enough to case 3 to keep things simple. Also, these tests need to be carried out on different architectures too.
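To illustrate why an explicit deep copy is useful alongside copy(), here is a small sketch of the difference for nested arrays. It uses `copy` and `deepcopy` as they exist in current Julia's Base, which is an assumption relative to this issue: the `deep_copy()` proposed here may have been designed differently at the time.

```julia
inner = [1.0, 2.0]
outer = [inner, [3.0]]

shallow = copy(outer)      # new outer array, but the same inner arrays
deep = deepcopy(outer)     # new outer array and new inner arrays

inner[1] = 99.0            # mutate the shared inner array

println(shallow[1][1])     # 99.0: the shallow copy sees the change
println(deep[1][1])        # 1.0: the deep copy is unaffected
```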