ones(2^33) is slow #1795

ViralBShah · 2012-12-20T13:27:14Z

I am not sure what the expected time for something like this should be, but it takes almost a minute to do ones(2^33).

julia> @time a = ones(2^33)
elapsed time: 58.07947397232056 seconds

ViralBShah · 2012-12-20T13:33:38Z

This C program took almost the same amount of time. I wonder if one can do better.

$ time ./bigmalloc
1.000000

real    0m52.652s
user    0m9.349s
sys     0m40.359s

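// gcc bigmalloc.c -o bigmalloc -std=c99 -O2
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  size_t n = (size_t) 1 << 33;      // 2^33 doubles = 64 GB
  double *p = malloc(n * sizeof(double));
  if (p == NULL) {                  // a request this large can fail
    perror("malloc");
    return 1;
  }
  for (size_t i = 0; i < n; ++i)    // touch every element
    p[i] = 1.0;

  printf("%f\n", p[n - 1]);

  free(p);
  return 0;
}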

pao · 2012-12-20T14:00:27Z

That touches 64 GB of memory, doesn't it? How much RAM does the system have? Which malloc() is this (glibc, Mac, or some other)? It seems like a lot needs to go right for this allocation to go smoothly.
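For reference, the 64 GB figure follows from the array length and 8 bytes per Float64 element (the same size as a C double on this machine):

```latex
2^{33}\ \text{elements} \times 8\ \tfrac{\text{bytes}}{\text{element}} = 2^{36}\ \text{bytes} = 64\ \text{GiB} \approx 68.7\ \text{GB}
```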

StefanKarpinski · 2012-12-20T14:00:54Z

For zeros we could do the trick of mmapping /dev/zero, which should be much faster. Doesn't address the ones case though. I don't think there's a great way to do that unless we want to get into the business of representing constant arrays specially.

ViralBShah · 2012-12-20T14:19:39Z

This is on julia.mit.edu, which has 1TB of RAM and runs a recent Ubuntu (12.04).

pao · 2012-12-20T14:31:14Z

That would give you some room to work.

ViralBShah · 2012-12-20T14:40:08Z

Almost all the time goes in the for loop in the C code. The malloc takes less than a second.

StefanKarpinski · 2012-12-20T14:44:12Z

The filling in of data could be done in parallel using some of the 80 cores on that machine, which would make it go faster. In general, I would like easier ways to express simple data-parallel code.

timholy · 2012-12-20T14:54:32Z

How about using memfill? For a manual Julia version, you could "seed" the array with a few 1.0s and then loop, calling memcpy on chunks that double in size each time:

T = Float64
a = Array(T, 2^33)
n = 16
a[1:n] = one(T)                 # seed the first n elements
while n < length(a)
    m = min(n, length(a) - n)   # don't copy past the end
    copy_to(a, n+1, a, 1, m)    # copy a[1:m] into a[n+1:n+m]
    n += m
end

ViralBShah · 2012-12-20T14:54:49Z

The same thing is touched upon in #1790. Although I would prefer not to have multiple models of parallelism, it would certainly be nice to have a data-parallel model that can use multiple cores without launching 80 julias.

ViralBShah · 2012-12-20T14:55:41Z

@timholy Does memfill work faster? If so, we could even use it inside fill.

timholy · 2012-12-20T14:56:45Z

I don't know. It's not part of the C standard library; it's an add-on, and I've never tested it. If it just loops (and doesn't use the twofold-growing trick), the version I wrote out might be faster.

StefanKarpinski · 2012-12-20T14:58:24Z

I would be rather surprised if the copying trick is faster. Once n gets bigger than your cache that's going to cause totally spurious memory fetches that you don't need.

ViralBShah · 2012-12-20T15:12:47Z

I tried using memset in my C code instead of the for loop (memset fills memory with a constant byte value, not a double), and that finishes in 18 seconds.

timholy · 2012-12-20T15:16:51Z

Yes, I just looked and noticed that fill!, which is being used by ones, is using memset. So you're already doing whatever is most efficient.

I tested the copy trick. It's not bad, but it's not quite as fast as memset.

ViralBShah · 2012-12-20T18:49:55Z

I guess there really isn't much that can be done about this for sequential julia.

JeffBezanson · 2012-12-20T22:57:34Z

This is more than 1GB/second, which doesn't strike me as too bad.
It might be worth considering arrays stored compressed in memory. See https://github.com/FrancescAlted/blosc
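A rough version of that throughput estimate, counting all 2^36 bytes written and using the 58.08-second @time result from the first comment, with the 18-second memset run alongside for comparison:

```latex
\frac{2^{36}\ \text{bytes}}{58.08\ \text{s}} \approx 1.18 \times 10^{9}\ \text{bytes/s},
\qquad
\frac{2^{36}\ \text{bytes}}{18\ \text{s}} \approx 3.8 \times 10^{9}\ \text{bytes/s}
```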

ViralBShah · 2012-12-20T22:59:53Z

Of course it's quite good; rand(2^33) takes about the same time. It just feels slow. Parallelism is the answer here. Closing this issue, as it is really not a Julia performance problem.

ViralBShah closed this as completed Dec 20, 2012
