mxnet.nd.random.multinomial is very slow on CPU and stucked on GPU #15231

chinakook · 2019-06-13T03:19:26Z

MXNet version is mxnet-cu100mkl 1.5.0b20190605(also tested on mxnet-cu100 1.4.1)

import time
import numpy as np
import mxnet as mx


p = [1 / 300000.] * 300000

probs = np.array(p)
t0 = time.time()
sample_times = np.random.multinomial(len(probs), np.array(probs))
t1 = time.time()
print("Numpy MultiNomial Time", t1 - t0)

probs = mx.nd.array(p, ctx=mx.cpu())
t0 = time.time()
indices = mx.nd.random.multinomial(probs, len(probs))
mx.nd.waitall()
t1 = time.time()
print("MXNet CPU MultiNomial Time", t1 - t0)

# stucked
# probs = mx.nd.array(p, ctx=mx.gpu(0))
# t0 = time.time()
# indices = mx.nd.random.multinomial(probs, len(probs))
# mx.nd.waitall()
# t1 = time.time()
# print("MXNet GPU MultiNomial Time", t1 - t0)

The result is

Numpy MultiNomial Time 0.01881909
MXNet CPU MultiNomial Time 43.93881797

The text was updated successfully, but these errors were encountered:

mxnet-label-bot · 2019-06-13T03:19:29Z

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Performance

pengzhao-intel · 2019-06-13T15:55:24Z

@zixuanweeei could you take a look for the CPU performance?

zixuanweeei · 2019-06-16T14:56:09Z

While np.random.multinomial gives the frequency of each outcome drawn from n experiments (sampling without replacement), mx.nd.random.multinomial gives the outcomes sampled from the multinomial distribution (sampling with replacement). At first thought, they produce the result in different ways. mx.nd.random.multinomial uses a very simple method as described in this link. The complexity of the method above can be O(N^2) for sampling K times from K possible outcomes with replacement. So it will be time-consuming for a large value of K as 300000 in your script. Any method can be taken to optimize the sampling method after that we further analysis the complexity and take some surveys on advance sampling algorithms.
Any insights of this?

chinakook · 2019-06-17T00:31:31Z

You can get nearly the same result as mx.nd.random.multinomial, but very swift.

        sample_list = []
        p = [1 / 300000.] * 300000

        probs = np.array(p)
        sample_times = np.random.multinomial(len(probs), np.array(probs))
        for i, t in enumerate(sample_times):
            if t > 0:
                sample_list.extend([i]*t)
        random.shuffle(sample_list)

chinakook · 2019-06-17T00:39:18Z

pyt*ch version

t0 = time.time()
p = [1/300000.] * 300000
probs = torch.as_tensor(p)
indices = torch.multinomial(probs, len(probs), True)
print("Pytorch output:\t\t", indices.tolist())
t1 = time.time()
print(t1 - t0)

zixuanweeei · 2019-06-17T02:34:40Z

@chinakook Thanks for your examples. It's informative. Pytorch uses a binary search for the slot in which the prob falls, while it uses linear search in MXNet. Pytorch uses the alias method as well. But I am not sure whether it works for drawing samples just once. We will optimize it using binary search at this moment.

zixuanweeei · 2019-06-17T13:48:05Z

We used binary search to optimize mx.nd.random.multinomial and it got interesting result. On Xeon Platinum 8180 platform, it costs ~0.0535s for the example @chinakook provided, while it costs ~59s before our optimization. By the way, the numpy one costs ~0.0279s.

Thanks for @chinakook 's examples. The optimization requires further verification. And the PR is on the way:tada:.

zixuanweeei · 2019-06-25T01:18:55Z

@chinakook Thanks for the issue. You can close it if there are no problems. BTW, considering that

... the binomial distribution describes the probability of k successes in n draws with replacement.

from this, I think np.random.multinomial also draws samples with replacement. Sorry for the mistake in my reply.

Any insights of this?

chinakook · 2019-06-25T01:32:43Z

No problem.

pengzhao-intel added the Performance label Jun 13, 2019

zixuanweeei mentioned this issue Jun 21, 2019

Efficient MXNet sampling in the multinomial distribution #15311

Merged

3 tasks

chinakook closed this as completed Jun 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mxnet.nd.random.multinomial is very slow on CPU and stucked on GPU #15231

mxnet.nd.random.multinomial is very slow on CPU and stucked on GPU #15231

chinakook commented Jun 13, 2019 •

edited

Loading

mxnet-label-bot commented Jun 13, 2019

pengzhao-intel commented Jun 13, 2019

zixuanweeei commented Jun 16, 2019

chinakook commented Jun 17, 2019

chinakook commented Jun 17, 2019

zixuanweeei commented Jun 17, 2019 •

edited

Loading

zixuanweeei commented Jun 17, 2019

zixuanweeei commented Jun 25, 2019

chinakook commented Jun 25, 2019

mxnet.nd.random.multinomial is very slow on CPU and stucked on GPU #15231

mxnet.nd.random.multinomial is very slow on CPU and stucked on GPU #15231

Comments

chinakook commented Jun 13, 2019 • edited Loading

mxnet-label-bot commented Jun 13, 2019

pengzhao-intel commented Jun 13, 2019

zixuanweeei commented Jun 16, 2019

chinakook commented Jun 17, 2019

chinakook commented Jun 17, 2019

zixuanweeei commented Jun 17, 2019 • edited Loading

zixuanweeei commented Jun 17, 2019

zixuanweeei commented Jun 25, 2019

chinakook commented Jun 25, 2019

chinakook commented Jun 13, 2019 •

edited

Loading

zixuanweeei commented Jun 17, 2019 •

edited

Loading