
Why is the performance so terrible? #11

Open
FreeApophis opened this issue May 23, 2021 · 10 comments
Labels: help wanted (Extra attention is needed)

Comments

@FreeApophis commented May 23, 2021

The tor v3 vanity generator cathugger/mkp224o has no GPU support, yet it is faster on my 10-year-old CPU than the numbers shown in the README here.

I get about one vanity hash every 10 seconds for a 5-character prefix on my CPU.

I couldn't verify the README numbers myself because I don't have an Nvidia GPU, but this should be A LOT faster on a GPU than that.

Has anyone tested both implementations side by side? Is tor-v3-vanity really that slow? That does not sound right.

With scallion it was easily possible to use 8- and 9-character prefixes.

@marcialvieira commented Jun 8, 2021

Testing mkp224o on an i7-8565U, even with the best optimizations for my machine (--enable-binsearch --enable-amd64-64-24k --enable-intfilter=64), I'm only getting ~15MK/sec, while running tor-v3-vanity with a GTX 1660 I'm getting ~5GK/sec.

However, I noticed that only one CPU core is busy. Maybe if the candidates coming back from the GPU were handed off and validated on multiple threads, I would get more performance and take better advantage of the keys generated by the GPU (see the sketch below).
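Not having dug into tor-v3-vanity's internals, this is only a minimal sketch of what I mean, assuming the host gets back a batch of candidate seeds from the GPU; `expensive_cpu_check`, the seed layout, and the batch itself are made up for illustration and are not the project's API:

```rust
use std::thread;

// Placeholder for the real CPU-side verification of a GPU hit (invented for this sketch).
fn expensive_cpu_check(seed: &[u8; 32]) -> bool {
    seed.iter().fold(0u8, |a, b| a ^ b) == 0
}

fn main() {
    // Pretend this is one batch of candidate seeds copied back from the GPU.
    let candidates: Vec<[u8; 32]> = (0..100_000u32)
        .map(|i| {
            let mut seed = [0u8; 32];
            seed[..4].copy_from_slice(&i.to_le_bytes());
            seed
        })
        .collect();

    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    let chunk = (candidates.len() + workers - 1) / workers;

    // Validate each chunk on its own core instead of funneling everything through one.
    thread::scope(|s| {
        for part in candidates.chunks(chunk) {
            s.spawn(|| {
                for seed in part {
                    if expensive_cpu_check(seed) {
                        println!("match: {:02x?}", &seed[..4]);
                    }
                }
            });
        }
    });
}
```

A channel-based pipeline (GPU thread producing batches, a worker pool consuming them) would be the more realistic shape, but the point is the same: the single-core validation step shouldn't be the bottleneck.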

@marcialvieira commented Jun 9, 2021

@FreeApophis is correct. The output information is confusing: it wasn't 5GK/sec, the output was showing me a cumulative count, so the correct rate is 297KK/sec.

BTW: I'm getting 368KK/s with mkp224o on a Raspberry Pi 2. :O

@FreeApophis (Author)

Thanks for the numbers. There is definitely something wrong with the implementation when a Raspberry Pi is faster than a GTX 1660.

@23cku0r commented Aug 12, 2021

4x 2080 Ti: [benchmark screenshot]

4x 3090: [benchmark screenshot]

@marcialvieira

As you can see from @23cku0r's post, his benchmark is about 8x my Raspberry Pi 2's performance, so just two CPU-based Pis have the equivalent of one 2080 Ti's GPU-based performance. lol

@megapro17

@dr-bonez added the "help wanted" label Nov 4, 2021
@dr-bonez (Owner) commented Nov 4, 2021

I took a look at the code again, and I don't see an obvious reason why it should be so much slower. This was a weekend pet project I threw together a while back just to try out the nvptx target for rust. I have too much going on right now to look into this, but if anyone takes the time to instrument the code and determine where the bottleneck is, I'm happy to address the problem.
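For anyone picking this up, here is a minimal sketch of the kind of host-side instrumentation that would narrow it down, with `launch_batch` standing in for whatever call does one kernel launch plus copy-back; the function name and the numbers inside it are invented for illustration, not the project's real code:

```rust
use std::time::{Duration, Instant};

// Stand-in for one GPU batch (kernel launch + stream sync + copy-back).
// Returns how many keys that launch attempted. Name and numbers are made up.
fn launch_batch() -> u64 {
    std::thread::sleep(Duration::from_millis(200)); // pretend the kernel takes ~200 ms
    60 * 256 // pretend one key per thread per launch
}

fn main() {
    let mut total_keys = 0u64;
    let start = Instant::now();
    for i in 1..=10u32 {
        let t0 = Instant::now();
        total_keys += launch_batch();
        eprintln!(
            "launch {:2}: {:?} this launch, {:.0} keys/sec overall",
            i,
            t0.elapsed(),
            total_keys as f64 / start.elapsed().as_secs_f64()
        );
    }
}
```

Comparing a host-side keys/sec like this against the pure-kernel time from nvprof/Nsight would at least show whether the time goes into the kernel itself or into launch overhead and copies.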

@dr-bonez (Owner) commented Nov 4, 2021

My best guess is that there's an issue with automatic block size detection. 256 threads with 272 blocks seems low for a 2080ti.
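For reference on where a number like 272 could come from, here is a rough sketch of the occupancy arithmetic; this is my assumption about the intent, not the project's actual autodetection code, and the Turing limits are hard-coded rather than queried from the device:

```rust
/// Blocks needed to fill every SM up to its resident-thread limit.
/// In real code, `sm_count` and `max_threads_per_sm` would come from the
/// CUDA device attributes instead of being passed in by hand.
fn full_occupancy_blocks(sm_count: u32, max_threads_per_sm: u32, threads_per_block: u32) -> u32 {
    sm_count * (max_threads_per_sm / threads_per_block)
}

fn main() {
    // RTX 2080 Ti (Turing): 68 SMs, 1024 resident threads per SM.
    let blocks = full_occupancy_blocks(68, 1024, 256);
    // Prints 272 blocks, i.e. the number reported above.
    println!("2080 Ti: {} blocks x 256 threads = {} threads in flight", blocks, blocks * 256);
}
```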

@ghost commented Nov 18, 2021

Something is definitely wrong here; this is my experience running it on my GTX 1080:

==27116== NVPROF is profiling process 27116, command: ./t3v -d keys hello
Launching kernel on device #0 with 256 threads and 60 blocks
Tried 2012160 / 33554432 (expected) keys.
Running for 30 seconds / 8 minutes, 21 seconds (expected).
Tried 4024320 / 33554432 (expected) keys.
Running for 1 minutes, 0 seconds / 8 minutes, 21 seconds (expected).
Tried 6036480 / 33554432 (expected) keys.
Running for 1 minutes, 30 seconds / 8 minutes, 21 seconds (expected).
^C==27116== Profiling application: ./t3v -d keys hello
==27116== Warning: 1 records have invalid timestamps due to insufficient device buffer space. You can configure the buffer space using the option --device-buffer-size.
==27116== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  90.9431s       397  229.08ms  213.42ms  253.10ms  render
                    0.00%  591.57us       794     745ns     351ns  3.0080us  [CUDA memcpy DtoH]
                    0.00%  226.64us       404     561ns     480ns  1.2480us  [CUDA memcpy HtoD]
      API calls:   99.87%  90.9558s       397  229.11ms  213.43ms  253.10ms  cuStreamSynchronize
                    0.11%  100.82ms         1  100.82ms  100.82ms  100.82ms  cuCtxCreate
                    0.01%  10.058ms       794  12.667us  5.8100us  93.444us  cuMemcpyDtoH
                    0.01%  5.0190ms       398  12.610us  9.2540us  72.085us  cuLaunchKernel
                    0.00%  1.7040ms       404  4.2170us  2.4680us  155.27us  cuMemcpyHtoD
                    0.00%  1.6310ms         1  1.6310ms  1.6310ms  1.6310ms  cuModuleLoadData
                    0.00%  255.92us       399     641ns     280ns  1.9780us  cuModuleGetFunction
                    0.00%  109.00us         6  18.166us  1.7130us  99.015us  cuMemAlloc
                    0.00%  9.9490us         1  9.9490us  9.9490us  9.9490us  cuStreamCreateWithPriority
                    0.00%  4.9050us         1  4.9050us  4.9050us  4.9050us  cuDeviceGetPCIBusId
                    0.00%  1.7310us         6     288ns     139ns     553ns  cuDeviceGetAttribute
                    0.00%     832ns         3     277ns     107ns     554ns  cuDeviceGetCount
                    0.00%     555ns         2     277ns     101ns     454ns  cuFuncGetAttribute
                    0.00%     500ns         2     250ns      98ns     402ns  cuDeviceGet
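A quick back-of-the-envelope on those numbers (my arithmetic on the log above, using marcialvieira's earlier mkp224o figure for comparison):

```rust
fn main() {
    // From the nvprof run above: 2,012,160 keys tried after 30 seconds on a GTX 1080.
    let t3v_rate = 2_012_160.0_f64 / 30.0; // ~67,000 keys/sec

    // mkp224o on an i7-8565U, as reported earlier in the thread (~15 MK/sec).
    let mkp224o_rate = 15.0e6_f64;

    println!("tor-v3-vanity (GTX 1080): {:>12.0} keys/sec", t3v_rate);
    println!("mkp224o (i7-8565U):       {:>12.0} keys/sec", mkp224o_rate);
    println!("ratio: mkp224o is ~{:.0}x faster", mkp224o_rate / t3v_rate);
}
```

That lines up with the profile itself: 60 × 256 = 15,360 keys per launch at ~229 ms per render call gives roughly the same rate, and GPU activity is essentially 100% in the render kernel, so the time appears to be going into the kernel rather than into launch overhead or copies.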

@scramblr

Same issues here. I figured I was just having bad luck, but no: running on an 8-GPU server produces fewer results than multi-processor mkp224o. I was really looking forward to this too, as it's the only GPU-based solution currently in existence for v3 onions.
