
Fixes #117, HasA speedup #118

Merged: 3 commits merged into cayleygraph:master on Aug 14, 2014

Conversation

barakmich
Member

Sometimes bugs are just happy accidents. Debugging #117, I found an inner-loop speedup: use the TripleDirection optimization (it's why it's there) in HasA.Next().
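
Roughly, the shape of the change (a sketch with stand-in types, not the exact code in the tree): instead of decoding the whole triple in the inner loop and then picking one direction off it, HasA asks the triple store for just the direction it needs.

// Sketch only; the type and method names below are stand-ins for illustration.
package sketch

type Direction int
type TSVal interface{}

type Triple struct{ vals [4]TSVal }

func (t *Triple) Get(d Direction) TSVal { return t.vals[d] }

type TripleStore interface {
	Triple(id TSVal) *Triple                     // decodes the full triple
	TripleDirection(id TSVal, d Direction) TSVal // fetches just one direction
}

// Inner loop of HasA.Next(), per ID produced by the underlying iterator:
//
//   before: result = ts.Triple(tripleID).Get(dir)      // full decode every step
//   after:  result = ts.TripleDirection(tripleID, dir)  // single-direction fast path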

So I fixed the hasher contention for LevelDB and Mongo (for a small overhead, mostly on iterator creation), and, with the HasA fix, we get a really nice speedup:

$ benchcmp before117ldb.txt after117ldb.txt 
benchmark                                   old ns/op        new ns/op        delta
BenchmarkNamePredicate                      3295932          3183968          -3.40%
BenchmarkLargeSetsNoIntersection            431508628        228258874        -47.10%
BenchmarkVeryLargeSetsSmallIntersection     1206952805       1092685245       -9.47%
BenchmarkHelplessContainsChecker            240092677579     125736583104     -47.63%
BenchmarkNetAndSpeed                        65325504         54174630         -17.07%
BenchmarkKeanuAndNet                        39650897         38110738         -3.88%
BenchmarkKeanuAndSpeed                      52811498         48721469         -7.74%
BenchmarkKeanuOther                         439261737        224065826        -48.99%
BenchmarkKeanuBullockOther                  399041649        216219529        -45.82%

Nearly 2x for free on heftier queries.

Interestingly, doing this slows down the memstore. I'm not sure why; it shouldn't, because the implementation of TripleDirection() for memstore is precisely the same as the lines it replaced. Something weird with Go and the stack?

I dug into the profiling a little bit, but for the life of me couldn't find the slowdown. It's an interesting experiment if anyone wants to get digging.

$ benchcmp before117mem.txt after117mem.txt 
benchmark                                   old ns/op       new ns/op       delta
BenchmarkNamePredicate                      896946          914073          +1.91%
BenchmarkLargeSetsNoIntersection            49592914        146742733       +195.89%
BenchmarkVeryLargeSetsSmallIntersection     676757437       552107457       -18.42%
BenchmarkHelplessContainsChecker            27981450440     18296867000     -34.61%
BenchmarkNetAndSpeed                        17037712        29424429        +72.70%
BenchmarkKeanuAndNet                        12780980        16487923        +29.00%
BenchmarkKeanuAndSpeed                      15331673        27621874        +80.16%
BenchmarkKeanuOther                         45547774        98223252        +115.65%
BenchmarkKeanuBullockOther                  94835191        263263464       +177.60%

This may also be helped by not using LLRB (i.e., we might get back the performance and then some). Unclear. Still, when we're talking about a ~100ms difference at largest, I think it's worth doing for now and investigating later, especially since the benefits for every larger/persistent store, both existing and soon-to-exist, are clear wins.

/cc @kortschak @igormarfin

@barakmich
Member Author

Merging this; yes, sync.Pool could be used, but creating a hasher for short-term use per Value isn't too costly, as that only happens on write and on iterator creation, and it keeps us Go 1.2 compatible.

If it becomes a problem (more likely on writes) then we can reconsider Go 1.3 and sync.Pool, but the HasA fix and closing the bug are bigger wins.
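
For reference, the per-use hasher shape I mean (simplified; the names are illustrative, not the actual leveldb/mongo code): each iterator, and each write path, gets its own short-lived hasher instead of all goroutines contending on one shared instance.

package sketch

import (
	"crypto/sha1"
	"hash"
)

// Iterator owns its hasher, so there is no cross-iterator locking.
type Iterator struct {
	hasher hash.Hash
}

// NewIterator creates the hasher once, at iterator-creation time.
func NewIterator() *Iterator {
	return &Iterator{hasher: sha1.New()}
}

// hashOf reuses the iterator's own hasher for every value it checks.
func (it *Iterator) hashOf(s string) []byte {
	it.hasher.Reset()
	it.hasher.Write([]byte(s))
	return it.hasher.Sum(nil)
}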

barakmich added a commit that referenced this pull request Aug 14, 2014
@barakmich merged commit 11cfc3d into cayleygraph:master on Aug 14, 2014
@igormarfin

Hi Barak,

I've merged the "issue117" branch to test your modifications. I see that the thread-safety problem is solved, but another problem appears when testing this simple code:

function getsizeID()
{
    var counter = 0;
    g.V().Has("is", "ID").ForEach(
        function (d) {
            counter++;
        }
    );
    return {"id": "1", "source": counter, "target": counter};
}
var num_ids = getsizeID();
g.Emit(num_ids);

on my dataset BEFORE merging the "issue117" commit gives me the following numbers for (1) resource consumption, (2) performance, and (3) the final result:

(1)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4508 root 20 0 147m 45m 5852 S 57.8 9.2 0:06.23 cayley
4510 root 20 0 147m 45m 5852 S 32.2 9.2 0:05.13 cayley
4507 root 20 0 147m 45m 5852 S 16.6 9.2 0:03.55 cayley
4509 root 20 0 147m 45m 5852 S 10.0 9.2 0:04.48 cayley
4504 root 20 0 147m 45m 5852 S 0.7 9.2 0:00.14 cayley

(2)
I0814 10:59:10.019801 04503 http.go:77] Started POST /api/v1/query/gremlin for 10.0.2.2:47010
I0814 10:59:25.766035 04503 http.go:79] Completed 200 OK /api/v1/query/gremlin in 15.746513173s
(3)
{
    "result": [
        {
            "id": "1",
            "source": 150219,
            "target": 150219
        }
    ]
}

The numbers after MERGING are
(1)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4468 root 20 0 453m 330m 7376 S 74.1 66.5 0:04.63 cayley
4471 root 20 0 453m 330m 7376 R 19.9 66.5 0:08.61 cayley
4469 root 20 0 453m 330m 7376 R 12.6 66.5 0:04.12 cayley

(2)
I0814 10:28:52.210239 04462 http.go:77] Started POST /api/v1/query/gremlin for 10.0.2.2:46931
I0814 10:29:22.261990 04462 http.go:79] Completed 400 Bad Request /api/v1/query/gremlin in 30.052265636s

(3)

{
"error": "query timed out"
}

I don't understand why, after this fix, the total RES (non-swapped physical memory) increased from roughly 44 MB per thread (across 5-6 threads) to about 330 MB per thread (across 3 threads). Why, after the fix, are there only 3 threads on average, each with high memory consumption, instead of 5 threads with low RES? What do you think?

Also, the performance degraded from 15 s to 30 s (I think that last number might be even larger, because the whole RES of the VM was eaten by cayley at the 30-second mark).

Also, I have a question on the output JSON in the latter case:

{
"error": "query timed out"
}

In reality, the query was executed within the allowed time range (I had set the configuration to
{"database": "leveldb",
"read_only": false,
"gremlin_timeout": 1200,
"write_buffer_mb": 600,
"cache_size_mb": 1000}
), but the query was interrupted because the VM ran out of RES. I want to say that
"error": "query timed out" might be misleading; perhaps in such cases "error": "out of memory" or something similar would be more suitable.

Cheers,
Igor.

@igormarfin

I've just tested the query "g.V().Has("is","ID").All(); "

before applying the fix "issue117"

(1) Memory consumption

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4503 root 20 0 297m 195m 6156 S 34.8 39.3 0:06.62 cayley
4509 root 20 0 297m 195m 6156 S 32.5 39.3 0:11.71 cayley
4510 root 20 0 297m 195m 6156 S 27.5 39.3 0:12.34 cayley
4507 root 20 0 297m 195m 6156 S 16.6 39.3 0:06.78 cayley
4506 root 20 0 297m 195m 6156 S 3.3 39.3 0:00.10 cayley

(2) performance
I0814 11:07:51.366073 04503 http.go:77] Started POST /api/v1/query/gremlin for 10.0.2.2:47052
I0814 11:07:51.374058 04503 finals.go:246] (hasa 49 tags:id; direction:subject
(and 46
tags:
primary_it:
(leveldb 47 tags: [] dir: object size:150283 ID)
other_its:
0:
(leveldb 48 tags: [] dir: predicate size:751341 is)
))
I0814 11:08:00.745239 04503 http.go:79] Completed 200 OK /api/v1/query/gremlin in 9.3791569s

after applying the fix
(1)
4534 root 20 0 549m 370m 7468 R 52.5 74.7 0:04.72 cayley
4533 root 20 0 549m 370m 7468 R 50.5 74.7 0:03.93 cayley
4535 root 20 0 549m 370m 7468 S 41.6 74.7 0:03.94 cayley
4527 root 20 0 549m 370m 7468 S 10.0 74.7 0:03.61 cayley
4528 root 20 0 549m 370m 7468 S 0.7 74.7 0:00.10 cayley

(2)
I0814 11:09:43.700662 04527 http.go:77] Started POST /api/v1/query/gremlin for 10.0.2.2:47057
I0814 11:09:43.874761 04527 finals.go:253] (hasa 15 tags:id; direction:subject
(and 12
tags:
primary_it:
(leveldb 13 tags: [] dir: object size:150283 ID)
other_its:
0:
(leveldb 14 tags: [] dir: predicate size:751341 is)
))
Killed

@kortschak
Contributor

Are you able to use go1.3 on your hosting service?

If you are, would you try this branch?

@igormarfin

Could you reopen issue #117? I think it is not solved properly. Could we try to test sync.Pool? Also, it might be that hasher := qs.makeHasher() is a bottleneck in terms of resource consumption.

@igormarfin

I'm running
go version go1.3 linux/amd64

Thanks, I'll try your branch with sync.Pool.

@igormarfin

Hi Dan,

Thanks for your modifications!
It works much better than the "issue117" PR.
Doing the same "g.V().Has("is","ID").All();" on my dataset, I got

(1)
4827 root 20 0 214m 106m 6192 R 49.2 21.4 0:04.46 cayley
4824 root 20 0 214m 106m 6192 S 40.6 21.4 0:05.22 cayley
4822 root 20 0 214m 106m 6192 R 38.2 21.4 0:04.23 cayley
4818 root 20 0 214m 106m 6192 S 14.3 21.4 0:05.75 cayley
4820 root 20 0 214m 106m 6192 S 10.3 21.4 0:00.31 cayley
4819 root 20 0 214m 106m 6192 S 0.7 21.4 0:00.21 cayley

(2)
I0814 12:35:54.420504 04818 http.go:77] Started POST /api/v1/query/gremlin for 10.0.2.2:47715
I0814 12:35:54.477058 04818 finals.go:253] (hasa 32 tags:id; direction:subject
(and 29
tags:
primary_it:
(leveldb 30 tags: [] dir: object size:150283 ID)
other_its:
0:
(leveldb 31 tags: [] dir: predicate size:751341 is)
))
I0814 12:36:01.681852 04818 http.go:79] Completed 200 OK /api/v1/query/gremlin in 7.261346785s

So, comparing these numbers with ones posted previously, I conclude that

your "pool" branch PR "issue117"
RES ~ 106 mB per thread RES ~ 370mB per thread
Completed in 7.261346785s killed

in this particular case of g.V().Has("is","ID").All();

As for parallel running, it works nicely:
I0814 12:43:01.403907 04818 http.go:77] Started POST /api/v1/query/gremlin for 10.0.2.2:47722
I0814 12:43:04.371602 04818 http.go:77] Started POST /api/v1/query/gremlin for 10.0.2.2:47723
I0814 12:43:09.038586 04818 http.go:79] Completed 200 OK /api/v1/query/gremlin in 7.634678735s
I0814 12:43:12.333065 04818 http.go:79] Completed 200 OK /api/v1/query/gremlin in 7.961455717s

So, perhaps this patch should be included in the next-round PR.

Cheers,
Igor.

@igormarfin

to clarify the previous post:

your "pool" branch --------------- PR "issue117"
RES ~ 106 mB per thread ----------------- RES ~ 370mB per thread
Completed in 7.261346785s ------------------- killed

@kortschak
Contributor

@barakmich I'll clean up that pool branch and send a PR. It was just a rush job to see if it was the cause of the problem. Since we are now at Go 1.3, as you said, this change is now free.

@barakmich
Member Author

I was just doing the same. I ported it to LevelDB (wholesale, the same way you did it, but if you've got some cleanup I'll just drop it) and got the following numbers under -benchmem:

$ benchcmp masterpool.txt poolpool.txt
benchmark                                   old ns/op        new ns/op        delta
BenchmarkNamePredicate                      24904875         4055443          -83.72%
BenchmarkLargeSetsNoIntersection            1371473684       281083731        -79.50%
BenchmarkVeryLargeSetsSmallIntersection     2380953142       1192913454       -49.90%
BenchmarkHelplessContainsChecker            155785265351     159293895951     +2.25%
BenchmarkNetAndSpeed                        58240274         57390746         -1.46%
BenchmarkKeanuAndNet                        41265326         41565612         +0.73%
BenchmarkKeanuAndSpeed                      50923293         53374344         +4.81%
BenchmarkKeanuOther                         241302323        249436480        +3.37%
BenchmarkKeanuBullockOther                  271257793        235776385        -13.08%

benchmark                                   old allocs     new allocs     delta
BenchmarkNamePredicate                      4603           1750           -61.98%
BenchmarkLargeSetsNoIntersection            281261         96375          -65.73%
BenchmarkVeryLargeSetsSmallIntersection     769462         623590         -18.96%
BenchmarkHelplessContainsChecker            31398535       30859546       -1.72%
BenchmarkNetAndSpeed                        32041          32056          +0.05%
BenchmarkKeanuAndNet                        24301          24275          -0.11%
BenchmarkKeanuAndSpeed                      27233          27219          -0.05%
BenchmarkKeanuOther                         74656          74635          -0.03%
BenchmarkKeanuBullockOther                  109762         109745         -0.02%

benchmark                                   old bytes      new bytes      delta
BenchmarkNamePredicate                      334806         119698         -64.25%
BenchmarkLargeSetsNoIntersection            22844016       5463520        -76.08%
BenchmarkVeryLargeSetsSmallIntersection     42666256       28765880       -32.58%
BenchmarkHelplessContainsChecker            1798577824     1756408608     -2.34%
BenchmarkNetAndSpeed                        2291476        2312832        +0.93%
BenchmarkKeanuAndNet                        1761306        1744635        -0.95%
BenchmarkKeanuAndSpeed                      1910514        1904703        -0.30%
BenchmarkKeanuOther                         4528294        4518384        -0.22%
BenchmarkKeanuBullockOther                  6789563        6792568        +0.04%

So yeah, let's go for it, especially since LevelDB itself now needs sync.Pool.
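
For anyone following along, the pooled-hasher idea looks roughly like this (a sketch; the helper name is made up and the actual branch may differ), trading a fresh allocation per use for a shared, GC-friendly free list:

package sketch

import (
	"crypto/sha1"
	"hash"
	"sync"
)

// hashPool hands out sha1 hashers so hot paths don't allocate one per use.
// sync.Pool requires Go 1.3, which is why this waited until now.
var hashPool = sync.Pool{
	New: func() interface{} { return sha1.New() },
}

// hashOf borrows a hasher from the pool, uses it, and returns it.
func hashOf(s string) []byte {
	h := hashPool.Get().(hash.Hash)
	defer hashPool.Put(h)
	h.Reset()
	h.Write([]byte(s))
	return h.Sum(nil)
}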

@kortschak
Contributor

If you are already there, you can do it. I won't be able to get that done for the next couple of days.

@barakmich
Member Author

When you say "cleanup" -- was there any renaming or such you wanted to do? Package variables make sense.

@kortschak
Contributor

No, it's just that I was sick when I was writing it and I had other things to do as well, so I can't guarantee the quality of the code. I just wanted to have a look to make sure there weren't things that I had missed.

@barakmich
Member Author

I'll PR it momentarily.

@barakmich deleted the issue117 branch on February 8, 2015 at 22:09