replace hash tables by linked list (or some other?) #35

Closed
rgerhards opened this issue Apr 6, 2016 · 21 comments

@rgerhards
Member

I think that we may get overall better performance if we replace the hash tables by linked lists. Performance profiler (callgrind/kcachegrind) data suggests that the hash functions require a lot of computation time, and that access is not frequent enough to make up for this. We need to try an alternative implementation and compare the performance of the two.

It might make sense to keep both ways inside libfastjson, and permit the caller to specify which one should be used (e.g. linked lists if few fields per object are expected and hash table if many are).

In any case, the API should not assume that internally a hash table is used (there currently even is a function which returns a hash table).

@davidelang

I agree that for log-related data it is extremely unlikely that a hash table
will be faster than a linked list.

On modern CPUs, the cache penalty for a hash table's more random access pattern
would make a difference, even if you ignored the cost of calculating the hash.
And there just aren't going to be that many items at any one level.

The most extreme case I can think of is dyn_stats data, which could contain
hundreds of items, and they are either dumped in an unknown order or used in a
foreach loop. For data in a log, a few tens of items would be a huge set.

David Lang

@rgerhards
Member Author

Large hash tables also put a lot of stress on the malloc subsystem in the current implementation (of course, linked lists also need to grow, but they can grow by linking new nodes rather than reallocating).

@davidelang

davidelang commented Apr 6, 2016 via email

rgerhards added a commit to rgerhards/libfastjson that referenced this issue Apr 11, 2016
This basically implements the functionality and the testbench
passes. We still need more tests and expect regressions.

see also rsyslog#35
@janmejay
Member

I think arrays may make more sense than linked lists. Also, hash tables with a simpler hash function may be very effective.

A sorted, binary-searchable array may also be worth thinking about. But arrays, in my opinion, will beat everything else.

The absolute best performance should come from inline keys (at least intuitively). I mean something like this:
struct foo { int ptr_key; union { char arr[110]; char *ptr; } key; void *val; };
If the var-name is at most 109 chars (110 bytes with terminator), it'll use the inline array, else a pointer to it (this fits in 2 cache lines if 64-byte aligned, else 3 cache lines). We can make it __attribute__((packed, aligned(64))) and drop the array down to 51 bytes (with ptr_key : 1) to fit it in a single cache line. Intuition says it'll perform best, but it's hard to say in the absence of a good-quality micro-benchmark.

In general, CPU is abundant in any setup. Lock contention, I/O, etc. are usually the problem; most workloads and installations have significant CPU headroom.

@rgerhards
Member Author

Thanks for the feedback. A lot of work is already completed, see https://github.com/rgerhards/libfastjson/tree/exp-no-linkhash

You are right about the inline keys, but we need not go overboard. We create very many json objects, so setting aside that much space for inline keys really makes a difference. I'll make a first try with 7-char keys, which means no extra space at all is used in those cases (7 chars + NUL == sizeof(ptr)).

The main performance concern is malloc/free calls. They clearly dominate (>50% of execution time) in my small benchmarks. So the main goal is to reduce them. But nothing has yet been tested on a real use case.

@rgerhards
Member Author

I should also mention that in the rsyslog profiler runs I have seen, malloc/free (induced by the json lib) is also the prime concern.

@davidelang

davidelang commented Apr 12, 2016 via email

@davidelang

the classic answer to this is to have each node in the linked list that's
allocated contain multiple items.

or, going further, allocate substantially larger chunks of RAM with malloc
and then do the allocation within them ourselves.
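The chunk-allocation idea above amounts to a bump-pointer arena: one large malloc, then cheap sub-allocations within it. A minimal sketch, with invented names and no relation to libfastjson's actual internals:

```c
#include <stdlib.h>
#include <stddef.h>

/* Illustrative bump-pointer arena: one malloc serves many allocations. */
struct arena {
    char *buf;
    size_t used, cap;
};

static int arena_init(struct arena *a, size_t cap)
{
    a->buf = malloc(cap);
    a->used = 0;
    a->cap = cap;
    return a->buf != NULL;
}

static void *arena_alloc(struct arena *a, size_t n)
{
    n = (n + 7) & ~(size_t)7;          /* round up to 8-byte alignment */
    if (a->used + n > a->cap)
        return NULL;                    /* real code would chain a new chunk */
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

/* One free() releases everything allocated from the arena. */
static void arena_free(struct arena *a)
{
    free(a->buf);
    a->buf = NULL;
}
```

This is exactly how it cuts malloc/free pressure: N small object allocations collapse into one malloc up front and one free at teardown.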

I suspect that 7-character names will catch a good percentage of things, but
that going slightly larger will make a significant difference (if the next
reasonable step is 15 characters, I expect that will get us very close to
complete coverage).

we should see if we can add a debugging/stats option that keeps counters of
variable lengths and object counts at each layer and dumps them out along with
pstats. Now that we have dyn_stats available, this is much less work to do than
it was before :-)

David Lang

@janmejay
Member

Well, large arrays are asymptotically just as cheap: O(1) amortized per append.

@rgerhards
Member Author

The new code I am working on is a compromise: it is a linked list, but the list elements are arrays (pages). The first page is kept within the json object itself, so we need only a single alloc if everything fits in one page. We do not resize the array, but rather add a new page when it becomes full. The downside is that lookup performance is not good (but it wasn't good with the hash table approach in json-c either).

For the access patterns I have seen, this should bring notable improvement. It will probably become problematic if single objects routinely contain more than 100 subobjects.
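The paged design described above could be sketched like this. Everything here is illustrative (names invented, page size of 8 entries assumed), not the actual libfastjson structures:

```c
#include <stdlib.h>

#define PAGE_SIZE 8   /* assumed entries per page */

/* Linked list of fixed-size pages; the first page is embedded in the
 * object, so objects with <= PAGE_SIZE fields need no extra allocation. */
struct page {
    const char *keys[PAGE_SIZE];   /* sketch: stores caller's pointers */
    void *vals[PAGE_SIZE];
    struct page *next;
};

struct obj {
    int count;            /* total number of entries */
    struct page first;    /* embedded first page */
};

static int obj_add(struct obj *o, const char *key, void *val)
{
    struct page *p = &o->first;
    int idx = o->count;
    while (idx >= PAGE_SIZE) {        /* walk to the page holding this slot */
        if (p->next == NULL) {
            p->next = calloc(1, sizeof(*p->next));
            if (p->next == NULL)
                return -1;
        }
        p = p->next;
        idx -= PAGE_SIZE;
    }
    p->keys[idx] = key;
    p->vals[idx] = val;
    o->count++;
    return 0;
}
```

Appending never moves existing entries (unlike a resized array), and the common small-object case costs zero allocations beyond the object itself; the price is O(n) lookup across pages.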

Note in regard to rsyslog: path components are objects, so if you have !a!b!c
you have obj a, which contains subobj b;
obj b, which contains subobj c;
object c

@janmejay
Member

I run it with jemalloc; it completely eliminates the malloc CPU footprint. It does exactly that (pooled memory allocation and thread-local allocation buffers).

@janmejay
Member

100 is definitely on the higher side. I think most logs will have fewer than 20 fields.

@rgerhards
Member Author

I know that jemalloc helps a lot, but we've seen too many cases where jemalloc failed. We used it as the default for a couple of months before we gave up. It's something the (educated) user should tune.

@rgerhards
Member Author

@janmejay 20 is also my PoV, often less. The current page size is 8, which again is a time/space tradeoff. BTW: I have just committed a state that should be fully working, but it does not yet contain all enhancements (e.g. inlined keys).

@janmejay
Member

@rgerhards slight digression, what code-paths used to fail with jemalloc?

@rgerhards
Member Author

I don't remember exactly, but many. We had constant trouble. I guess most, if not all, could have been solved by using the newest version, but we do not like to maintain jemalloc packages, for obvious reasons. It's not that it crashed every hour, but at least once every two weeks we had reports that pointed into jemalloc, and often we found these were bugs fixed in newer versions.

@janmejay
Member

FWIW, my cluster uses imptcp with stream compression, rulesets with several threads, linked-list based queues, omkafka, omprog, etc. quite heavily, and I haven't seen any failures due to jemalloc. Maybe things have improved since. All this is on Debian Wheezy.

Maybe we should try again (a time-boxed effort)?

@rgerhards
Member Author

I think we should move jemalloc off this tracker ;)

rgerhards added a commit to rgerhards/libfastjson that referenced this issue Apr 22, 2016
this now needs to prove itself in practice

see also rsyslog#35
@rgerhards rgerhards self-assigned this Apr 28, 2016
@rgerhards
Member Author

This is now done in 0.99.3. We will see how it actually affects performance. I am closing this tracker; if issues come up, we will either re-open this one or create a new one. I suggest we keep the array implementation in any case, maybe as an additional option, in case we see situations where hash tables would make more sense.

@davidelang

davidelang commented Apr 28, 2016 via email

@rgerhards
Member Author

Thanks for the feedback. I think we should try to get some profiler data; that would be very interesting. Possibly we should coordinate this via private mail and report only the results, so that we can keep this tracker tidy.
